Page MenuHome

Tests: detect memory leaks in automated tests
ClosedPublic

Authored by Jacques Lucke (JacquesLucke) on Aug 21 2020, 11:49 AM.

Details

Summary

A memory leak should be considered a bug. Therefore, it makes sense to fail tests when they contain memory leaks. On my machine, all tests pass with this change (I fixed one remaining issue already).

This patch adds a new --fail-on-memleak flag. When it is used, Blender will have a non-zero exit code when guardedalloc detects a memory leak.
I enabled this flag in various places that run tests. I also added --debug-memory, because it adds more information to the test output.

If a test fails, one should rerun the test with ctest -R testname --verbose. The output will contain the names of the memory blocks that haven't been freed.

Diff Detail

Repository
rB Blender

Event Timeline

Jacques Lucke (JacquesLucke) requested review of this revision.Aug 21 2020, 11:49 AM
Jacques Lucke (JacquesLucke) created this revision.

Think it's fine.

The only thing I'm not sure is the --debug-memory. It makes it so the testing happens with different code compared to what is used in the real production.
In practice I think it's OK, as the performance difference is likely unmeasurable. And we can always extend allocator tests to ensure it works correctly.

This revision is now accepted and ready to land.Aug 24 2020, 9:24 AM
Brecht Van Lommel (brecht) requested changes to this revision.Aug 24 2020, 12:35 PM

We want Blender to exit with an error code for various reasons, a memory leak, a Python error (for which there exists --python-exit-code), but also any other errors or even warnings. This is useful for automated testing, but also e.g. render farms that don't want to continue rendering if there are missing textures.

To me it seems we are missing a bigger design for this, adding a flag for every case would be too much. A general flag to exit on errors seems like a logical solution, though there is a distinction between errors that are important to users and developers, they need to be controlled separately. We don't want render farm jobs to abort on memory leaks.

I suggest the following:

  • Add a --debug-exit-on-error flag that exits on memory meaks, but also on any errors reported to clog. I think it makes sense for developer options to have the --debug prefix.
  • In the future, we can add an --exit-on-error for users that can replace --python-exit-code and exit on errors that are interesting to users.
This revision now requires changes to proceed.Aug 24 2020, 12:35 PM

@Brecht Van Lommel (brecht), where does the exit-on-missing-textures belong to in your suggestion? It wouldn't be practical to make render farms to abort on all errors reported via clog (or the severity of certain logs is to be revisited).

@Brecht Van Lommel (brecht), where does the exit-on-missing-textures belong to in your suggestion? It wouldn't be practical to make render farms to abort on all errors reported via clog (or the severity of certain logs is to be revisited).

My suggestion is to only add --debug-exit-on-error in this patch, which will exit on internal/developer errors.

For missing textures, that would be the future --exit-on-error flag. Making such a flag work will indeed require deeper change to our logging and reports (also see my comments in D8691).

Ok, sounds good and easy to tweak the patch to the suggestion then :)

  • Use --debug-exit-on-error.

Not sure if I got this right now. I essentially just renamed the command line flag.
I did not get into interfacing this with clot yet, because it is already quite
useful without that. I can look into that later, maybe when the gsoc project is done.

Please add the clog integration now. It should be quite simple, clog already exits on fatal errors.

source/creator/creator_args.c
756–757

I would make the description something simpler like this: "Immediately exit when internal errors are detected"

These descriptions are read by users, and what "automated tests" means in that context is unclear. It should also be clear that this is about internal errors, not all types of errors.

  • Abort on clog error when flag is passed.

Note: Now the cycles_mesh test is failing (more specifically watertight_orig it seems).
Some alembic tests are failing as well, but those seem unrelated, because they also fail on buildbot.

It seems to be failing on an invalid driver, which should not be an error. Really it's more a user error than internal error, but the simple fix now is to change it to a warning:

diff --git a/source/blender/blenkernel/intern/anim_sys.c b/source/blender/blenkernel/intern/anim_sys.c
index 8fe57f1..69e70cf 100644
--- a/source/blender/blenkernel/intern/anim_sys.c
+++ b/source/blender/blenkernel/intern/anim_sys.c
@@ -2870,7 +2870,7 @@ void BKE_animsys_eval_driver(Depsgraph *depsgraph, ID *id, int driver_index, FCu
 
       /* set error-flag if evaluation failed */
       if (ok == 0) {
-        CLOG_ERROR(&LOG, "invalid driver - %s[%d]", fcu->rna_path, fcu->array_index);
+        CLOG_WARN(&LOG, "invalid driver - %s[%d]", fcu->rna_path, fcu->array_index);
         driver_orig->flag |= DRIVER_FLAG_INVALID;
       }
     }

Also, in this case we should always print the error before Blender exits, even if no logging was enabled with --log. Otherwise it's unclear why Blender exited.

source/creator/creator_args.c
2290

Add this next to the other --debug flags.

  • move debug arg up to the others
  • add --debug-memory to cycles tests
  • print output in case of crashed cycles test

I'll commit the error->warn fix in master separately now.

P1607 is the output I get from running ctest -R cycles_mesh --verbose now.

  • print output in case of crashed cycles test

Actually, I was thinking about the error message not being printed by clog. Which could be solved by raising the log level to at least error when enabling --debug-exit-on-error.

The Cycles test crash printing I would leave unchanged.

Honestly, I don't really understand what you mean. Maybe we talk past each other.

Actually, I was thinking about the error message not being printed by clog.

Why do we want that the error is not printed when Blender crashes? Don't you think it is useful to have the error printed to be able to debug the issue? I'm not very familiar with clog and the cycles tests, so maybe I'm missing something obvious.

I was confused, I though the "invalid driver" error was not being printed by clog by default, but it seems it is printed.

The patch seems fine then.

This revision is now accepted and ready to land.Aug 26 2020, 6:59 PM

Hi @Jacques Lucke (JacquesLucke), could you say which os you tried this on? When building with make developer most (python) unit tests fail for me on mac and I believe they fail on linux too. (For context, this was discovered when testing D8507 with different make configurations)

I haven't run these tests in debug builds tbh (usually they take way to long). Memory leaks are also detected in release builds. My guess is that there is some debug-mode only code that logs errors..
Running ctest --verbose -R mytestname might give more information.

I haven't run these tests in debug builds tbh (usually they take way to long). Memory leaks are also detected in release builds. My guess is that there is some debug-mode only code that logs errors..
Running ctest --verbose -R mytestname might give more information.

Actually make developer gives memory leak error in release too. I reported a bug T80605, but I'm still trying to narrow down the exact cause.