
python enhancement: enable LTO/PGO and disable semantic interposition
Abandoned, Public

Authored by Paul Bransford (draeath) on Jul 8 2020, 3:14 AM.

Details

Summary

This change does the following for python builds (it makes no changes to Blender itself):

  • Non-debug Windows builds: enable profile-guided optimization (PGO)
  • Apple/Unix: enable the full optimization suite (LTO and PGO) and, if building with gcc 5.3 or newer, also disable semantic interposition. Other compilers (or older gcc) still get the LTO/PGO switch, but the semantic interposition CFLAG is not added, as it is specific to gcc.

While a day-to-day user likely won't see much change, there is potential for addon performance improvements of up to 30% (though more likely 10-20% based on my own non-blender python workloads using these build tweaks) when not blocked by the C API. I have been using python builds with '--enable-optimizations' in production HPC workloads for years and have not run into any trouble due to it - it should be a safe change in and of itself.

In all cases this should decrease (or leave unchanged) python interpreter execution times. Where semantic interposition is disabled, the potential performance gain is larger, although LD_PRELOAD will no longer be able to override symbols within the produced libpython. This should not matter for Blender, which does not use this as far as I can tell. Other library calls (e.g. to libc) should be interceptable as normal.

https://developers.redhat.com/blog/2020/06/25/red-hat-enterprise-linux-8-2-brings-faster-python-3-8-run-speeds/ goes into more details on what this is doing under the hood. It's not specific to python 3.8 or RHEL 8.2.

Python version notes: the --enable-optimizations flag became valid in Python 3.5, and the --pgo flag for the Windows python build.bat goes back at least that far.
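As a rough illustration of the Unix-side logic described in this summary, the shell sketch below picks configure arguments and extra CFLAGS from a compiler name and version. The helper names version_ge and python_build_flags are hypothetical and are not taken from the diff; -fno-semantic-interposition is the gcc option that disables semantic interposition, and exactly which optimizations --enable-optimizations implies depends on the Python version being built.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the actual change.

# version_ge HAVE WANT: succeed if HAVE >= WANT, comparing major.minor.
# A missing minor component (e.g. "10") is treated as 0.
version_ge() {
    have_major=${1%%.*}
    have_minor=0
    case $1 in
        *.*) have_minor=${1#*.}; have_minor=${have_minor%%.*} ;;
    esac
    want_major=${2%%.*}
    want_minor=0
    case $2 in
        *.*) want_minor=${2#*.}; want_minor=${want_minor%%.*} ;;
    esac
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] &&
          [ "$have_minor" -ge "$want_minor" ]; }
}

# python_build_flags COMPILER VERSION: print "CONFIGURE_ARGS|EXTRA_CFLAGS".
# Only gcc 5.3 or newer gets the semantic interposition flag.
python_build_flags() {
    configure_args="--enable-optimizations"
    extra_cflags=""
    if [ "$1" = gcc ] && version_ge "$2" 5.3; then
        extra_cflags="-fno-semantic-interposition"
    fi
    printf '%s|%s\n' "$configure_args" "$extra_cflags"
}
```

Calling python_build_flags gcc "$(gcc -dumpversion)" would then report the arguments for the local compiler.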

Final note: I do not have the means to fully test builds with this change on all supported platforms, but I do believe I've covered all the bases.

Diff Detail

Repository
rB Blender

Event Timeline

Paul Bransford (draeath) requested review of this revision. Jul 8 2020, 3:14 AM
Paul Bransford (draeath) created this revision.

I make use of VERSION_GREATER_EQUAL which is not supported in the low-end of cmake versions the project supports, though it was suggested to me on blender.chat to not worry about that.

If that is a mistake, line 85 of the change can be rewritten to take on a different form:
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 5.3 OR CMAKE_CXX_COMPILER_VERSION VERSION_EQUAL 5.3))

Missed a required change to configure.sh args

do it in build_files/build_environment/install_deps.sh as well since someone might use that instead

Paul Bransford (draeath) updated this revision to Diff 26617. Edited Jul 8 2020, 3:39 AM

tweak last change to check cc consistently (which is what python's Makefile uses)

OK, I'm done fiddling and this is ready for review and test.

Is if(BUILD_MODE STREQUAL Debug) valid for non-Win32 builds? If so, it would be good to add a bit more around the PYTHON_CONFIGURE_EXTRA_ARGS so that it's done for release builds only - because this change will increase build time, potentially significantly.

Brecht Van Lommel (brecht) requested changes to this revision. Jul 8 2020, 1:22 PM

Have you tested running a script with ./blender -b -P script.py and verified that there is a speedup? Any numbers?

Is if(BUILD_MODE STREQUAL Debug) valid for non-Win32 builds? If so, it would be good to add a bit more around the PYTHON_CONFIGURE_EXTRA_ARGS so that it's done for release builds only - because this change will increase build time, potentially significantly.

We don't usually do debug builds for non-Windows, but add the check regardless.

build_files/build_environment/cmake/python.cmake
23

PYTHON_EXTRA_BUILD_FLAGS would be a more consistent name for this.

Also, is removing -c ${BUILD_MODE} for release builds really necessary? If it's simply redundant I would keep it.

25

Add descriptive comment here and for the Unix case about why these options are enabled.

build_files/build_environment/install_deps.sh
1329 ↗(On Diff #26617)

Should this be == instead of =?

This revision now requires changes to proceed. Jul 8 2020, 1:22 PM

I'll clean these bits up as commented in-line. Thanks for the feedback!

Re "Have you tested running a script" - I don't have a suitable script to test with to get some clean numbers, unfortunately. I know a few people who work on addons that could likely help me out on that aspect. I'll see about doing so.

EDIT: actually there's plenty of code on https://pybenchmarks.org that'll be great for testing this. I'll pick a few and get some figures.

build_files/build_environment/install_deps.sh
1329 ↗(On Diff #26617)

Either should work. That said, I should verify that == does, and switch over - at least for consistency. I wrote the first part from scratch; the second part was adapted from working code elsewhere.

The [ "$a" = foo ] form is POSIX sh, whereas [ "$a" == foo ] is bash specific. (bash supports POSIX sh)
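A minimal sketch of the portable form, using a hypothetical helper named is_foo that is not part of install_deps.sh:

```shell
#!/bin/sh
# is_foo VALUE: succeed if VALUE is exactly "foo".
# Uses the single "=" operator, which POSIX specifies for test / [ ].
is_foo() {
    [ "$1" = foo ]
}

# The "==" spelling is an extension: bash accepts it, but a strict POSIX
# shell such as dash may reject it, so it is left commented out here.
# [ "$1" == foo ]
```

Since install_deps.sh may be run under different shells, the single = form is the safer default; switching to == only makes sense if the script is guaranteed to run under bash.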

About VERSION_GREATER_EQUAL being cmake 3.7+: given there's only a few people that should be running the deps script anyhow, I feel we should just bump the minimum rather than hack around ancient cmake limitations.

Doesn't build on Windows currently for me: some profiles are not recorded and the distutils test errors out with a linker error, which causes the build not to even do the final optimized link. I will have to take a look at what is up there. Also, it seemingly adds about an hour to the build time, which I'm not super thrilled about unless there are some serious measurable benefits for Blender (and not some synthetic benchmark) to justify this extra time.

OK, so I was able to perform some benchmarking. The end result: this is completely not worth the effort or the increased build time. I apologize for wasting your time looking at this one.


When it helps us, it does so by 1-2%, but on some of the benchmarks performance is actually worse (by a similar margin).

For posterity, my notes on the methodology and the raw user-time values for the various builds/benchmarks are attached. I performed this test on an AWS EC2 c5.large instance.
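For anyone repeating this kind of comparison, a small shell sketch of turning two user-time measurements into a percentage change follows. The function name percent_change is hypothetical, and the sketch is not taken from the attached notes.

```shell
#!/bin/sh
# percent_change BASELINE CANDIDATE: print the relative change in percent.
# Negative means the candidate used less user time than the baseline.
percent_change() {
    awk -v base="$1" -v cand="$2" \
        'BEGIN { printf "%.1f\n", (cand - base) / base * 100 }'
}
```

It could be called as percent_change "$baseline_user_seconds" "$optimized_user_seconds", where both variables are assumed to hold user times in seconds measured for the same benchmark on the two builds.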