The goal is to reduce threading overhead when evaluating geometry nodes.
Most existing .blend files don't benefit from this because their execution time is bottlenecked by individual nodes. However, in some cases the number of nodes can itself become the bottleneck. That's mainly the case when there are lots of "small" nodes that are fast to execute. In that case, using lots of multi-threading is counterproductive, because the overhead of scheduling and synchronizing tasks outweighs the work done per node. The goal of "lazy threading" is to only use multi-threading when it is likely to be useful. For more details, see the description in BLI_lazy_threading.hh.
I mainly tested the performance improvements in three files: one was actually used in practice, and two I made myself, which are not very realistic but stress the bottlenecks of the system. In files with less extreme node counts, performance is expected to be about the same. If a file does become slower, it can usually be fixed by adding a blender::lazy_threading::send_hint() call in the right place.
While the modifier execution time was reduced in all three cases, it's more interesting to look at the output of the Linux time command, because it shows not only the wall-clock time but also the CPU time spent in user and kernel mode.
time ./blender -b "/run/media/jacques/Jacques Data/Projekte/geometry_nodes_demo_files/many_math_nodes.blend"
before:
real 0m1.120s
user 0m7.948s
sys 0m1.765s
after:
real 0m0.726s
user 0m2.775s
sys 0m0.345s
time ./blender -b "/home/jacques/Downloads/Geo node tree test 1-1-1 (no viewport).blend"
before:
real 0m0.744s
user 0m2.173s
sys 0m1.153s
after:
real 0m0.669s
user 0m1.466s
sys 0m0.541s
time ./blender -b "/run/media/jacques/Jacques Data/Projekte/geometry_nodes_demo_files/large_nested_field_test.blend"
before:
real 0m0.517s
user 0m1.869s
sys 0m0.586s
after:
real 0m0.449s
user 0m0.411s
sys 0m0.120s
As can be seen, the real time is often reduced only slightly, but the user and sys times are reduced significantly. This shows that threads are used much more efficiently.