Page MenuHome

Parallelizing BLI_memory_utils.hh
AbandonedPublic

Authored by Erik Abrahamsson (erik85) on Jun 13 2021, 1:15 PM.

Details

Summary

This patch makes the functions destruct_n, initialized_copy_n, initialized_move_n and initialized_fill_n run in parallel if the number of items are 64 or greater. This number I just chose to not run in parallel when there are few item in the container. Maybe this should be much higher, like 1024, or lower like 2.. I honestly have no idea. But 64 feels a bit low.

I would have liked to include default_construct_n in this patch too as it's the one that would make the biggest impact in the boolean modifier, but I couldn't find a way to make it work with try-catch and clean up properly if there was an exception.

Diff Detail

Repository
rB Blender

Event Timeline

Erik Abrahamsson (erik85) requested review of this revision.Jun 13 2021, 1:15 PM
Erik Abrahamsson (erik85) updated this revision to Diff 38249.
Erik Abrahamsson (erik85) created this revision.
Jacques Lucke (JacquesLucke) requested changes to this revision.EditedJun 13 2021, 2:08 PM

Can you provide some numbers for how much this improves performance?
Also it would be interesting to know why destruction is so slow, what is being destructed?

Generally, I'd be very careful with these changes, they can easily lead to very hard to find bugs.

  • Types whose copy/move assignment operators or destructor are not thread-safe could not use many of the fundamental data structures like Vector anymore. I'm not saying that you should write data structures that are not thread safe in that regard, but it should still be possible.
  • Using tbb threading primitives like parallel_for can result in arbitrary other tasks to be executed on the current thread. I just came out of this task isolation hell (T88598). If most operations on core data structures could suddenly start using threads, we'd surely run into task isolation related issues even more. This could maybe be solved by using task isolation in functions like destruct_n.
  • Having BLI_task.hh as a dependency of BLI_memory_utils.hh might be a bit bad, since it is included in many files and might lead to cyclic dependencies. Not sure how that should be structured ideally.
  • I think the user of a data structure should opt-into using threading to avoid the issues above. This could be done with a flag or some specialized methods on the data structure. Maybe there are other solutions.

Can you use arc for posting patches, this would give the patch more context (where it currently says Context not available. in the diff.

This revision now requires changes to proceed.Jun 13 2021, 2:08 PM

Ok, many problems with this it seems so it might not be worth investigating any further.
It did give around a (edit: 8%) speed increase on my boolean test blend (one subdivided cube cutting another) but used more effective CPU due to overhead I noticed now.
After I posted this I found the option to use NoInitialization with Array and construct the heavy objects myself in parallel so that's probably the way to go.