Use multiprocessing with simple deform modifiers.
Details
Diff Detail
- Repository
- rB Blender
Event Timeline
| source/blender/modifiers/intern/MOD_simpledeform.c | ||
|---|---|---|
| 300 | The crash happens here, not sure why. | |
Crash fixed (I think), though this approach is understandably very slow. I wonder if it makes more sense to split the work into chunks and do it that way.
| source/blender/modifiers/intern/MOD_simpledeform.c | ||
|---|---|---|
| 60 | Looking at performance it might be better to make sure that this struct uses as much as possible aligned addressing. Please consider moving mode fiels to the end of this struct and using a char for deform_axis | |
| 67 | use float[3] here so you can use BLI_math_vector functions like copy_v3v3. It just reduces the number of lines that a developer needs to read. | |
| 246 | To my knowledge function pointers can influence the branch prediction. Consider to call the functions directly in the switch statement or use CPP templates to remove the check on mode at all. Most of our code base isn't CPP, but it is adviced to migrate it on a file basis if there is a need. | |
| 290 | It is unclear to me why a lock is needed. | |
| 490 | This would make a new task per vertex. Consider using BLI_task_parallel_range to group multiple vertices in a single task. See meshdeformModifier_do for an example | |
| source/blender/modifiers/intern/MOD_simpledeform.c | ||
|---|---|---|
| 246 | Question, what's the "protocol" you guys use when changing the file over to CPP? I imagine there's a lot of dependencies here, so "just" changing the extension to "cc" or "cpp" isn't enough, right? | |
Using BLI_task_parallel_range(...) now, and the playback is slightly improved, on my 3990X, it was roughly:
- Serial: 2.9 fps on average (fps never went above 3fps)
- Parallel: 3.14 fps on average (fps never went below 3fps)
The scene I used was a dragon scene.
I still need to align the struct so I'll work on that next, and then look at what else I can do to improve the implementation.
IMO this patch only needs some polishing:
- remove the //new description.
- reorder the vars in CurrDeformData.
- remove the #if 0 and unused variables
/home/jeroen/blender-git/blender/source/blender/modifiers/intern/MOD_simpledeform.c:422:15: warning: unused variable ‘axis_map’ [-Wunused-variable]
422 | const uint *axis_map =
| ^~~~~~~~
/home/jeroen/blender-git/blender/source/blender/modifiers/intern/MOD_simpledeform.c:312:10: warning: variable ‘simpleDeform_callback’ set but not used [-Wunused-but-set-variable]
312 | void (*simpleDeform_callback)(const float factor,
| ^~~~~~~~~~~~~~~~~~~~~
/home/jeroen/blender-git/blender/source/blender/modifiers/intern/MOD_simpledeform.c:308:15: warning: unused variable ‘base_limit’ [-Wunused-variable]
308 | const float base_limit[2] = {0.0f, 0.0f};
| ^~~~~~~~~~I don't see any issue with adding this to master.
| source/blender/modifiers/intern/MOD_simpledeform.c | ||
|---|---|---|
| 59 | Unclear what Curr stands for. Perhaps DeformUserData | |
| 61 | Remove //new. | |
| 246 | The protocol is just to do it when there is a benefit. After renaming the extension to .cc just make sure that nullptr and cpp casting is used. Normally it is an easy task to do. Other features can be migrated (if needed outside this patch). We use clang-tidy to check for the common cases that needs to be migrated, but that isn't available on all platforms. | |
| 425 | Remove compile directive. and cleanup unused variables. | |
Patch seems fine. Just a small comment. Do you have updated performance figures between master and this patch?
| source/blender/modifiers/intern/MOD_simpledeform.c | ||
|---|---|---|
| 60 | Should be moved to BLI_compiler_attrs.h | |
Compiler warning:
/home/jeroen/blender-git/blender/source/blender/modifiers/intern/MOD_simpledeform.c:317:10: warning: variable ‘simpleDeform_callback’ set but not used [-Wunused-but-set-variable]
Master 2.92 fps this patch 3.13 fps on Ryzen 1700X With Vega 64 GPU.
I will update the patch comment with the information I have so far and commit it to master (3.0) on friday.
Quick update with some numbers:
3970X: 2.85fps -> 2.95 fps
3990X: 3.15 fps -> 3.41 fps
3995WX: 3.21 fps -> 3.38 fps