This patch was made on top of D11455, and the benchmarks below compare against it.
It follows an idea similar to that seen in BLI_task_parallel_range
whose TaskParallelSettings has a callback to reduce the
"userdata_chunk" (TaskParallelReduceFunc func_reduce).
I didn't like the idea of adding a mutex to ExtractorRunData, but
it does not appear to hurt performance (tested by keeping
extract_lines single-threaded).
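
As a rough illustration of the pattern described above (not the code in this patch), here is a minimal sketch using plain pthreads. Each task accumulates into its own chunk, and a reduce step folds that chunk into shared state under a mutex. `RunData`, `TaskChunk`, `chunk_reduce`, `task_run` and `parallel_sum` are hypothetical names invented for this example. They only stand in for ExtractorRunData and the "userdata_chunk", and do not reflect Blender's actual structs or the BLI_task API.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_TASKS 4

/* Hypothetical stand-in for shared extractor state (not Blender's ExtractorRunData). */
typedef struct RunData {
  pthread_mutex_t mutex;
  long total; /* Shared result that every task contributes to. */
} RunData;

/* Per-task scratch data, playing the role of a "userdata_chunk". */
typedef struct TaskChunk {
  RunData *run_data;
  const int *values;
  size_t start, end; /* Half-open range [start, end) handled by this task. */
  long partial;      /* Task-local accumulator, written without locking. */
} TaskChunk;

/* Reduce step: fold one task's partial result into the shared state.
 * The mutex is taken once per task, not once per element. */
static void chunk_reduce(RunData *run_data, const TaskChunk *chunk)
{
  pthread_mutex_lock(&run_data->mutex);
  run_data->total += chunk->partial;
  pthread_mutex_unlock(&run_data->mutex);
}

/* Thread entry point: accumulate locally, then reduce once at the end. */
static void *task_run(void *arg)
{
  TaskChunk *chunk = arg;
  chunk->partial = 0;
  for (size_t i = chunk->start; i < chunk->end; i++) {
    chunk->partial += chunk->values[i];
  }
  chunk_reduce(chunk->run_data, chunk);
  return NULL;
}

/* Sum `values` using NUM_TASKS threads. Returns -1 if a thread could not be created. */
static long parallel_sum(const int *values, size_t count)
{
  RunData run_data;
  pthread_t threads[NUM_TASKS];
  TaskChunk chunks[NUM_TASKS];
  int started = 0;
  int ok = 1;

  run_data.total = 0;
  pthread_mutex_init(&run_data.mutex, NULL);

  for (size_t t = 0; t < NUM_TASKS; t++) {
    chunks[t].run_data = &run_data;
    chunks[t].values = values;
    chunks[t].start = count * t / NUM_TASKS;
    chunks[t].end = count * (t + 1) / NUM_TASKS;
    chunks[t].partial = 0;
    if (pthread_create(&threads[t], NULL, task_run, &chunks[t]) != 0) {
      ok = 0;
      break;
    }
    started++;
  }

  /* Join only the threads that were actually started. */
  for (int t = 0; t < started; t++) {
    pthread_join(threads[t], NULL);
  }

  long result = ok ? run_data.total : -1;
  pthread_mutex_destroy(&run_data.mutex);
  return result;
}
```

Because the lock is taken only once per task in the reduce step, contention stays low, which is consistent with the mutex not showing up in the timings.
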
Benchmarking
| | master: | PATCH: |
|---|---|---|
| large_mesh_editing: | Average: 14.246502 FPS | Average: 15.438118 FPS |
| | rdata 9ms iter 31ms (frame 69ms) | rdata 9ms iter 27ms (frame 65ms) |
| large_mesh_editing_ledge: | Average: 14.913622 FPS | Average: 15.856538 FPS |
| | rdata 9ms iter 30ms (frame 67ms) | rdata 9ms iter 26ms (frame 63ms) |
| looptris_test: | Average: 3.970774 FPS | Average: 4.095200 FPS |
| | rdata 11ms iter 90ms (frame 235ms) | rdata 12ms iter 87ms (frame 229ms) |
| subdiv_mesh_cage_and_final: | Average: 1.926931 FPS | Average: 1.957404 FPS |
| | rdata 7ms iter 39ms (frame 262ms) | rdata 7ms iter 35ms (frame 258ms) |
| | rdata 7ms iter 41ms (frame 254ms) | rdata 7ms iter 37ms (frame 250ms) |
| subdiv_mesh_final_only: | Average: 6.575331 FPS | Average: 6.679989 FPS |
| | rdata 3ms iter 19ms (frame 145ms) | rdata 3ms iter 18ms (frame 146ms) |
| subdiv_mesh_final_only_ledge: | Average: 6.791831 FPS | Average: 6.723643 FPS |
| | rdata 3ms iter 19ms (frame 142ms) | rdata 3ms iter 19ms (frame 142ms) |
Note:
extract_tris is now the most time-consuming extract in mesh editing:
