Experiment to increase the maintainability of D11375 by using sequential arrays of extractors.
Added as a separate patch.
Details
Diff Detail
- Repository
- rB Blender
Event Timeline
The code looks better :). And the benchmark on Meshes (not BMesh), showed an improvement in frame time. (Is this supposed to be be affected?)
Master: [rdata 7ms iter 44ms (frame 294ms)] [rdata 7ms iter 46ms (frame 285ms)] Patch: [rdata 7ms iter 44ms (frame 284ms)] [rdata 7ms iter 47ms (frame 275ms)]
For BMeshs the improvement is maintained according to the other patch:
Master: rdata 9ms iter 53ms (frame 172ms) Patch: rdata 9ms iter 35ms (frame 150ms)
| source/blender/draw/intern/draw_cache_extract_mesh.c | ||
|---|---|---|
| 820 | Here it looks like you can use the macro POINTER_OFFSET | |
| 834 | Although it does not avoid the '4' bytes padding after member len, wouldn't that member be better at the end of the struct? | |
| 6083 | Update or remove this incompatible disabled code. | |
| source/blender/draw/intern/draw_cache_extract_mesh.c | ||
|---|---|---|
| 834 | Not in this case. Adding the len at the start would enable prefetching. Will add a comment for that. | |
| source/blender/draw/intern/draw_cache_extract_mesh.c | ||
|---|---|---|
| 834 | Second thoughts that it doesn’t matter that much. It is just the way how I think about cpu optimized code. In cpp we would replace this with an blender array and would be hidden. | |
| source/blender/draw/intern/draw_cache_extract_mesh.c | ||
|---|---|---|
| 834 | On second thought, keeping the MeshExtractRunData at the end can be good for structure with variable size. | |
I think the last step is just to do some testing and then we could land it in master (hopefully tomorrow/today)
| source/blender/draw/intern/draw_cache_extract_mesh.c | ||
|---|---|---|
| 6083 | I removed it for now. | |
This patch breaks the DEBUG_TIME directive, so I had to make some edits to test it.
Using the attached files in this comment, I got these results:
MASTER: large_mesh_editing: - rdata 9ms iter 50ms (frame 155ms) - Average: 6.462874 FPS subdiv_mesh_cage_and_final: - rdata 7ms iter 46ms (frame 272ms) - rdata 7ms iter 48ms (frame 264ms) - Average: 1.865282 FPS subdiv_mesh_final_only: - rdata 3ms iter 25ms (frame 158ms) - Average: 6.080243 FPS
PATCH: large_mesh_editing: - rdata 9ms iter 34ms (frame 136ms) - Average: 7.379491 FPS subdiv_mesh_cage_and_final: - rdata 7ms iter 46ms (frame 272ms) - rdata 7ms iter 48ms (frame 263ms) - Average: 1.856767 FPS subdiv_mesh_final_only: - rdata 3ms iter 24ms (frame 155ms) - Average: 6.277448 FPS
`PATCH`/`MASTER` factor: FPS - Average (higher is better): large_mesh_editing: `1.14` subdiv_mesh_cage_and_final: `0.99` subdiv_mesh_final_only: `1.03` ms - iter (lower is better): large_mesh_editing: `0.68` subdiv_mesh_cage_and_final: `1.00` subdiv_mesh_final_only: `0.96`
For extraction of the BMesh type, we can see a relative boost.
Since what prevails in the result are the single thread operations, we can't see very expressive differences in all tests.
For the result with BMesh, I believe we can count on this patch.
The readability of the code worries me a little, but it is not that good in master too.