Draw: extract tris in parallel ranges
Closed, Public

Authored by Germano Cavalcante (mano-wii) on May 31 2021, 10:16 PM.

Details

Summary

Multithreaded extraction of ibo.tris is currently only done when the mesh has a single material.

This patch proposes caching a map that gives the index of each tri after sorting by material, which allows the extraction of tris with multiple materials to be multithreaded as well.

Since building the cache is a heavy operation, no improvement is expected unless the geometry is deform only, because otherwise the cache is cleared as well.
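The idea of the cached map can be illustrated with a minimal sketch. This is not the code of the patch: `Tri`, `build_sorted_tri_map` and `extract_tri_range` are hypothetical names, and a plain counting sort by material is assumed. Once the map exists, every tri knows its destination slot, so any range of tris can be extracted independently of the others.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Hypothetical triangle type: three vertex indices. */
struct Tri {
  uint32_t v[3];
};

/* Build the cached map: for each tri, its slot in the material-sorted buffer.
 * Assumption: a stable counting sort keyed on the material index. */
std::vector<uint32_t> build_sorted_tri_map(const std::vector<int> &tri_material,
                                           int material_count)
{
  /* Count tris per material, shifted by one so a prefix sum gives start offsets. */
  std::vector<uint32_t> offset(static_cast<size_t>(material_count) + 1, 0);
  for (int mat : tri_material) {
    assert(mat >= 0 && mat < material_count);
    offset[mat + 1]++;
  }
  for (int m = 0; m < material_count; m++) {
    offset[m + 1] += offset[m];
  }
  /* Hand out consecutive slots inside each material's block. */
  std::vector<uint32_t> sorted_index(tri_material.size());
  for (size_t i = 0; i < tri_material.size(); i++) {
    sorted_index[i] = offset[tri_material[i]]++;
  }
  return sorted_index;
}

/* Extract the tris in [begin, end). Every tri writes to its own slot, so
 * disjoint ranges never touch the same element and can run on any thread. */
void extract_tri_range(const std::vector<Tri> &tris,
                       const std::vector<uint32_t> &sorted_index,
                       size_t begin,
                       size_t end,
                       std::vector<Tri> &ibo)
{
  for (size_t i = begin; i < end; i++) {
    ibo[sorted_index[i]] = tris[i];
  }
}
```

Building the map is the serial, expensive part; the range function is what a task scheduler would call once per chunk. This is why the gain only shows up when the map survives between redraws.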

Profiling:
The test was done by transforming a deform-only geometry (large_mesh_editing_materials) and a geometry with a subdivision modifier, which clears the cache during transformation (subdiv_mesh_final_only_materials).

                                  | master:                          | PATCH:
large_mesh_editing_materials:     | Average: 23.659796 FPS           | Average: 25.971083 FPS
                                  | rdata 0ms iter 24ms (frame 42ms) | rdata 0ms iter 20ms (frame 39ms)
subdiv_mesh_final_only_materials: | Average: 28.832694 FPS           | Average: 28.775633 FPS
                                  | rdata 0ms iter 1ms (frame 35ms)  | rdata 0ms iter 1ms (frame 35ms)

1.12x overall speedup

Diff Detail

Repository
rB Blender
Branch
extract_tris_multithread (branched from master)
Build Status
Buildable 14865
Build 14865: arc lint + arc unit

Event Timeline

Germano Cavalcante (mano-wii) requested review of this revision. May 31 2021, 10:16 PM
Germano Cavalcante (mano-wii) created this revision.

There are multiple ways to optimize the extraction of tris:

  • D11290 (T88352: Use threaded ibo.tris extraction for single material meshes) adds an optimized version for single material meshes.
  • Do bucket filtering while looping, and join the buckets at the end of a loop and/or inside the finish function. Using memcpy to construct the main buffer would utilize AVX instructions. This would require loop start and loop end callbacks to init the buffer for each thread and sync with the master bucket.
  • Only sort when subbuffers are requested. This needs some reinitialization code in DRW_mesh_batch_cache_create_requested.
  • Make the subbuffers normal buffers. This needs more data transfer.
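The bucket filtering option can be sketched as follows. This is only an illustration under assumptions: `Buckets`, `bucket_range` and `join_buckets` are hypothetical names, each worker is assumed to own one bucket set (its thread-local data), and the join is done serially in the finish step. Whether memcpy actually uses AVX depends on the C library and the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

/* One growable array of tri indices per material (hypothetical layout). */
using Buckets = std::vector<std::vector<uint32_t>>;

/* Loop body for one worker: file each tri of [begin, end) into the bucket
 * of its material. `local` belongs to a single worker, so no locking. */
void bucket_range(const std::vector<int> &tri_material,
                  size_t begin,
                  size_t end,
                  Buckets &local)
{
  for (size_t i = begin; i < end; i++) {
    local[tri_material[i]].push_back(static_cast<uint32_t>(i));
  }
}

/* Finish step: concatenate all buckets, material by material, into the
 * main buffer with one memcpy per non-empty bucket. */
std::vector<uint32_t> join_buckets(const std::vector<Buckets> &per_worker,
                                   int material_count)
{
  size_t total = 0;
  for (const Buckets &buckets : per_worker) {
    for (const std::vector<uint32_t> &bucket : buckets) {
      total += bucket.size();
    }
  }
  std::vector<uint32_t> main_buffer(total);
  size_t cursor = 0;
  for (int mat = 0; mat < material_count; mat++) {
    for (const Buckets &buckets : per_worker) {
      const std::vector<uint32_t> &src = buckets[mat];
      if (!src.empty()) {
        std::memcpy(main_buffer.data() + cursor, src.data(), src.size() * sizeof(uint32_t));
        cursor += src.size();
      }
    }
  }
  assert(cursor == total);
  return main_buffer;
}
```

Compared with a cached map, nothing has to be kept between redraws, but the per-worker init and the join are the extra callbacks and complexity mentioned above.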

This patch feels like an evolution of the current solution.

I would strive to make the init function constant and enable threading. Of the solutions above, only bucket filtering matches this, but it adds quite some complexity.
A separate patch should ensure that tris are only recreated when needed (cache invalidation).

Germano Cavalcante (mano-wii) planned changes to this revision. Jun 2 2021, 2:00 PM
  • Cache the index of the sorted tris
Germano Cavalcante (mano-wii) retitled this revision from "[WIP] Draw Manager: Extract tris in parallel ranges" to "Draw: extract tris in parallel ranges". Jul 14 2021, 4:21 PM
Germano Cavalcante (mano-wii) edited the summary of this revision.

LGTM.

Eventually it would be possible to do the indexing in a threaded, non-blocking way: the TLS contains an array per material, and during reduction these arrays are appended to the master array and their indices are updated to refer to the master array. The reduction could be vectorized, but that would add more complexity to the code.

So I am fine with the current solution.

  • Store the sorted indices of polygons instead of triangles

This cuts the memory usage in half.
It also uses the iteration range over polys, thus combining more extractions in the same loop.
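A minimal sketch of the per-polygon variant, under stated assumptions: `Poly`, `build_poly_first_tri` and `extract_poly_range` are hypothetical names, the cache is assumed to hold the sorted slot of each polygon's first tri, and a simple fan triangulation stands in for the real one. A polygon with n corners yields n - 2 tris, so a mesh made of quads needs one cached entry per two tris.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Hypothetical polygon record: a run of corners (loops) plus a material. */
struct Poly {
  uint32_t loop_start;
  uint32_t loop_count;
  int material;
};

/* For each poly, the slot of its first tri in the material-sorted buffer.
 * Assumption: the remaining tris of the poly follow in consecutive slots. */
std::vector<uint32_t> build_poly_first_tri(const std::vector<Poly> &polys,
                                           int material_count)
{
  std::vector<uint32_t> offset(static_cast<size_t>(material_count) + 1, 0);
  for (const Poly &poly : polys) {
    assert(poly.loop_count >= 3);
    assert(poly.material >= 0 && poly.material < material_count);
    offset[poly.material + 1] += poly.loop_count - 2;
  }
  for (int m = 0; m < material_count; m++) {
    offset[m + 1] += offset[m];
  }
  std::vector<uint32_t> first_tri(polys.size());
  for (size_t i = 0; i < polys.size(); i++) {
    first_tri[i] = offset[polys[i].material];
    offset[polys[i].material] += polys[i].loop_count - 2;
  }
  return first_tri;
}

/* Write the tris of the polys in [begin, end) into `ibo` (3 indices per tri).
 * Each poly owns its block of slots, so disjoint ranges are independent. */
void extract_poly_range(const std::vector<Poly> &polys,
                        const std::vector<uint32_t> &first_tri,
                        size_t begin,
                        size_t end,
                        std::vector<uint32_t> &ibo)
{
  for (size_t p = begin; p < end; p++) {
    const Poly &poly = polys[p];
    for (uint32_t k = 0; k + 2 < poly.loop_count; k++) {
      const size_t base = 3 * (static_cast<size_t>(first_tri[p]) + k);
      /* Fan triangulation, for illustration only. */
      ibo[base + 0] = poly.loop_start;
      ibo[base + 1] = poly.loop_start + k + 1;
      ibo[base + 2] = poly.loop_start + k + 2;
    }
  }
}
```

Because the loop runs over polys, it can share its iteration with other extractions that also walk the polys.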

                                  | master:                          | PATCH:
large_mesh_editing_materials:     | Average: 13.855380 FPS           | Average: 15.525684 FPS
                                  | rdata 9ms iter 36ms (frame 71ms) | rdata 9ms iter 29ms (frame 64ms)
subdiv_mesh_final_only_materials: | Average: 28.113742 FPS           | Average: 28.633599 FPS
                                  | rdata 0ms iter 1ms (frame 36ms)  | rdata 0ms iter 1ms (frame 35ms)
This revision was not accepted when it landed; it landed in state Needs Review. Jul 21 2021, 8:15 PM
This revision was automatically updated to reflect the committed changes.