
Draw Manager: Mesh Extract: Balance the execution time of threads
Abandoned, Public

Authored by Germano Cavalcante (mano-wii) on Jun 1 2021, 2:21 PM.

Details

Summary

In most cases this patch makes no difference, since meshes usually only have polygon extractions running multithreaded.

But in cases where the mesh has loose geometry, it can make a difference.

Currently each thread executes only one type of iteration (poly, ledge, lverts...), so if the counts of
these elements differ, the threads' execution times are unbalanced.

For example, 1 thread can work with 1000 polygons while another thread works with only 1 loose vert.

The solution in this patch is to have each thread perform multiple types of iteration.
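A minimal sketch of that idea, not the patch's actual code: one task receives a range for each element type and iterates over all of them in turn. The names `ElemRange`, `TaskData`, `ExtractStats` and `run_extract_task` are hypothetical, and the loop bodies only count elements where the real per-element extraction would run.

```cpp
#include <cassert>
#include <cstddef>

// Half-open index range [start, end) into one element array (hypothetical type).
struct ElemRange {
  std::size_t start;
  std::size_t end;
};

// Everything a single task is responsible for: one range per element type.
struct TaskData {
  ElemRange polys;
  ElemRange loose_edges;
  ElemRange loose_verts;
};

// Stand-in for the extraction result: how many elements of each type were visited.
struct ExtractStats {
  std::size_t polys = 0;
  std::size_t loose_edges = 0;
  std::size_t loose_verts = 0;
};

// One task runs every iteration type over its own ranges, instead of one type per task.
inline ExtractStats run_extract_task(const TaskData &task)
{
  assert(task.polys.start <= task.polys.end);
  assert(task.loose_edges.start <= task.loose_edges.end);
  assert(task.loose_verts.start <= task.loose_verts.end);

  ExtractStats stats;
  for (std::size_t i = task.polys.start; i < task.polys.end; i++) {
    stats.polys++; /* Placeholder for the per-polygon extraction. */
  }
  for (std::size_t i = task.loose_edges.start; i < task.loose_edges.end; i++) {
    stats.loose_edges++; /* Placeholder for the per-loose-edge extraction. */
  }
  for (std::size_t i = task.loose_verts.start; i < task.loose_verts.end; i++) {
    stats.loose_verts++; /* Placeholder for the per-loose-vert extraction. */
  }
  return stats;
}
```

In this sketch a task with an empty range for some type simply skips that loop, so a mesh without loose geometry behaves like a polygon-only task.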

Tests:
Profiling a high-poly mesh with loose edges shows no difference.

This probably depends more on the number of hardware threads.

But since there was no regression, I believe this is the way to go.

Diff Detail

Repository
rB Blender
Branch
master
Build Status
Buildable 14872
Build 14872: arc lint + arc unit

Event Timeline

Germano Cavalcante (mano-wii) requested review of this revision. Jun 1 2021, 2:21 PM
Germano Cavalcante (mano-wii) created this revision.

I see what you're trying to do, but I'm not convinced that this has any benefits (user- or CPU-wise). We are talking nanoseconds here.
At some point having as many tasks as threads can work against you. Adding additional tasks (like the before situation) might be more flexible during balancing.

> At some point having as many tasks as threads can work against you. Adding additional tasks (like the before situation) might be more flexible during balancing.

I'm not sure I understand.
The number of threads is the same.
extract_range_num_threads_estimate uses CHUNK_SIZE to define how many threads should be used (maybe the name can be improved).

What this patch does is distribute the elements more evenly.
To illustrate, the distribution of geometries can currently be described as follows:

thread 1: [60 polygons]
thread 2: [60 polygons]
thread 3: [3 edges]

With this patch these same elements are distributed like this:

thread 1: [40 polygons + 1 edge]
thread 2: [40 polygons + 1 edge]
thread 3: [40 polygons + 1 edge]

or

thread 1: [60 polygons + 2 edges]
thread 2: [60 polygons + 1 edge]
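The distribution illustrated above can be sketched as an even split of each element count across the tasks. This is only an illustration under assumptions, not the code in the patch: `Range` and `split_range` are hypothetical names, and giving the remainder to the first tasks is one possible choice.

```cpp
#include <cassert>
#include <cstddef>

// Half-open index range [start, end) of one element type (hypothetical type).
struct Range {
  std::size_t start;
  std::size_t end;
  std::size_t size() const
  {
    return end - start;
  }
};

// Slice of `count` elements handled by task `index` out of `parts` tasks.
// The first (count % parts) tasks receive one extra element.
inline Range split_range(std::size_t count, std::size_t parts, std::size_t index)
{
  assert(parts > 0 && index < parts);
  const std::size_t base = count / parts;
  const std::size_t extra = count % parts;
  const std::size_t start = index * base + (index < extra ? index : extra);
  const std::size_t size = base + (index < extra ? 1 : 0);
  return {start, start + size};
}
```

Applied to the example, 120 polygons and 3 edges split over 3 tasks give 40 polygons and 1 edge each; over 2 tasks they give 60 polygons each, with 2 edges for the first task and 1 for the second.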

Given the recent changes, is this still something we would want to integrate?
IMO the threading is good enough. I'm not sure this fits in the parallel range implementation we have now.

It is no longer necessary.