Page MenuHome

BMesh: Multithread BMesh to evaluated Mesh conversion
ClosedPublic

Authored by Hans Goudey (HooglyBoogly) on Oct 14 2022, 12:00 AM.
Tags
None
Subscribers
Tokens
"Love" token, awarded by silex."Love" token, awarded by JanErik."Love" token, awarded by Dangry.

Details

Summary

Currently this conversion (which happens when using modifiers in edit
mode, for example) is completely single threaded. It's harder than
some other areas to multithread because BMesh elements don't always
know their indices (and vise versa), and because the dynamic AoS format
used by BMesh makes some typical solutions not helpful.

This patch proposes to split the operation into two steps. The first
updates the indices of BMesh elements and builds tables for easy
iteration later. It also checks if some optional mesh attributes
should be added. The second uses parallel loops over all elements,
copying attribute values and building the Mesh topology.

Both steps process different domains in separate threads (though the
first has to combine faces and loops). Though this isn't proper data
parallelism, it's quite helpful because each domain doesn't affect the
others.

Timings
I tested this on a Ryzen 3700x with a 1 million face grid. With no
extra attributes and then with several color attributes and vertex
groups.

FileBeforeAfter
Simple190.2 ms124.0 ms
More Attributes253.9 ms134.8 ms

On a Ryzen 7950x:

FileBeforeAfter
Simple101.6 ms59.6 ms
More Attributes149.2 ms65.6 ms

The optimization scales better with more attributes on the BMesh. The
speedup isn't as linear as multithreading other operations, indicating
added overhead. I think this is worth it though, because the user is
usually actively interacting with a mesh in edit mode.


Potential Future Improvements

  • Avoid the first lookup table building when they are already non-dirty
  • Store the tables in the BMesh to ammortize their creation (discussed in D14938)
  • Do a small bit of preprocessing to make CustomData_from_bmesh_block faster (D17141)
  • Avoid building tables if there aren't any attributes to transfer
    • To avoid code duplication here I went with the option that scaled better
  • Experiment with different grain sizes
  • Apply similar changes to the non "for eval" version of BMesh to Mesh conversion

Extra Timing Information

SIMPLE:
    'CustomData Merge': (Average: 5178 ns, Min: 3350 ns)
    'Vert 1st Pass': (Average: 26.6 ms, Min: 21.2 ms)
    'Face/Loop 1st Pass': (Average: 63.1 ms, Min: 58.4 ms)
    'Edge 1st Pass': (Average: 63.0 ms, Min: 56.9 ms)
    'Adding Optional Mesh Attributes': (Average: 31965 ns, Min: 23810 ns)
    'Face 2nd Pass': (Average: 41.4 ms, Min: 15.0 ms)
    'Vert 2nd Pass': (Average: 42.9 ms, Min: 21.3 ms)
    'Edge 2nd Pass': (Average: 56.5 ms, Min: 36.5 ms)
    'Loop 2nd Pass': (Average: 58.4 ms, Min: 44.1 ms)
    'BM_mesh_bm_to_me_for_eval': (Average: 131.1 ms, Min: 123.6 ms)
MORE ATTRIBUTES:
    'CustomData Merge': (Average: 4.3 ms, Min: 4.1 ms)
    'Vert 1st Pass': (Average: 26.4 ms, Min: 21.3 ms)
    'Edge 1st Pass': (Average: 62.6 ms, Min: 58.4 ms)
    'Face/Loop 1st Pass': (Average: 64.1 ms, Min: 58.9 ms)
    'Adding Optional Mesh Attributes': (Average: 32289 ns, Min: 26410 ns)
    'Vert 2nd Pass': (Average: 62.7 ms, Min: 34.0 ms)
    'Loop 2nd Pass': (Average: 68.1 ms, Min: 48.0 ms)
    'Face 2nd Pass': (Average: 46.8 ms, Min: 17.0 ms)
    'Edge 2nd Pass': (Average: 67.7 ms, Min: 42.5 ms)
    'BM_mesh_bm_to_me_for_eval': (Average: 145.2 ms, Min: 138.1 ms)

Diff Detail

Repository
rB Blender

Event Timeline

Hans Goudey (HooglyBoogly) requested review of this revision.Oct 14 2022, 12:00 AM
Hans Goudey (HooglyBoogly) created this revision.
Hans Goudey (HooglyBoogly) edited the summary of this revision. (Show Details)
Sergey Sharybin (sergey) requested changes to this revision.Oct 14 2022, 11:36 AM

There are some SCOPED_TIMER_AVERAGED scattered in the patch. They should not end up in the final commit ;)

Surely having a sub-optimal threading which is faster is better than not having speedup at all. The biggest concerning aspect to me is the const-ness of the input. Ideally conversion would use input in the read-only manner. The description makes it sound that it is not a read-only in the new code. Is there a change in the const-ness of the input with the proposed change?

On a code side, I can't say I am a huge fan of the threading::parallel_invoke call with several non-trivial lambdas. When such a call is longer than a screen of code it is rather unreadable. Can this be split into a more semantically sounding functions?
Also, can this be somehow improved in clarity of what's going on. Current indication of the next code block is doing is the debug timing print which mentions "Vert 2nd pass". What does it mean? There is surely more descriptive comment possible to write around those stages of conversion.

This revision now requires changes to proceed.Oct 14 2022, 11:36 AM

Thanks for the quick comment!

There are some SCOPED_TIMER_AVERAGED scattered in the patch. They should not end up in the final commit ;)

Yeah, just kept that interested in case reviewers were interested. Removed now.

Is there a change in the const-ness of the input with the proposed change?

I'd say this is a very slight improvement in const correctness, since we use const BMesh element pointers where possible.
However, it still uses BM_elem_index_set for a couple reasons:

  • I wanted to avoid the additional memory usage and lookup cost of using VectorSet to retrieve the indices (i.e. the memory is there, we might as well use it).
  • Setting this to a correct value isn't too bad from a thread safety perspective, since any other thread should do the same thing first anyway.
  • I think I remember you mentioning that removing that was causing crashes in other code that mistakenly depended on it.

On a code side, I can't say I am a huge fan of the threading::parallel_invoke call with several non-trivial lambdas

This avoids a bunch of boilerplate for passing parameters between functions.I split them all to separate functions now. Though I still like the lambdas more, I see your point.

  • Improve/add comments
  • Move lambda bodies to separate functions
  • Slightly change naming
  • Remove timers

Setting this to a correct value isn't too bad from a thread safety perspective, since any other thread should do the same thing first anyway.

I am not sure if such assignment on arm64 is considered safe. x86 should be fine though.
In any case, it is not different from other places and the previous state of code. So lets not worry about this specific thing here :)

I think I remember you mentioning that removing that was causing crashes in other code that mistakenly depended on it.

Things turned out to be much more stupid: I didn't see those indices are actually used few lines below. Duuuh. So, in theory, it is not impossible to get rid of those indices assignment. But indeed that does not come "for free".

This avoids a bunch of boilerplate for passing parameters between functions.I split them all to separate functions now. Though I still like the lambdas more, I see your point.

Think it is much more clear now! At least i have better idea about whats going on in every stage.
For the parameters: we could use some sort of "context" structure instead of passing individual arguments. Not sure how much it helps here though.

I didn't have time yet top actually test the patch, but explanation and quick read of the code seems all good. Although, that's not really area I maintain, so still will be nice to have Campbell's take on it.

silex (silex) added a subscriber: silex (silex).

Generally this seems fine, minor issues inline.

source/blender/bmesh/intern/bmesh_mesh_convert.cc
1313

Prefer to keep naming order matching most other BMesh functions, e.g. bm_vert_table_build, could also call bm_vert_table_build_and_calc_needs

1341–1343

While unlikely to be a big difference, this logic could OR the hflag then assign need_* variables at the end of the loop.
Same for other loops over mesh data..

This revision is now accepted and ready to land.Fri, Feb 3, 10:58 AM
Hans Goudey (HooglyBoogly) marked 2 inline comments as done.Fri, Feb 3, 4:09 PM