Page MenuHome

BKE: Mesh attribute adapt domain improve
ClosedPublic

Authored by Iliya Katueshenock (Moder) on Sep 24 2022, 9:39 PM.

Details

Summary

Minor improvement in attribute domain adapt functionality.

  1. A generic array is used where possible. Less template code.
  2. Wherever a boolean type is used, multi-threaded writing can be used.
  3. For singles, can use a simplified version.

Boolean multithreading advantage
~3x...10x

Timer 'Use Parallel' took 0.4, 0.4, 2.9  ms
Timer 'No parallel'  took 1.2, 2.3, 20.4 ms

Timer 'grain size = 1024' took 2.1, 0.6, 2.0, 2.6, 0.9, 2.6 ms
Timer 'grain size = 2048' took 1.8, 0.5, 1.8, 2.5, 0.9, 2.4 ms
Timer 'grain size = 4096' took 1.8, 0.5, 1.8, 3.4, 0.9, 2.5 ms

According to the test results, can say that 2048 will be the best. It starts to show up at 3.6~million point meshes.

Diff Detail

Event Timeline

Iliya Katueshenock (Moder) requested review of this revision.Sep 24 2022, 9:39 PM
Iliya Katueshenock (Moder) created this revision.
Iliya Katueshenock (Moder) edited the summary of this revision. (Show Details)
Iliya Katueshenock (Moder) edited the summary of this revision. (Show Details)

This looks like a nice improvement. Two comments:

  • 1024 seems like a small grain size for editing a boolean array. I'd probably go with 4096 so we don't use threading unless it will really be worth it.
  • adapt_single_to could be simpler (probably doesn't need to be a separate function) if it used attributes.domain_size(to_domain).

Timer 'grain size = 1024' took 2.1, 0.6 ms, 2.0, 2.6, 0.9, 2.6 ms
Timer 'grain size = 2048' took 1.8, 0.5 ms, 1.8, 2.5, 0.9, 2.4 ms
Timer 'grain size = 4096' took 1.8, 0.5 ms, 1.8, 3.4, 0.9, 2.5 ms

According to the test results, I can say that 2048 will be the best. It starts to show up at 3.6~million point meshes. Yes, 2.4 milliseconds - 200000 cubes and constant overwriting (true all the time)

Also, while the attribute does not know that it is a single, it is impossible to check the correctness of the code. I hope that I'm not mistaken, it may be that someone will correct the allways evaluation of a single attribute as a field and a report will appear on incorrect curve adaptation for single)

Iliya Katueshenock (Moder) planned changes to this revision.Sep 27 2022, 7:40 PM
Iliya Katueshenock (Moder) requested review of this revision.Sep 27 2022, 8:05 PM
Hans Goudey (HooglyBoogly) requested changes to this revision.Oct 4 2022, 12:53 AM

This looks fine to me now. I have a couple comments though:

  • I'm not totally comfortable with the ForAnotherSingle generalization yet. Since Jacques is away this week, do you mind removing that so we can move forward?
  • For the grain sizes, it's not just about making this code as fast as possible, it's about reducing overhead in general so that other work that's more efficient can happen on other domains too. Since we often work on 1024 element chunks for ints/floats, and bools are 1/4 the size, that was my reasoning for 4096. But beyond that intuition of processing a similar amount of memory, I don't have any data.
source/blender/blenkernel/intern/geometry_component_mesh.cc
798–799

Another way to structure this code would be splitting the logic about whether the simplification can be used into a separate function:

bool can_use_early_return_for_single(eAttrDomain to, eAttrDomain from)

I think that would be a bit clearer, and avoid the need for a lambda.

This revision now requires changes to proceed.Oct 4 2022, 12:53 AM
Iliya Katueshenock (Moder) marked an inline comment as done.

I haven't tested this on small domain sizes, but I think that up to 3 million 2k and 4k perform about the same due to memory write speed limitations. And upon reaching this, 2k gets better (I don't know what to say, maybe more tests can solve this, but I don't have much time yet)

This revision is now accepted and ready to land.Oct 4 2022, 10:01 PM

I split two of the changes in this patch to separate commits. Here is the remaining diff:

  • Update with only parallel write part