Page MenuHome

Sculpt: Change PBVH leaf limit size
AbandonedPublic

Authored by Pablo Dobarro (pablodp606) on Aug 1 2020, 12:53 AM.

Details

Summary

A lot of users are reporting very poor stroke performance in Sculpt Mode
with high end processors.
In a demo file @Julien Kaspar (JulienKaspar) send me, I notice that the PBVH is creating
only one leaf node for 49398 vertices in Multires level 1. I believe
that what is causing that lag is that when a brush that has low spacing
is used with a small radius a lot of stroke steps are generated per
viewport update. As there is only one node, these steps can't be
multithreaded and they all loop over the 50k vertices, making the brush
lag.
When reducing the leaf limit to 1000, the PBVH has 16 nodes for the same
number of vertices, which increases performance a lot.
I'm not sure when was the last time the leaf limit was updated and if
the current 10k leaf limit is still acceptable today with 16 and 32
cores CPUs available, but if we can agree on a better default for most
user maybe we can expose it as a setting (similar to how "Use Threaded
Sculpting" was in the UI)
About the decission of making it 1000 by default, it is just based on
some testing I did in my computer. This patch should be tested in as
many CPUs as possible in case we want to include only one default
instead of exposing it as a setting.

Here is a comparison table measuring 130 samples of the time spent
in the paint_stroke_modal function for the same brush size at
different multires levels:

On a i7 9750H, 16GB, GTX 1650 4GB:

Average:

493988187822897118
10k leaf limit0.0040757518796990.0034466390977440.003298706766917
1k leaf limit0.000681300751880.0020790.006291
Improvement5.982309381656063.515495226043941.40370949835066

Maximum:

493988187822897118
10k leaf limit0.008720.0067260.008588
1k leaf limit0.0019490.0020790.006291
Improvement4.474089276552083.235209235209231.36512478143379

Diff Detail

Repository
rB Blender
Branch
pbvh-leaf-size (branched from master)
Build Status
Buildable 9366
Build 9366: arc lint + arc unit

Event Timeline

Pablo Dobarro (pablodp606) requested review of this revision.Aug 1 2020, 12:53 AM
Pablo Dobarro (pablodp606) created this revision.

Please always include units and system specs when sharing benchmarks. But seems like 4x speedup for oneliner? Impressive! Can we fix 2 more lines and have 64x speedup :)

I think this is the proper thing to do (limit leaf size to something more adequate). I'm wondering, will lowering limit to 100 improve the performance further or will it start to degrade the performance?

Lowering this value has a few potential performance issues:

  • Slower GPU rendering due to rendering more batches.
  • Increased CPU and GPU memory usage due to more shared/duplicated vertices between leaf nodes.
  • Higher CPU threading overhead.

So, for measuring performance you not only need to look at this existing failure case, but also check that on models with millions of polygons the GPU rendering and memory usage are still ok. For both multires and regular meshes.

I think 100 will definitely be too low. It means if you have 10 million polygons, you will have 100k GPU batches, which is too many. If we need that fine-grained scheduling for brush operations, we could look at storing GPU batches at inner nodes to cover multiple leaf nodes.

The current limit is set to 10k, but there are still 49398 vertices in a leaf node? How does that happen?

The current limit is set to 10k, but there are still 49398 vertices in a leaf node? How does that happen?

Because in Multires it is using the grids face count instead of vertex count to limit the leaf nodes, so in level 1 there will be always 4 vertices per face. I can change that, but @Julien Kaspar (JulienKaspar) told me that even with this patch the performance in his computer is still the same, so maybe the issue is not entirely related to this.
Also, dyntopo is already using a leaf limit of 100, not sure if there is a specific reason for that.

Because in Multires it is using the grids face count instead of vertex count to limit the leaf nodes, so in level 1 there will be always 4 vertices per face. I can change that, but @Julien Kaspar (JulienKaspar) told me that even with this patch the performance in his computer is still the same, so maybe the issue is not entirely related to this.

I think it is good to use vertex count instead, so that leaf nodes have approximately the same number of vertices regardless of the multires level. Otherwise performance will probably change too much depending on the multires level.

And then on top of that you can tweak the limit for best performance.

Also, dyntopo is already using a leaf limit of 100, not sure if there is a specific reason for that.

I'm not sure either, it seems too low to me regarding GPU and threading overhead.

Pablo Dobarro (pablodp606) edited the summary of this revision. (Show Details)
  • Change leaf limit to 5000

I update the leaf limit to a more reasonable number after D8454. I still get a consistent performance improvement after this patch. Maybe we can try to commit this to 2.91 and see if it causes other performance problems.

Even this can improve performance in some cases, this should be part of a bigger set of changes (supporting buffers in intermediate nodes, abstracting the brush actions from the scheduler) in order to be properly fixed.