
Curve sculpting: Collision constraint solver.
Needs Review · Public

Authored by Lukas Tönne (lukastoenne) on Jul 14 2022, 8:10 AM.

Details

Summary

Uses a basic iterative position solver to ensure that points don't
penetrate the surface mesh during editing. Contact points are found
using raycasts against the surface. The comb, pinch, and puff brushes
currently have collision support.

The contact constraints are solved for each individual curve (no
interaction between curves). Contact and segment-length constraints are
solved in alternation over multiple iterations. This is a Gauss-Seidel
method, as used in position-based dynamics solvers.
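A minimal sketch of that alternating loop, assuming Blender's float3/Span types; Contact, solve_curve, and the root-outward length pass are illustrative choices, not the actual API:

struct Contact {
  int point_i;     /* Index of the curve point in contact. */
  float3 normal;   /* Surface normal at the contact. */
  float distance;  /* Signed distance of the point along the normal. */
};

static void solve_curve(MutableSpan<float3> positions,
                        const Span<float> segment_lengths,
                        const Span<Contact> contacts,
                        const float radius,
                        const int iterations)
{
  for (int iter = 0; iter < iterations; iter++) {
    /* Contact constraints: push penetrating points out along the contact normal. */
    for (const Contact &contact : contacts) {
      const float penetration = radius - contact.distance;
      if (penetration > 0.0f) {
        positions[contact.point_i] += penetration * contact.normal;
      }
    }
    /* Length constraints: restore segment lengths, walking outward from the fixed root. */
    for (const int i : positions.index_range().drop_front(1)) {
      const float3 direction = math::normalize(positions[i] - positions[i - 1]);
      positions[i] = positions[i - 1] + direction * segment_lengths[i - 1];
    }
  }
}

Each length pass can re-violate contacts and vice versa, which is why both passes are repeated over several iterations.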

The curve radius determines the offset between curves and the surface,
but currently requires using temporary geometry nodes to set the radius
and then apply the nodes to the base geometry.

Brushes have inconsistent ways of defining the changed curves list. This
requires intermediate arrays for a consistent way to pass changed curves
to the constraint solver. Eventually brushes should use a unified
method of gathering affected curves to make this step unnecessary.

  • Better method for setting curve radius
  • Bring back the single-iteration, unphysical solver method (FTL, follow-the-leader) for fast editing
  • Move the code to geometry module
  • Add performance test with different thread grain sizes
  • Add performance test with different curve lengths
  • Add performance test with different BVH branching factors (child node count)
  • Ensure correct curve transforms for deformed surfaces
  • Add a root-distance constraint to the PBD solver to speed up convergence

Diff Detail

Repository
rB Blender
Branch
sculpt_curve_collisions
Build Status
Buildable 22981
Build 22981: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden.
Lukas Tönne (lukastoenne) requested review of this revision.Jul 14 2022, 8:10 AM
Lukas Tönne (lukastoenne) edited the summary of this revision. (Show Details)

Curve sculpting: Fix crash of constraint solver when surface is null.

I haven't figured out why yet, but sometimes it works quite well and sometimes it's weird:

Generally, I like the way you implemented it, but I'll have to do a more detailed review once it's working in the case above.

You can use 0.005f as the default radius. We do the same in drawing code (see e.g. draw/intern/draw_curves.cc and cycles/blender/curves.cpp).

Jacques Lucke (JacquesLucke) requested changes to this revision.Jul 14 2022, 10:17 AM
This revision now requires changes to proceed.Jul 14 2022, 10:17 AM

I couldn't figure out why yet, but sometimes it works quite well and sometimes it's weird

My mistake, I had TBB disabled. It seems to solve the constraints correctly, but for some reason the changes are not applied to the curve positions.

Curve sculpting: Fix threading issues in constraint solver.

The constraint solver gets a span of changed curve indices. The comb
brush collects these in a thread-local vector, which is only valid in
a parallel_for loop.

The BVH storage also needs to be local to the function, otherwise
parallel calls to the constraint solver will free the shared storage
at the end of the task.

Curve sculpt: Make constraint solver parameters configurable.

Default curve radius, max. contact number, and solver iteration count
are now accessible properties of the constraint solver.

  • Since this can make combing significantly slower, there should be a way to turn this off. Maybe another icon next to the symmetry icons?
  • I found that reducing the parallel batch size improves performance for me noticeably. I reduced it from 256 to 64 and that made combing feel much smoother in my test file with ~700 curves.
  • I did manage to get curves into the surface sometimes, can you describe under what circumstances that can happen?
  • Do you have thoughts on how this could be made more efficient in the future? Since most time is spent on the BVH tree lookup, there are probably two main options: use a faster BVH tree implementation (hopefully we can use Embree at some point), and, more importantly, do fewer BVH tree lookups.
  • What is the expected behavior for curves that are on the inside of the mesh, and are combed to the outside?
  • I find it a bit surprising that contact points are detected first. My possibly naive assumption would be that one would have to find the closest point on the surface in every iteration.
  • Why is it necessary to store multiple contact points on the same ray for every point? Shouldn't it be enough to only get the closest one?

Since this can make combing significantly slower, there should be a way to turn this off. Maybe another icon next to the symmetry icons?

No objection.

I found that reducing the parallel batch size improves performance for me noticeably. I reduced it from 256 to 64 and that made combing feel much smoother in my test file with ~700 curves.

Interesting. I picked the number somewhat arbitrarily, but I suppose there is not very much overhead at all to choosing a finer grain size. We should probably also do a bunch of tests with different extremes, like lots of curves, lots of control points, and a high-detail mesh, and see which has the biggest impact.

I did manage to get curves into the surface sometimes, can you describe under what circumstances that can happen?

Yeah, I've seen that too, I need to dig into this. My current working hypothesis:
The collision is detected by a raycast from the original position to the unconstrained result. This can cut through meshes, which is currently allowed (segment collision could be possible but is out of scope atm).


The segment is too long now, so the constraint solver will scale it back. This can move the point back into the mesh. The "collision" on the way back is not detected as such because the raycast works in the other direction and sees this as a backward face.

Doing collision detection based on the previous iteration's positions could help here, so the second face is not ignored. Or taking backward faces into account in the first place. It might take a few iterations to satisfy both the length constraint and the collision constraints.

Do you have thoughts on how this could be made more efficient in the future? Since most time is spent on the BVH tree lookup, there are probably two main options: use a faster BVH tree implementation (hopefully we can use Embree at some point), and, more importantly, do fewer BVH tree lookups.

Yes, the collision detection is likely much more expensive than the actual constraint solving. I don't know enough about the details of the BVH trees to make concrete suggestions at this point. One thing I noticed is that the curve tools all seem to use BVHs with a branching factor of 2; perhaps a larger factor could be beneficial (more children per node but a shallower tree overall). It might also be possible to build an optimized raycast function that returns the N closest points, as a middle ground between the closest point and all possible intersections (see the last point on why detecting N contacts is useful).
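As a rough illustration of that middle ground, the traversal could keep only the N nearest hits instead of collecting every intersection. This is independent of the actual BVH API; NearestHits and add_hit are hypothetical names:

struct NearestHits {
  int max_hits;
  Vector<float> distances; /* Sorted ascending, at most max_hits entries. */

  /* Called for every candidate intersection found during traversal. */
  void add_hit(const float distance)
  {
    if (distances.size() == max_hits && distance >= distances.last()) {
      return;
    }
    distances.append(distance);
    /* Bubble the new hit into its sorted position. */
    for (int i = distances.size() - 1; i > 0 && distances[i] < distances[i - 1]; i--) {
      std::swap(distances[i], distances[i - 1]);
    }
    if (distances.size() > max_hits) {
      distances.remove_last();
    }
  }

  /* Subtrees farther away than the current worst kept hit can be skipped. */
  float cutoff_distance() const
  {
    return distances.size() == max_hits ? distances.last() : FLT_MAX;
  }
};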

What is the expected behavior for curves that are on the inside of the mesh, and are combed to the outside?

If a curve starts inside the mesh it should not collide until it moves outside.

I find it a bit surprising that contact points are detected first. My possibly naive assumption would be that one would have to find the closest point on the surface in every iteration.

Yes, this is not ideal atm. New contacts should be detected a few times per step in an outer loop (that is part of the issue above about detecting backward faces). The inner loop is then the actual solving of constraints, using the contacts detected in the outer loop.

An implicit solver would replace the inner loop by constructing a matrix and solving all the constraints simultaneously, while Gauss-Seidel forgoes the matrix construction and pays with slower convergence/less accuracy. It solves one constraint at a time, but that can violate other constraints in turn. So the solver loops over the constraints multiple times until the overall error is small enough.

Why is it necessary to store multiple contact points on the same ray for every point? Shouldn't it be enough to only get the closest one?

This becomes important in cases like concave meshes where a single contact is not enough to move the point to the outside of the mesh. When one of the constraints is solved the other one is still violated and requires another iteration. If only one contact is stored at a time this can mean a lot of unnecessary lookups. If a reasonable set of contacts (whatever that means) can be found in one lookup then the inner loop can find a solution without having to repeat BVH lookups over and over.

Lukas Tönne (lukastoenne) planned changes to this revision.Jul 19 2022, 7:45 AM

The "follow the leader" method of solving for each point in order from the root and then never touching it again is probably incorrect as well. It works for solving the length constraints on their own because the fixed root guarantees a single valid solution that can be found in one iteration. Contact constraints affect both neighboring segments though, the solver loop should iterate over all the affected constraints (i.e. the entire curve). The effect of the current implementation is probably increased "rigidity" of points close to the root and more jittery solutions toward the tip.

Curve sculpt: Improvements and testing for collision constraint solver.

Collision is now optional in the curve sculpting tools and disabled by
default (button next to X/Y/Z symmetry settings).

Moved constraint solver into blenkernel for wider accessibility.

Changed collision method from raycasts to substeps with sphere casts.
This method of finding and resolving contacts is recommended by the XPBD
author. It avoids continuous collision detection (CCD) with raycasts
in favor of using a number of substeps. The range of potential contacts
is limited by the overall allowed step size and the number of substeps.

The length constraints now work in both directions except for the first
segment, which avoids numerical stiffness close to the root.

Constraints now follow a stricter general recipe, which should make it
easier to implement future constraints with more complex constraint
functions and gradients.

The solver now outputs results with overall numerical error metrics and
timings.

Added a basic performance test for the curves constraint solver. The
test is parameterized over a range of substep counts.
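A minimal sketch of the bidirectional length constraint described above: both endpoints move, except for the first segment, where the root stays pinned. The equal weighting and the names are illustrative assumptions, not the actual implementation:

/* Project one segment back to its rest length. Both endpoints move,
 * except for the first segment, where the root point stays pinned. */
static void project_segment_length(MutableSpan<float3> positions,
                                   const int segment_i,
                                   const float rest_length)
{
  float3 &point_a = positions[segment_i];
  float3 &point_b = positions[segment_i + 1];
  const float3 delta = point_b - point_a;
  const float length = math::length(delta);
  if (length < 1e-8f) {
    return;
  }
  /* Positive when the segment is stretched, negative when compressed. */
  const float3 correction = delta * ((length - rest_length) / length);
  if (segment_i == 0) {
    /* Root point is fixed: only the second point moves. */
    point_b -= correction;
  }
  else {
    point_a += correction * 0.5f;
    point_b -= correction * 0.5f;
  }
}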

source/blender/editors/sculpt_paint/curves_sculpt_comb.cc
168

This is probably not correct:

  • Doing a plain for loop over the TLS vectors causes memory leaks in tbbmalloc
  • Using threading::parallel_for only works up to the grain size (256 curves); after that the constraint solver stops working

Could use some advice here.

source/blender/editors/sculpt_paint/curves_sculpt_pinch.cc
145–165

All the brushes using constraints (comb, pinch, puff) have slightly different ways of identifying changed curves, which makes it tricky to define a common interface for the solver. Discussed this with @Jacques Lucke (JacquesLucke), we agreed to just keep this conversion until the code is unified a bit more.

Hans Goudey (HooglyBoogly) requested changes to this revision.Jul 26 2022, 12:40 AM

Pretty cool to see this working! I like how it generalizes to take over the length preservation too. Some notes:

  • I'm not sure if this fits in blenkernel. It's helpful to separate concerns, but it doesn't seem like a "core" thing to me, at least not unless it's abstracted a bit more. Maybe the geometry module?
  • I noticed significantly worse performance even with the "physics" icon turned off here. In a profile I'm noticing a lot of TBB overhead. I'm guessing something isn't configured correctly.
  • Since you've been testing performance, and since performance is important for this, it would be nice to have an overview of what you've learned in that area in the patch description.
  • It looks like this adjusts the curve positions array directly. It probably needs to be updated to support sculpting on deformed curves now, unless I'm missing something.
source/blender/blenkernel/BKE_curves_constraints.hh
111 ↗(On Diff #54027)

Initialize should probably get an IndexMask argument describing the curve selection. Most brushes have built that anyway.
It's nice to keep the cost of as many operations as possible relative to the size of the selection rather than the total number of curves.

The selection provided here might be larger than the final number of affected curves; that's fine though.

source/blender/blenkernel/intern/curves_constraints.cc
84 ↗(On Diff #54027)
86 ↗(On Diff #54027)
100 ↗(On Diff #54027)

dX seems like an odd name for a float3 variable to me. Am I missing something?

110 ↗(On Diff #54027)

Looks like this section could be shuffled a bit and simplified with a couple continue statements

113 ↗(On Diff #54027)
124 ↗(On Diff #54027)
253 ↗(On Diff #54027)

I'm not sure what "Compliance" means here, or why it's zero. Maybe deserves some explanation?

263 ↗(On Diff #54027)
291–293 ↗(On Diff #54027)

How about creating a variable outside of all the loops and using this: for (const int point_i : points_range.drop_front(skipped_root_points)) {

298 ↗(On Diff #54027)

Use contacts_.as_span().slice(...), and declare the span const

source/blender/blenkernel/intern/curves_constraints_test.cc
23 ↗(On Diff #54027)

The style guide mentions that tests should be in the same namespace as the code they're testing

64 ↗(On Diff #54027)

Not sure CurvesSurfaceTransforms should be duplicated; can it be reused?

91–93 ↗(On Diff #54027)

Use MutableSpan instead of raw pointers

150 ↗(On Diff #54027)
source/blender/editors/sculpt_paint/curves_sculpt_comb.cc
168

Hmm, I don't observe memory leaks here. I don't have much advice to offer now, but I can look into it more later

source/blender/editors/sculpt_paint/curves_sculpt_pinch.cc
163

There's no need for the virtual array to take ownership of the changed curve indices; VArray<int>::ForSpan should be better here.

Actually though, it looks like all three uses of the solver could just take a Span<int> argument

source/blender/makesrna/intern/rna_curves.c
373 ↗(On Diff #54027)

Clang format

This revision now requires changes to proceed.Jul 26 2022, 12:40 AM
Lukas Tönne (lukastoenne) marked 10 inline comments as done.
  • Fix indexing error: the segment length needs a global point offset.
  • Unit/regression test for the curve length constraint solver.
  • Rename dX to delta_substep.
  • Minor code flow improvements and const usage.
  • Use index mask in solver initialize to limit length calculations.
  • Explanation for the purpose of the (hardcoded) alpha value.

I've tested setting both substeps and iterations of the solver to 1. This results in a single iteration like the previous method, and the performance is comparable.
The main problem is that the new solver works a bit differently: for any segment except the first it will move both points instead of just the second point. This is a more physically accurate approach, but it requires more iterations to conserve segment length. So I would like to provide both methods and use the faster "follow-the-leader" approach by default for editing. For simulating hair physics the more balanced approach is preferable. There is a more in-depth discussion of the problem and possible mitigations for numerical stiffness in this paper: https://www.researchgate.net/publication/235259170_Fast_Simulation_of_Inextensible_Hair_and_Fur

Combining the single-iteration FTL method with collisions could be difficult. We could adopt the cloth-sculpt approach and accept that collision can violate length constraints somewhat. That just requires a persistent data layer for segment lengths, because we can't simply measure them at the start of the stroke.

Storing the segment lengths might be a good idea in any case to avoid drift: I would expect curves to get a little bit longer with each stroke due to numerical error when accuracy is traded for performance. I can already see this with the more detailed curves in the Einar test file: 5 solver iterations are not enough to fully converge, and the hairs (~15 points each) get a bit more stretched out with each stroke. Changing the order of constraint evaluation might improve the speed of convergence too; the Gauss-Seidel method can be sensitive to the order of constraints.

source/blender/blenkernel/BKE_curves_constraints.hh
111 ↗(On Diff #54027)

I agree. Just some thoughts:

I will allocate a full-size array for segment lengths (one per point). This makes it much easier to index and avoids the need for summing up points and offsets for just the selection. Unselected curve segments remain uninitialized.

As mentioned in the discussion on performance and the pros and cons of the previous "follow-the-leader" algorithm, calculating segment lengths ad-hoc from positions could lead to undesirable "drift" during editing. It may be preferable to store a persistent data layer for segment lengths that survives beyond the brush tool application. The segment lengths would just be ignored in tools which deliberately change lengths such as grow/shrink, and then recomputed after such brushes are applied.
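A rough sketch of that selection-limited initialization; the offsets layout (curve i owns points [offsets[i], offsets[i + 1])) and the names are assumptions for illustration:

/* segment_lengths has one entry per point; entry i stores the rest length of the
 * segment between point i and point i + 1. Unselected curves stay uninitialized. */
static void initialize_segment_lengths(const Span<float3> positions,
                                       const Span<int> curve_offsets,
                                       const Span<int> selected_curves,
                                       MutableSpan<float> segment_lengths)
{
  for (const int curve_i : selected_curves) {
    const int points_start = curve_offsets[curve_i];
    const int points_end = curve_offsets[curve_i + 1];
    for (int point_i = points_start; point_i < points_end - 1; point_i++) {
      segment_lengths[point_i] = math::distance(positions[point_i], positions[point_i + 1]);
    }
  }
}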

source/blender/blenkernel/intern/curves_constraints.cc
84 ↗(On Diff #54027)

The surface mesh can be null, so keeping this as a pointer makes sense imo. Unless std::optional is acceptable here? Jacques said a pointer is still preferable for that (I agree).

100 ↗(On Diff #54027)

"X" is commonly used in papers to refer to positions, this was just lazy naming. Will rename it to delta_substep for clarity.

253 ↗(On Diff #54027)

I added this more as a reminder for later extension of the algorithm. I've added an inline explanation, but can also remove it if you prefer.

alpha is an "inverse stiffness" parameter that is used in physical simulation to control the softness of a constraint: for alpha == 0 the constraint is stiff and the maximum correction factor is applied. For values > 0 the constraint becomes squishy: some violation is permitted and the constraint gets corrected over multiple time steps. This does not make a whole lot of sense for an editing tool and unstretchable curves though, so the value is fixed at 0.

Here's a pretty intuitive explanation:
https://youtu.be/jrociOAYqxA?t=668
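For reference, a minimal sketch of the XPBD-style correction for a single distance constraint; with alpha == 0 it reduces to the plain PBD projection. The inverse masses, the accumulated lambda, and the names are illustrative, not the patch's actual code:

/* One XPBD correction step for the constraint C = |p_b - p_a| - rest_length. */
static void solve_distance_constraint_xpbd(float3 &p_a, float3 &p_b,
                                           const float inv_mass_a, const float inv_mass_b,
                                           const float rest_length,
                                           const float alpha,  /* Compliance (inverse stiffness). */
                                           const float dt,
                                           float &lambda)      /* Accumulated Lagrange multiplier. */
{
  const float3 delta = p_b - p_a;
  const float length = math::length(delta);
  if (length < 1e-8f) {
    return;
  }
  const float3 gradient = delta / length;
  const float C = length - rest_length;        /* Constraint violation. */
  const float alpha_tilde = alpha / (dt * dt); /* Time-step scaled compliance. */
  const float delta_lambda = (-C - alpha_tilde * lambda) /
                             (inv_mass_a + inv_mass_b + alpha_tilde);
  lambda += delta_lambda;
  p_a -= gradient * (delta_lambda * inv_mass_a);
  p_b += gradient * (delta_lambda * inv_mass_b);
}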

source/blender/blenkernel/intern/curves_constraints_test.cc
64 ↗(On Diff #54027)

Not sure what you mean. I just needed to initialize this with identities.

150 ↗(On Diff #54027)

Added const where appropriate in these tests, but this particular case won't work: pos is offset randomly for each point along the curve; it gets modified in the loop below.

source/blender/editors/sculpt_paint/curves_sculpt_comb.cc
168

It's probably not a leak actually. Specifically, I'm getting access violations on exit in optimized builds (haven't been able to repro in debug builds yet):

Exception thrown at 0x00007FFB2AEF99D1 (tbbmalloc.dll) in blender.exe: 0xC0000005: Access violation reading location 0x0000025EEC08406C.

source/blender/editors/sculpt_paint/curves_sculpt_pinch.cc
163

Yeah, I first thought I could reconcile the different ways brushes identify changed curves and just wrap them in a VArray, but ended up having to make copies anyway.

What about IndexMask? Wouldn't that be the exact use case?

Lukas Tönne (lukastoenne) planned changes to this revision.EditedJul 28 2022, 12:33 PM

I'm doing some profiling to investigate threading. One thing I noticed is that thread utilization is quite bad in the case of the plain outer for loop over the TLS here. The individual thread-local blocks can have wildly different sizes, because which thread a curve gets assigned to depends on its position relative to the brush. Each thread picks a set of 256 curves at a time and filters it by distance. Some threads end up filtering out almost all curves, and some accept hundreds of curves.

This is OK for the initial combing, but doing a straight loop over these blocks and solving each one individually causes very uneven loads. The solver in turn parallelizes over curves, but if the initial set is only a few curves to begin with, most threads remain idle. Putting an outer parallel_for loop around it allows TBB to schedule the next block of curves earlier, but I think it would help even more to recombine the filtered curves.

EDIT
Here's a graph to compare the effect of the simple for loop (blue), the naive parallel_for loop with varying task sizes (red) and a parallel_for after combining all the thread-local changed_curves into a single array. Interestingly the naive parallel loop is actually worse than the sequential loop, but after combining arrays the thread load is much more consistent.

/* Simple sequential loop. */
for (const Vector<int> &curves : changed_curves) {
  self_->constraint_solver_.step_curves(
      *curves_orig_, surface, transforms_, start_positions_, VArray<int>::ForSpan(curves));
}

/* Parallel loop with varying task size. */
threading::parallel_for_each(changed_curves, [&](const Vector<int> &curves) {
  self_->constraint_solver_.step_curves(
      *curves_orig_, surface, transforms_, start_positions_, VArray<int>::ForSpan(curves));
});

/* Combine TLS curves into a single array (the solver parallelizes internally). */
int totcurves = 0;
for (const Vector<int> &curves : changed_curves) {
  totcurves += curves.size();
}
Array<int> all_changed_curves(totcurves);
totcurves = 0;
for (const Vector<int> &curves : changed_curves) {
  all_changed_curves.as_mutable_span().slice(totcurves, curves.size()).copy_from(curves);
  totcurves += curves.size();
}
self_->constraint_solver_.step_curves(
    *curves_orig_, surface, transforms_, start_positions_, VArray<int>::ForSpan(all_changed_curves));

It seems like the solver could receive a single IndexMask of curves to affect. If the brush uses separate stores of changed curves per thread, it should be straightforward to convert from that to a single IndexMask. Plus it means the rest of the brush probably has similar utilization problems anyway.

BLI_index_mask_ops.hh is a somewhat relevant header.

source/blender/blenkernel/intern/curves_constraints.cc
84 ↗(On Diff #54027)

Oh, I just didn't realize it could be null, thanks.

source/blender/blenkernel/intern/curves_constraints_test.cc
64 ↗(On Diff #54027)

Oops, my bad!

Testing collision detection for different mesh sizes and BVH branching factors: As expected the detection time grows more or less logarithmically. The majority of time is spent in the broadphase bounds checks, with a tree depth of log_b(n), where b is the BVH branching factor. The brushes currently use a branching factor of 2. The sweet spot seems to be 4 (which makes sense for surface meshes), but the overall effect of branching factors isn't very large.
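For a rough sense of scale: with the ~140k-triangle Einar mesh mentioned below, a branching factor of 2 gives a tree depth of about log2(140000) ≈ 17, while a factor of 4 gives about log4(140000) ≈ 8.5, i.e. roughly half as many levels at the cost of testing twice as many child bounds per node.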

For collision performance I think we can only expect minor improvements by optimizing the BVH. To make collision a viable feature in editing and realtime animation, there are a few options that come to mind:

  1. Use simpler meshes. The "Einar" beard has about 22k curves with 440k control points and the surface mesh has 140k triangles. This kind of high detail mesh isn't needed for reasonable collision accuracy, it's common practice to have a lower-detail mesh for collisions. Big question is how that could be integrated into the curve editing workflow, without collision and visual mesh getting desynced and without too much mental overhead for artists.
  2. Use simpler curves. Probably not an option for initial hair styling (we want all the detail there), but could be useful for later animation and physics. Calculating collisions on guide curves or softbody cages would be a lot cheaper than colliding every single hair, especially when self-collision is a goal.
  3. Avoid the BVH: A possible option is to use a SDF volume grid, which allows very fast lookups and should provide enough accuracy (mentioned in "Fast Simulation of Inextensible Hair and Fur"). The cell size should be based on the max. collision distance per substep, much like the BVH lookup distance is now. This should be cached as much as possible.
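A minimal sketch of what such an SDF lookup could look like, assuming a regular grid with trilinear interpolation; the SDFGrid layout and names are assumptions for illustration, not an existing Blender structure (bounds checks omitted):

/* A negative sample means the query point is inside the mesh. */
struct SDFGrid {
  float3 min_corner;
  float cell_size;
  int3 resolution;
  Array<float> samples; /* resolution.x * resolution.y * resolution.z values. */

  float value_at(const int x, const int y, const int z) const
  {
    return samples[(z * resolution.y + y) * resolution.x + x];
  }

  float sample(const float3 &p) const
  {
    const float3 local = (p - min_corner) / cell_size;
    const int x = int(local.x), y = int(local.y), z = int(local.z);
    const float3 t = local - float3(x, y, z);
    auto lerp = [](const float a, const float b, const float f) { return a + (b - a) * f; };
    /* Trilinear interpolation of the eight surrounding samples. */
    const float c00 = lerp(value_at(x, y, z), value_at(x + 1, y, z), t.x);
    const float c10 = lerp(value_at(x, y + 1, z), value_at(x + 1, y + 1, z), t.x);
    const float c01 = lerp(value_at(x, y, z + 1), value_at(x + 1, y, z + 1), t.x);
    const float c11 = lerp(value_at(x, y + 1, z + 1), value_at(x + 1, y + 1, z + 1), t.x);
    return lerp(lerp(c00, c10, t.y), lerp(c01, c11, t.y), t.z);
  }
};

/* A point collides when the interpolated signed distance drops below the curve radius. */
static bool point_collides(const SDFGrid &sdf, const float3 &p, const float radius)
{
  return sdf.sample(p) < radius;
}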
  • Added back the simple sequential distance solver as an editing method.
  • Use IndexMask to indicate changed curves instead of a VArray.
  • Deduplicate code for solvers.
  • Improved thread utilization by combining curve lists in the comb brush.
  • Moved the constraint solver into the geometry module and a separate perf test lib.
  • Simplified performance tests by using a common template.
  • Perf test for BVH tree branching factors.
  • Perf test for different collision mesh resolutions.
  • Computing residual error is now an optional step.

The collision detection performance increases substantially with more substeps. The reason is that the number of near-phase triangle checks decreases with the size of the search radius, which is in turn determined by position delta per substep ("velocity"). There is a certain optimum when the search radius gets smaller than the average mesh triangle, beyond which additional substeps start increasing the overall cost.
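Put as a rough rule of thumb (derived from the description above, not measured): the per-substep search radius is about (position delta per step) / (number of substeps), so the sweet spot lies roughly where that radius shrinks to the average triangle size, i.e. substeps ≈ step delta / average triangle edge length; past that point each extra substep mostly adds fixed traversal overhead.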

Here I'm comparing a simple torus (20000 faces) with the "Einar" test file (combing the beard). For Einar the optimum seems to be around 60 substeps, at which point the performance becomes almost usable for editing.

The optimum depends on the "average" size of triangles on the mesh, which could make it difficult to calculate reliably. Since the cost rises only slowly with additional substeps, it may be preferable to choose a default that is too large rather than one that is too small. It might be possible to adjust substeps automatically, but this could backfire as well, e.g. if triangle density is uneven. I'd like to compare this to the SDF method, which is independent of mesh topology.

FYI, I'm currently merging master into this patch and checking if we can make the initial functionality master-ready for Blender 3.5. I might strip it down a bit for that, while keeping the general API in place to allow for more complex/better implementations in the future.

The main thing that changed compared to when you were working on this patch originally is that we now want to support workflows where the user is just grooming guides and the child hairs are generated. Hence, performance is a bit less of a problem.

FYI, I'm currently merging master into this patch and checking if we can make the initial functionality master-ready for Blender 3.5. I might strip it down a bit for that, while keeping the general API in place to allow for more complex/better implementations in the future.

Great! There is room for improvements but it should work as a first step.

The main thing that changed compared to when you were working on this patch originally is that we now want to support workflows where the user is just grooming guides and the child hairs are generated. Hence, performance is a bit less of a problem.

Yes, I've been thinking about formalizing the concept of "physics proxies". We can't really expect realtime collision handling with 100k+ points and dense (10k+) meshes, even if the BVH tree gets an order of magnitude faster. Guide hairs are one way to "proxify" hair physics and decouple the physics performance from visual detail. Cage meshes or "embedding" in specialized softbodies would be another.

Hi! I just wanted to throw in a comment regarding surface collision when grooming. It would be very useful to have the option of defining a custom collision collection.

The character could for instance have a coat, scarf or similar that needs to be included in the collision. This would also enable the user to voxelize/remesh the collision geometry for optimization purposes.

Another common approach is to use a "hair cap" for optimizing. A hair cap is a single-sided mesh that is only used for attaching hair and is usually skinned to the armature. The head geometry can potentially be very dense (if it requires complex shape keys), so it's nice to use a hair cap, since it does not require the same polygon density.