Page MenuHome

Selection input for curve trim node and behavior corrections
ClosedPublic

Authored by Mattias Fredriksson (Osares) on Oct 5 2022, 10:30 PM.

Details

Summary

This patch is intended to adjust some comments for the trim implementation:

Corrects trim for cyclical curves mentioned in T101379, splitting the curves if the start/endpoint is at the 'loop point'.

Corrects implementation based on comments in D14481, request was made to use 'foreach_curve_by_type' to computing the point lookups.

Included corrections found in D16066 as it may not be a adopted solution.

Exposed selection input by adding it as input to the node.


Above changes implied some cascading effect to the implementation:

  • IndexRangeCyclic was transformed to using 32 bit integers and no longer works within a subinterval of some buffer using range_size_ to determine the size of the underlying buffer. No longer tracking the start/end of the underlying buffer. While these features where the memory consumption (40 bytes, 320 bits) to store an entry had a noticeable performance impact in the trim implementation as it was changed to compute entries once. Since the primary purpose is to use it for curves which is limited to a buffer size determined by a 32 bit integer, reducing the memory usage of an entry to fit in a 128 bit WORD had a performance impact.
  • Computing the curve size computation is parallelized and is combined with the point lookup computations in 'compute_curve_trim_parameters()'. The solution for avoiding computing both trim lookups if equivalent should be slightly clearer.
  • Added 'foreach_curve_by_type_mask()' to avoid recomputing the curve selection based on type. Selection for single point splines is tracked and updated after 'compute_curve_trim_parameters()'.

Performance for large number of Bezier/Catmull curves is slightly improved after mentioned adaptations. Uncertainty in the measurements however.


Interesting feedback:

Should selection input be reverted/ moved to a separate diff?

Wouldn't say this is an improvement to the code quality as it removes functionality that would be reusable.

Diff Detail

Repository
rB Blender

Event Timeline

Mattias Fredriksson (Osares) requested review of this revision.Oct 5 2022, 10:30 PM
Mattias Fredriksson (Osares) created this revision.
Mattias Fredriksson (Osares) retitled this revision from Changes to WIP: Selection input for curve trim node and behavior corrections .Oct 5 2022, 11:04 PM
Mattias Fredriksson (Osares) edited the summary of this revision. (Show Details)
Mattias Fredriksson (Osares) planned changes to this revision.Oct 6 2022, 12:14 AM
Mattias Fredriksson (Osares) retitled this revision from WIP: Selection input for curve trim node and behavior corrections to WIP: Selection input for curve trim node and behavior corrections.
Mattias Fredriksson (Osares) edited the summary of this revision. (Show Details)
  • Corrected behavior for transfering attributes for copied (unselected) splines
  • Added consts

This looks like a nice change. To be honest I still haven't put the time into totally understanding IndexRangeCyclic, so that part is going over my head a bit, but what you say sounds like an improvement.
I hope using foreach_curve_by_type is working out okay. To me it seemed like a nice way to separate concerns and make things more scalable (and maybe to make better use of instruction caches?), but sometimes it takes a bit of change to get things to fit with that paradigm.

Should selection input be reverted/ moved to a separate diff?

In a perfect world things would be split up a bit more to separate cleanups, performance improvements, and functional changes, but I understand batching things together can help with debugging and make changes faster. It's also easier for me to say with commit rights, cleanups have more overhead when going through patches. So I'm fine with the way it is.

Wouldn't say this is an improvement to the code quality as it removes functionality that would be reusable.

What is the feature being removed? The choice of what to do at the start/end points?

source/blender/blenkernel/BKE_curves_utils.hh
563

I think std::array like is used for the type counts would be a slight improvement in this area, so it doesn't decay to raw pointers.

source/blender/geometry/intern/trim_curves.cc
895

If there is only one curve type, wouldn't building the second mask per type have a performance impact?

If we're able choose, I think it's better to compromise the multi-type performance that the single-type performance.
The curve type count cache and the IndexMask special cases for is_range() and is_empty() can usually help avoid extra work.

904

Is single_curve_count redundant with updated_curve_point_indices.size()?

960

I think it's not changing in this patch, but in theory we could avoid creating/writing to curve types completely in some cases (when all the curves don't change type). That would allow making use of future CoW improvements so that the attribute data is shared.

Maybe I'm prematurely optimizing, but I think it's good to keep things in mind.

Mattias Fredriksson (Osares) marked 4 inline comments as done.
  • invert_submask()

This looks like a nice change. To be honest I still haven't put the time into totally understanding IndexRangeCyclic, so that part is going over my head a bit, but what you say sounds like an improvement.
I hope using foreach_curve_by_type is working out okay. To me it seemed like a nice way to separate concerns and make things more scalable (and maybe to make better use of instruction caches?), but sometimes it takes a bit of change to get things to fit with that paradigm.

There is nothing really to it, it's just an IndexRange where you can define the index you want to start iterating at and the index to end the iteration + a loop counter (it's an abstraction over iterating from any control point on the curve, avoiding considering if the next one should be the last entry in the buffer).

With regards to the for_each_curve() it is nice but think it might be possible to refine it as it is a bit awkward to always have to define 4 lambda functions to get it to work. Feels like it breaks up the function and make it harder to read/understand and it would be better if one could define them outside the function as the code is duplicated in each. So even if the code duplication is acceptable in this case, defining them within the function is more of a problem as they can take quite a bit of space.

Wouldn't say this is an improvement to the code quality as it removes functionality that would be reusable.

What is the feature being removed? The choice of what to do at the start/end points?

No just combined multiple functions into the compute_curve_trim_parameters() which has no reusability which is a slight degradation I think. Did experiment with allowing negative values in the start/end input but it gets a bit weird as you couldn't just split the curve by setting 'end = start + curve length' due to floating point inaccuracies so I went back to thinking the Iliya approach of a separate output for the 'outer' or 'other' curve is better again (they are discrete sets so there is no FP ambiguity). But I guess that has to be another time since people seemed to dislike the idea, will focus on Bisect node instead but will probably have to take a from doing this first as it can get a bit intense doing it in the spare time.

source/blender/geometry/intern/trim_curves.cc
895

No the masks are only updated if there are actually any changes to the mask (i.e. a curve is trimmed to a single point). But I made it more explicit by following the idea of ignoring edge cases to benefit the general case by only tracking the single point curves and implementing IndexMask::invert_submask() instead.

904

It wasn't but exposed is_empty() in threading::EnumerableThreadSpecific which achieves the same result.

960

It would be interesting if CustomData could reference data in other CustomData buffers rather then copying everything. Ensuring all attributes are set properly is not simple even now though (it gets a bit tricky ensuring all curve specific attributes are set/removed properly). So hopefully such a system would not introduce more edge cases. I guess edge cases would be reduced if one avoids 'evaluated' copies.

Mattias Fredriksson (Osares) retitled this revision from WIP: Selection input for curve trim node and behavior corrections to Selection input for curve trim node and behavior corrections.Oct 8 2022, 7:31 PM
  • Corrected start/end logic
  • Rebased
Hans Goudey (HooglyBoogly) requested changes to this revision.Oct 14 2022, 8:15 PM

This looks close, just a couple things:

An issue:

  • This patch changes the behavior when the start is greater than the end, which just generated a single point before.

A couple of questions:

  • Did templating compute_curve_trim_parameters with the curve type give a performance improvement?
  • What is the benefit of building a separate vector for indices of curves that end up as a single point, rather than just doing it conditionally in the general functions?

And some warnings:

/home/hans/blender-git/blender/source/blender/blenkernel/BKE_curves_utils.hh:87:40: warning: unused parameter ‘iterable_range_start’ [-Wunused-parameter]
   87 |                              const int iterable_range_start,
      |                              ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/home/hans/blender-git/blender/source/blender/geometry/intern/trim_curves.cc:476:17: warning: unused variable ‘dst_size’ [-Wunused-variable]
  476 |   const int64_t dst_size = dst_range.size();
      |                 ^~~~~~~~
source/blender/geometry/intern/trim_curves.cc
748
896–897

Do you really have to write out all the IndexMask default constructors?

This revision now requires changes to proceed.Oct 14 2022, 8:15 PM
Mattias Fredriksson (Osares) marked an inline comment as done.
  • Implicit initialization
  • Resolved: non-cyclic curves with end < start not turning to point, single point curves not being sampled due to adding wrong variable, nasty rounding error due to not using parantheses when computing integer difference before adding float value in Bezier curve lookup
  • Added asserts verifying non-point trim curves has more then 1 point
  • clamp_f -> std::clamp

A couple of questions:

  • Did templating compute_curve_trim_parameters with the curve type give a performance improvement?
  • What is the benefit of building a separate vector for indices of curves that end up as a single point, rather than just doing it conditionally in the general functions?

In part it's a bit of experiment on my end because there is not much room for optimizing the actual trim function (there is likely different options but it gets complicated with the generic attributes). Removing the edge cases however (i.e. separate the 'point curves, and plausibly cyclic curves) is an quite obvious one since it removes possible branching and thus a few comparisons. So in essence the implementation follows the idea in one of your comments 'better to compromise the multi-type performance that the single-type performance'. Essentially, by separating the edge cases we remove a few instructions and branches from the general case, plausibly compromising performance in the edge cases to benefit the general case.

While we are not talking about a huge difference here (roughly the VS profiler attributed at most ~0.5% of execution time to a single branch statement). We are not really compromising performance in other cases by doing so either. For the cases there is a compromise the code avoids any extra work in the general case (e.g. if no curve is trimmed to a single point, we know the selection for that curve type remains the same so we only recompute the selection if there is such as case). While recomputing selection is not free, particularly as it's hard to thread parallelize, the branching for single point curves is likely worse because it's done not N times (per curve) but N * A times (for every curve and every transferred attribute). So rather then loosing performance from extra compute the loss by my guess would be due to cases not being run in parallel with each other. There are of course other issues such as increased binary size and code duplication as well, which could be used to justify not doing it this way.

But yeah, it's not really until this point I can actually test it in practice because I've usually been changing multiple things at a time so the branches haven't been comparable. And it's also quite time consuming to do so it will probably take a while until I can test if separating the single point curves is an improvement when both cases are present in the input (as mentioned previously the general case has no real downside). But to give some data for the case with templating the compute_curve_trim_parameters<>() function (most of the benefit is from providing a constexpr for lookup_curve_point<>()) I ran some tests today to verify there being a benefit to it.


Test 1:

328K Poly splines, 3.3 M points in total, 1000 samples

With template:
Mean: 0.024579634000000003 (s)
Std: 0.0018933722069482272 (s)

Single loop (read value from curve_types() attribute, no template):
Mean: 0.025603625 (s)
Std: 0.001158648 (s)


Test 2:

524K splines (mix of Poly and Bezier), 2.1 M points in total, 1000 samples

With template:
Mean: 0.0390466 (s)
Std: 0.0010852 (s)

Single loop (read value from curve_types() attribute, no template):

Mean: 0.039003626 (s)
Std: 0.000688097 (s)


Conclusion

There is some indications that the templated version performs better as it was ~1 ms faster on average when only passing in a single spline type. When mixing splines the results are roughly identical on average (~0.04 ms difference). While there is some clear improvements, due to the large standard deviation it's not clear if the difference is significant (which implies changing the implementation would likely be less useful then keeping the current one, hurrah!). Will re-run the test later but takes quite a while since the Python script is pretty bad and uses the UI and wait + redraw....


General method:

Trim Start/End factor: 0.1-0.9

Ran tests in sequence to get roughly the same condition (but my computer is pretty trash for running tests due to bloat, lets call it 'testing under real world conditions')

Repurposed old test files to do the test (they are not specifically created for these tests ;)

Gathered samples using:


and

source/blender/geometry/intern/trim_curves.cc
748

Supposed C++ std lib should be preferred over the internal C functions

896–897

Seems not so. Always room for understanding C++ better (or re-remember)!

Thanks for the thorough testing. Your conclusions make sense to me, as does your reasoning for prioritizing the common case rather than the single point "edge case".

The way I usually test timings is using the SCOPED_TIMER_AVERAGED timer in BLI_timeit.hh. Then I add a "Scene Time" node which forces the tree to recalculate when the frame changes, and play the animation. Maybe that would save some time in the future.

This revision is now accepted and ready to land.Oct 17 2022, 6:00 PM

Thanks for the thorough testing. Your conclusions make sense to me, as does your reasoning for prioritizing the common case rather than the single point "edge case".

The way I usually test timings is using the SCOPED_TIMER_AVERAGED timer in BLI_timeit.hh. Then I add a "Scene Time" node which forces the tree to recalculate when the frame changes, and play the animation. Maybe that would save some time in the future.

Sounds like an improvement will try it out next time.

Made another test run which wasn't as pretty:


Test 1:

328K Poly splines, 3.3 M points in total, 1000 samples

With template:
Mean: 0.024431501 (s)
Std: 0.0014225080948799553 (s)

Single loop (read value from curve_types() attribute, no template):
Mean: 0.024458776 (s)
Std: 0.0014862949356786491 (s)


Test 2:

524K splines (mix of Poly and Bezier), 2.1 M points in total, 1000 samples

With template:
Mean: 0.038799 (s)
Std: 0.000743 (s)

Single loop (read value from curve_types() attribute, no template):

Mean: 0.039009 (s)
Std: 0.0007 (s)


Conclusion

It's likely the 1 ms gain is influenced by external factors as it was only ~0.03 ms faster this time. On the positive side the templated one was faster in both tests this time, beating the non-templated one for mixed curves by more then 0.2 ms! The interesting part is that the deviation is close between the tests which makes the tests seem more reliable. Another interesting note is that the deviation for the mixed test is also half the deviation for the poly curve test which is odd as it is both a longer test and includes more uncertainty (more threaded loops due to mixed types), I suppose it might be due to the length computation for the Bezier curves being the dominant compute task.

But yeah tl;dr performance gain is minimal but unlikely to perform worse in comparison to the alternative.

I think I accepted slightly prematurely before, I noticed a few small things when looking to commit this.

Out of curiosity, I tested the binary size with the template and without it:

Template:    326,973,655 bytes
No Template: 326,938,727 bytes

Assuming the build was completely reproducible otherwise (a fair assumption, right?), the template costs 35 KB. I'm sure there are areas that are much worse, but considering it doesn't really have a performance impact, best to skip the templating for now IMO.

source/blender/blenlib/BLI_index_mask.hh
237 ↗(On Diff #56835)

The by-value const has no effect in a definition anyway. Might be nice to add a docstring here, and note the runtime (since it's not obvious how the size of the two masks influence it)

source/blender/geometry/GEO_trim_curves.hh
20–23

A bit nitpicky, but it's in the style guide so we might as well be consistent

Mattias Fredriksson (Osares) marked 2 inline comments as done.
  • Removed treating single point as a separate case
  • Made IndexRangeCyclical utility constructors static functions of the class itself, debugged and corrected odd or faulty behaviors
  • Comments and asserts
  • Removed time sampler

Sorry for not updating the patch faster, removed the template parameters a couple of weeks ago but didn't have time to test it. Ended up simplifying the implementation further, avoiding treating curves trimmed to a single point as a separate case, which is close to being a refined version of a previous version of the code. Wanted to measure the difference at a later point in time but realized with the performance likely being small I assumed this would be the preferred version so decided it was best to get it done so to avoid retesting everything once more (and I kind of lost track of what I changed earlier).

Some performance measurements for the same tests as previous (and same state in master)


Test 1:

328K Poly splines, 3.3 M points in total, 1000 samples

Without single point edge case:
Mean: 0.02416978
Std: 0.001631

Mean: 0.025654781
Std: 0.0014802774365094537

Mean: 0.024859714
Std: 0.0018068348580332404

Mean: 0.024282189000000003
Std: 0.001583815206164848


Test 2:

524K splines (mix of Poly and Bezier), 2.1 M points in total, 1000 samples

Without single point edge case:

Mean: 0.038496302999999996
Std: 0.0014545333193815122

Mean: 0.038460719000000004
Std: 0.0009616918373569571

Mean: 0.037224986
Std: 0.0025562260713411092

Mean: 0.038420722
Std: 0.001136980463647463


Summary

The only clear difference is that the second test is slightly faster 0.5-1 ms, which means it performs slightly better when there are curves mixed in which is trimmed to a single point. For the simpler case there is no clear difference but the result also seem to vary quite a bit. Due to the simplified implementation I think the plausible improvement of a fraction of a millisecond is not worth it.

Side notes

Some methods extending the IndexMask is no longer used, kept them in case you still find them relevant. Similar goes for the foreach_curve_by_type_mask() functions.

Implemented some small unittests to verify IndexRangeCyclic functions, not sure if I should provide a separate patch for them.

Hopefully the BF conference was fun, trying to go through it when I can.

BR
Mattias

source/blender/geometry/GEO_trim_curves.hh
20–23

Yeah I like the style in theory but I always copy paste the definition to make sure I don't make typos in it which breaks the style by default. Would be nice to have some code format/linter to check for these things.

  • Comments, removed unnecessary includes

Thanks for the cleanups! I pushed this to 3.4, but only enabled the selection input in 3.5 since we're in Bcon3.

I left out some of the unnecessary changes to IndexMask since AFAIK those are usually added on-demand.