Fixes T97685
Over large ranges, small floating-point errors can accumulate during summation.
Kahan summation reduces this error by tracking the accumulated error in a
compensation variable.
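A minimal sketch of the compensated loop (simplified and hypothetical; the function name and signature here are illustrative, not the exact code in this diff):

```cpp
/* Illustrative sketch of Kahan (compensated) summation; simplified,
 * not the exact implementation in this diff. */
#include <cstddef>

template<typename T> static T kahan_sum(const T *values, const size_t size)
{
  T sum = T(0);
  T c = T(0); /* Running compensation for lost low-order bits. */
  for (size_t i = 0; i < size; i++) {
    const T y = values[i] - c; /* Correct the next value by the stored error. */
    const T t = sum + y;       /* Low-order bits of y may be lost in this add... */
    c = (t - sum) - y;         /* ...and are recovered here as the new compensation. */
    sum = t;
  }
  return sum;
}
```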
Details
- Reviewers
- None
- Maniphest Tasks
- T97685: Attribute Statistic Node Outputting Weird Mean.
Diff Detail
- Repository
- rB Blender
- Branch
- geometry-nodes-stats-kahan-sum (branched from master)
- Build Status
Buildable 22500 / Build 22500: arc lint + arc unit
Event Timeline
Quite an interesting technique indeed! Though it does have downsides.
Fast-math, which I _think_ is only used in Cycles today, negates this algorithm completely: https://godbolt.org/z/na69o1194
And the accuracy comes at quite a cost, to the tune of 4 to 4.3x slower runtime:
Benchmark                           Time          CPU    Iterations
BENCH_accum_int_stdaccum         8775 ns      8580 ns         74667
BENCH_accum_float_stdaccum     101065 ns    100098 ns          6400
BENCH_accum_float2double_loop  101232 ns    102534 ns          7467
BENCH_accum_int_kahan           32235 ns     32227 ns         21333
BENCH_accum_float_kahan        433861 ns    429688 ns          1600
The 'int' variant of compute_sum can be enclosed within an if constexpr (std::is_integral_v<T>) scope to return its perf back to normal at least (done in the godbolt link).
I included a 'dirty' variation where we just cast to double for compute_sum; that gets you most of the way there with little perf decrease, but it's not as accurate as Kahan. (A rough sketch of both variations is below.)
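A hypothetical sketch of the two variations mentioned above; compute_sum here is a simplified scalar stand-in for the node's actual template (it ignores the float3 case), not the code from the diff or the godbolt link:

```cpp
/* Hypothetical sketch: integral fast path via if constexpr, plus the
 * 'dirty' accumulate-in-double variation for floating-point types. */
#include <cstddef>
#include <numeric>
#include <type_traits>

template<typename T> static T compute_sum(const T *values, const size_t size)
{
  if constexpr (std::is_integral_v<T>) {
    /* Integers don't accumulate rounding error; plain accumulation keeps
     * the original performance. */
    return std::accumulate(values, values + size, T(0));
  }
  else {
    /* 'Dirty' variation: accumulate in double and cast back at the end.
     * Much cheaper than Kahan, but less accurate over very long ranges. */
    double sum = 0.0;
    for (size_t i = 0; i < size; i++) {
      sum += static_cast<double>(values[i]);
    }
    return static_cast<T>(sum);
  }
}
```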
You are right, the accuracy is paid for with runtime cost. So maybe just using doubles internally is the better all-round option. Otherwise a user option for "increased accuracy" that switches between simple and Kahan summation could work? Generally I wouldn't expect the statistics node to be the main issue when working with millions of elements, but it's still worth considering. Also, this node is not parallelized at all yet in either variant; parallelizing it would make the accuracy slightly worse.
The 'int' variant of compute_sum can be enclosed within an if constexpr (std::is_integral_v<T>) scope to return its perf back to normal
The statistics node currently only has float and float3 summation. In the future I guess there will be an integer mode though.
Not sure what you mean by "Fast-math negates this algorithm completely". Do you mean fast math makes it impossible to use this algorithm?
Oh, I see now: it has the same accuracy problem with fast-math enabled as the naive summation. That's good to know; I'll have to try and understand why that is. Aggressive compiler optimization?
Confirmed: fast-math in particular enables -fassociative-math, which optimizes away the (t - sum) - y term so the compensation is always zero.
Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like (x + 2**52) - 2**52). May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required.
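To spell out the cancellation: substituting t = sum + y into c = (t - sum) - y gives c = ((sum + y) - sum) - y, which re-association is free to simplify to c = y - y = 0. Every iteration's compensation then vanishes and the loop degenerates to naive summation.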