
Use Kahan summation for attribute statistics to reduce accumulated error.
Needs Review · Public

Authored by Lukas Tönne (lukastoenne) on Jun 11 2022, 7:44 PM.
This revision needs review, but there are no reviewers specified.

Details

Summary

Fixes T97685
Over large ranges, small floating-point errors can accumulate during summation.
Kahan summation reduces this error by tracking the accumulated error in a
compensation variable.
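
For reference, a minimal sketch of the compensated (Kahan) summation loop described above; the function name, the raw-pointer interface and the test values are illustrative assumptions and not the actual patch:

#include <cstdio>
#include <vector>

/* Compensated (Kahan) summation: the low-order bits lost in each addition are
 * captured in a running compensation term and fed back into the next step. */
template<typename T>
T kahan_sum(const T *values, const int size)
{
  T sum = T(0);
  T compensation = T(0);
  for (int i = 0; i < size; i++) {
    const T y = values[i] - compensation; /* Re-apply the previously lost bits. */
    const T t = sum + y;                  /* Low-order bits of y are lost here... */
    compensation = (t - sum) - y;         /* ...and captured for the next round. */
    sum = t;
  }
  return sum;
}

int main()
{
  /* Ten million additions of 0.1f: the naive float sum drifts noticeably,
   * while the compensated sum stays close to the exact result (~1,000,000). */
  std::vector<float> values(10'000'000, 0.1f);
  float naive = 0.0f;
  for (const float v : values) {
    naive += v;
  }
  std::printf("naive: %f  kahan: %f\n",
              naive,
              kahan_sum(values.data(), int(values.size())));
}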

Diff Detail

Repository
rB Blender
Branch
geometry-nodes-stats-kahan-sum (branched from master)
Build Status
Buildable 22500
Build 22500: arc lint + arc unit

Event Timeline

Lukas Tönne (lukastoenne) requested review of this revision. Jun 11 2022, 7:44 PM

Quite an interesting technique indeed! Though it does have downsides.

Fast-math, which I _think_ is only used in Cycles today, negates this algorithm completely: https://godbolt.org/z/na69o1194

And the accuracy comes at a steep cost, to the tune of 4 to 4.3x slower runtime:

BENCH_accum_int_stdaccum            8775 ns         8580 ns        74667
BENCH_accum_float_stdaccum        101065 ns       100098 ns         6400
BENCH_accum_float2double_loop     101232 ns       102534 ns         7467

BENCH_accum_int_kahan              32235 ns        32227 ns        21333
BENCH_accum_float_kahan           433861 ns       429688 ns         1600

The 'int' variant of compute_sum can be enclosed within an if constexpr (std::is_integral_v<T>) scope to return its perf back to normal at least (done in the godbolt link).
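
A sketch of that suggestion (the compute_sum shape here is assumed for illustration and may differ from the code in the godbolt link): integer types keep the plain exact loop, and only floating-point types pay for the compensation:

#include <cstdio>
#include <type_traits>

template<typename T>
T compute_sum(const T *values, const int size)
{
  if constexpr (std::is_integral_v<T>) {
    /* Integer addition is exact, so plain accumulation is both correct and fast. */
    T sum = T(0);
    for (int i = 0; i < size; i++) {
      sum += values[i];
    }
    return sum;
  }
  else {
    /* Floating-point types go through the compensated (Kahan) loop. */
    T sum = T(0);
    T compensation = T(0);
    for (int i = 0; i < size; i++) {
      const T y = values[i] - compensation;
      const T t = sum + y;
      compensation = (t - sum) - y;
      sum = t;
    }
    return sum;
  }
}

int main()
{
  const int ints[] = {1, 2, 3, 4};
  const float floats[] = {0.1f, 0.2f, 0.3f, 0.4f};
  std::printf("%d %f\n", compute_sum(ints, 4), compute_sum(floats, 4));
}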

I included a 'dirty' variation where we just cast to double for compute_sum; that gets you most of the way there with little perf decrease, but it's not as accurate as Kahan.
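
Presumably something along these lines (the exact code in the godbolt link may differ): each float input is widened to double for the accumulation and the result is narrowed back at the end.

#include <cstdio>
#include <vector>

/* 'Dirty' variation: accumulate in a double so each addition is absorbed by
 * the wider 53-bit mantissa, then narrow back to float once at the end. */
float compute_sum_via_double(const float *values, const int size)
{
  double sum = 0.0;
  for (int i = 0; i < size; i++) {
    sum += double(values[i]);
  }
  return float(sum);
}

int main()
{
  std::vector<float> values(10'000'000, 0.1f);
  std::printf("%f\n", compute_sum_via_double(values.data(), int(values.size())));
}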

You are right, the accuracy is paid for with runtime cost. So maybe just using doubles internally is the better all-round option. Otherwise, a user option for "increased accuracy" that switches between simple and Kahan summation could work? Generally I wouldn't expect the statistics node to be the main issue when working with millions of elements, but it's still worth considering. Also, this node is not at all parallelized in either variant yet; parallelizing it would make the accuracy slightly worse.

The 'int' variant of compute_sum can be enclosed within an if constexpr (std::is_integral_v<T>) scope to return its perf back to normal

The statistics node currently only has float and float3 summation. In the future, I guess there will be an integer mode though.

Not sure what you mean by "Fast-math negates this algorithm completely". Do you mean fast-math makes it impossible to use this algorithm?
Oh, I see now: with fast-math enabled it has the same accuracy problem as the naive summation. That's good to know; I have to try to understand why that is. Aggressive compiler optimization?

Confirmed, fast-math in particular enables -fassociative-math, which will optimize away the (t - sum) - y term and make the compensation zero. From the GCC documentation for -fassociative-math:

Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like (x + 252) - 252. May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required