Page MenuHome

Fix T82455: precision issues and bug in vec_roll_to_mat3_normalized with axis close to -Y
ClosedPublic

Authored by Alexander Gavrilov (angavrilov) on Nov 13 2020, 11:06 AM.

Details

Summary

When the input vector gets close to -Y, y and theta becomes totally
unreliable. It is thus necessary to compute the result in a different
way based on x and z. The code already had a special case, but:

  • The threshold for using the special case was way too low.
  • The special case was not precise enough to extend the threshold.
  • The special case math had a sign error, resulting in a jump.

This adds tests for the computation precision and fixes the issues
by adjusting the threshold, and replacing the special case with one
based on a quadratic Taylor expansion of sqrt instead of linear.

Replacing the special case fixes the bug and results in a compatibility
break, requiring versioning for the roll of affected bones.


This contains 3 commits - one adding tests (some deliberately failing), one is the original threshold adjustment by Gaia Clary to avoid NaNs, and the last one is my more involved fix for the issues plus more tests.

Diff Detail

Repository
rB Blender
Branch
temp-angavrilov-roll-precision (branched from master)
Build Status
Buildable 15588
Build 15588: arc lint + arc unit

Event Timeline

Gaia Clary (gaiaclary) requested review of this revision.Nov 13 2020, 11:06 AM
Gaia Clary (gaiaclary) created this revision.
Gaia Clary (gaiaclary) edited the summary of this revision. (Show Details)Nov 13 2020, 11:10 AM

Adjusted comment in unit test

Gaia Clary (gaiaclary) added inline comments.
source/blender/blenkernel/intern/armature.c
2270–2271

We could just test theta_alt > THETA_CRITICAL but then one of the existing unit tests will fail. I adjusted the value of THETA_ALT by experimenting a bit until all tests pass again. I am not sure if this is an acceptable solution

Sybren A. Stüvel (sybren) requested changes to this revision.Nov 16 2020, 10:47 AM

Note: the bug fix has been made such that all existing unit tests pass.

That looks like a good start to me.

But this made it necessary to add an additional constant THETA_ALT. We could avoid the extra constant but then we would also need to adjust the unit tests.

In what way do they fail? Is it because there is actually a bug, or because of some slightly different rounding causing some slightly different numbers?

source/blender/blenkernel/intern/armature.c
2250

A reader who's here in the function won't know what "alt" means.
As I also said in D9410#233546, I'm not much of a fan of just using a Greek letter without explaining what it is. For this constant, it's now also unclear that it's a threshold value like the rest.

2250

THETA_ALT is much smaller than FLT_EPSILON, which is typically defined as 1.192092896e-07F. Does the check on THETA_ALT need to be that strict? Or would it work with FLT_EPSILON as well? In the latter case, you could drop THETA_ALT completely, and just directly compare to FLT_EPSILON. That would also trivially resolve the other note.

This revision now requires changes to proceed.Nov 16 2020, 10:47 AM

Changed the names of the Threshold constants so that they can easily be identified as thresholds.
Added some more comments to clarify

source/blender/blenkernel/intern/armature.c
2250

THETA_ALT is much smaller than FLT_EPSILON, which is typically defined as 1.192092896e-07F. Does the check on THETA_ALT need to be that strict? Or would it work with FLT_EPSILON as well? In the latter case, you could drop THETA_ALT completely, and just directly compare to FLT_EPSILON. That would also trivially resolve the other note.

2250

using FLT_EPSILON as threshold value might be another choice, but then we need to decide if the function should be doing what the tests want, or the tests need to be adjusted to what the function does :)

Right now the function is ruled by the tests and the function actually seems to behave correct by all means. Also maybe the renaming of the constants and adding the new comments makes it more clear what happens ?

If this is still not good enough, then please help me to find a good way to finish this patch so that not only the bug is fixed, but also you are happy with the inner design of the function :) I have no more ideas how to improve this for now.

I have found another precision issue with this function when debugging IK-FK snapping in the new Rigify paw rig - see inline comment.

source/blender/blenkernel/intern/armature.c
2279–2280

The 1e-5 threshold isn't actually safe here: nor = (0,-0.999978,-0.00659052) taken from a real Rigify rig causes the function to return a matrix with 0.7% scale in it, because z*z/theta > 2. The threshold needs to be increased to e.g. 1e-4 to drop scaling below the detectible threshold (the actual theta in this case is 2.2e-5).

Since I found a related issue of my own, I decided to investigate this math thoroughly.

I started by splitting the huge test into parts: you should not test different cases
under one 'test', because it makes it difficult to determine what exactly failed.
I also added tests for orthogonality of the produced matrix with precision thresholds.

While experimenting with the thresholds to minimize the precision errors I realised
that it is possible to unify the special handling for the vicinity of the singularity
by simply using a different method to compute theta based on x and z instead of y.

This however revealed via failing test an actual bug in the code: the original special
case math used the wrong sign for bMatrix[2][0] and bMatrix[0][2]. This is definitely
a bug, because it causes an abrupt jump in the bone orientation when you smoothly change
its parameters via the UI, while the new math behaves smoothly.

This means that fixing this issue properly would require versioning to adjust the
unluckily positioned bones.

On the positive side, precision errors are definitely defeated, proven by tests.

P.S. Here is a test file with a badly positioned bone, which has different orientation with and without this patch. Playing with the tail X and Z in unpatched blender values can reveal the jump around 4.37m, and the bone noticeably wobbles near the boundary:

Added versioning for the incompatible fix.

Also, adjusted the condition for final switch to the trivial case to only use theta_alt.

Alexander Gavrilov (angavrilov) commandeered this revision.EditedNov 21 2020, 2:56 PM

So here are some things to discuss regarding my more involved and backwards incompatible fix for precision and discontinuity issues.

Although it is expected that the change in behavior would only affect the corner case of bones less than 0.2 degrees off but not exactly -Y, technically it is incompatible change that would make new files read incorrectly in old versions.

source/blender/blenkernel/intern/armature.c
2250

The thresholds now have two specific meanings:

  • SAFE_THRESHOLD is the difference of y from -1 at which the general case is stable and precise enough to use.
  • CRITICAL_THRESHOLD is the distance from origin on the X-Z plane below which the code switches to hard-coded 180 degrees rotation.

The first threshold is tuned based purely on minimizing the computation precision errors - the generic case breaks down around -Y, while the special version uses an approximation that explicitly assumes X and Z are small. The threshold basically should be set at the sweet spot where the expected error curves intersect.

The second threshold however deals with the inherent singularity of this transformation - like in Damped Track, the value of roll is 0/0 style undefined when nor is exactly -Y (and there is no limit either), so the code has to choose an arbitrary value and hard switch to it. The balance here is between a hard switch happening too soon, and roll becoming unstable and meaningless if the threshold is too low and numeric precision issues start to matter.

As shown by an empirical binary search test, the effective original threshold value before changing threshold logic was around 2.4e-4 to 3.4e-4, so 1e-5 is actually lowering it, purely to make one test newly added by Gaia Clary pass.

2271–2272

The reason for two checks here is that the condition should be true when theta == 2 && theta_alt == 0 aka nor = (0,1,0).

2283

As described in the comment above, this basically substitutes in the value of y expressed through x and z (assuming that the vector is precisely normalized), and uses the second order series expansion for sqrt to eliminate adding and then subtracting 1.

source/blender/blenkernel/intern/armature_test.cc
184

These tests actually check the CRITICAL_THRESHOLD value works as expected.

source/blender/blenloader/intern/versioning_290.c
626 ↗(On Diff #31269)

Use double the old SAFE_THRESHOLD just in case. Incompatibility only affects the old special cases around -Y.

Returned CRITICAL_THRESHOLD to the old effective value range pending discussion.

Rebased and resolved conflicts.

Sybren A. Stüvel (sybren) requested changes to this revision.Jul 29 2021, 1:06 PM

The way I see it, there are two distinct reasons unit tests fail:

  • They are based on incorrect values, and the error has been fixed. In this case, the test should be updated. When using test-driven development (TDD), the test would have been updated to the correct values first, and then the fix applied.
  • They are based on correct values, but test too strictly, such that changes in the code that cause slightly different rounding errors result in failures. Probably best to make the test slightly less strict if this is the case.
  • They are based on correct values, and the actual code is wrong. In this case, the tests should be kept, and the code adjusted.

Which one is which is hard to tell for me, and I'll trust the judgement of @Alexander Gavrilov (angavrilov) and @Gaia Clary (gaiaclary) here.

The code (including the test) could use another pass to mark every variable that's not modified as const.

For the versioning code, how about the animation data? Shouldn't that be updated too?

source/blender/blenkernel/intern/armature_test.cc
34

I think it would be fine to have this function more generally available, so in testing.h.

158

s is not a good name.

158

Why shouldn't values be nice?

173–176

Code style.

196–197

Do these really have to be static?

This revision now requires changes to proceed.Jul 29 2021, 1:06 PM

Updated to follow review comments in tests.

Alexander Gavrilov (angavrilov) retitled this revision from fix T82455: vec_roll_to_mat3_normalized() returns NaN for bones with longitudinal axis close to -Y to Fix T82455: precision issues and bug in vec_roll_to_mat3_normalized with axis close to -Y.Jul 29 2021, 9:07 PM
Alexander Gavrilov (angavrilov) edited the summary of this revision. (Show Details)
Alexander Gavrilov (angavrilov) marked 4 inline comments as done.EditedJul 29 2021, 9:18 PM

For the versioning code, how about the animation data? Shouldn't that be updated too?

Vector (aka head + tail) plus roll representation is only used for bone rest orientation storage (i.e. edit mode related). The idea is that this change may flip the required 'roll' angle value 180 degrees in a small set of cases that were affected by the bug, but the actual bone orientation should remain unchanged. For some rigs the rest pose may slightly change though, because the old math could produce noticeably non-orthonormal rest matrices (but still only around 1% spurious scale iirc), while after the fix it should be a couple of digits more precise.

source/blender/blenkernel/intern/armature_test.cc
34

It can't go there because it depends on BLI matrix math functions. I moved some code into a couple of new (double-precision) functions there though.

158

I don't remember exactly due to a lot of time passing, but added the likely reason in the comment.

196–197

Was just a silly optimization (static can be potentially put into constant memory instead of code).

Alexander Gavrilov (angavrilov) marked 2 inline comments as done.

Rebased on master.

Demeter Dzadik (Mets) accepted this revision.EditedOct 18 2021, 12:03 PM

Doktor, the test came out negative. Or positive. Whichever is "good" here.

P.S. Here is a test file with a badly positioned bone, which has different orientation with and without this patch. Playing with the tail X and Z in unpatched blender values can reveal the jump around 4.37m, and the bone noticeably wobbles near the boundary:

Video representation (master first, then patch): F11242772

Comparison of a Sprite Fright shot with all 5 human characters before/after the patch: F11242803 (pixel perfect match)

LGTM!

This revision is now accepted and ready to land.Oct 19 2021, 11:02 AM