
refactor vec_roll_to_mat3_normalized() for clarity
ClosedPublic

Authored by Gaia Clary (gaiaclary) on Nov 2 2020, 10:37 AM.

Details

Summary

The function vec_roll_to_mat3_normalized() has a bug, as described in T82455. This differential only refactors the code so that it becomes clearer what the function does and how the bug can be fixed. It is not supposed to introduce any functional changes.

Diff Detail

Repository
rB Blender

Event Timeline

Gaia Clary (gaiaclary) requested review of this revision.Nov 2 2020, 10:37 AM
Gaia Clary (gaiaclary) retitled this revision from fix: the conversion of roll to matrix breaks when a bone has very small values in x and z. In that case we get a division by very small numbers which can cause a broken bone roll matrix. to The conversion of roll to matrix breaks in some cases.Nov 2 2020, 10:40 AM
Gaia Clary (gaiaclary) edited the summary of this revision. (Show Details)

@Gaia Clary (gaiaclary) do you feel up to adding some unit tests for this function? That way it's much easier to document certain edge cases, and to show that this change doesn't have a negative impact on existing situations. The test could include the examples from T23954, T31333, T27675, and T30438, to ensure that these situations actually remain fixed.

source/blender/blenkernel/intern/armature.c
2171–2172

Not for this patch, but this comment is a bit mind-boggling to me.

  1. Why does it matter what old code did when it's no longer around?
  2. Why is this comment saying it uses two different threshold values, when one of the two is actually the same as used before?
Gaia Clary (gaiaclary) edited the summary of this revision. (Show Details)Nov 2 2020, 12:38 PM

I have lots of questions about this particular bit of code, and I hope now that you've dived into it you can help answer them.

due to float precision errors, we can have nor = (0.0, 0.99999994, 0.0)...

Why is this bad? What is theta? There is no comment that explains it. It's first defined as 1+nor[1] (so in this example theta is almost 2.0), but later it's redefined as nor[0]*nor[0] + nor[2]*nor[2] (so in this example it is 0.0).
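For context: theta = 1 + nor[1] matches the denominator of the standard shortest-arc rotation that maps +Y onto nor, which is why it approaches zero (and becomes a dangerous divisor) when nor approaches -Y. A minimal Python sketch of that formula, as an illustration only, not Blender's actual implementation:

```python
def bend_y_to(nor):
    """Shortest-arc rotation matrix taking +Y onto the unit vector `nor` (row-major 3x3).

    Uses R = I + K + K^2 / (1 + y), where K is the skew matrix of (0,1,0) x nor.
    The divisor (1 + y) is what `theta` appears to be; it vanishes as nor -> -Y.
    """
    x, y, z = nor
    theta = 1.0 + y
    # Cross product (0, 1, 0) x nor = (z, 0, -x).
    cx, cy, cz = z, 0.0, -x
    K = [[0.0, -cz, cy],
         [cz, 0.0, -cx],
         [-cy, cx, 0.0]]
    R = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            K2 = sum(K[i][k] * K[k][j] for k in range(3))
            R[i][j] = (1.0 if i == j else 0.0) + K[i][j] + K2 / theta
    return R
```

Applying the resulting matrix to (0, 1, 0) yields nor again; for nor exactly (0, -1, 0) the division by theta = 0 fails, which is the degenerate case the code guards against.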

I have found other situations where nor == (x, 0.99999994, z) with x and/or z very close to 0, but not exactly 0. However, the test (nor[0] || nor[2]) is still true for such tiny, non-zero values.

A zero check for floats is usually bad, so changing those to some epsilon-around-zero check would likely be better, indeed.
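A minimal illustration of the difference (the epsilon value here is hypothetical, not the threshold the patch defines):

```python
def is_effectively_zero(v, eps=1e-9):
    # Hypothetical epsilon; the patch defines its own THETA thresholds.
    return abs(v) < eps

tiny = 1e-20
assert tiny != 0.0                  # a C truthiness test like (nor[0] || nor[2]) passes
assert is_effectively_zero(tiny)    # an epsilon check treats it as zero instead
assert not is_effectively_zero(0.1)
```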

Because of this, I believe it is better to not only check (x or z != 0) but also to make sure that (nor[0] * nor[0] + nor[2] * nor[2]) is above THETA_THRESHOLD_NEGY_CLOSE.

This would create a tight coupling between that check and the redefinition of theta as nor[0]*nor[0] + nor[2]*nor[2]. I think having a separate variable with a more descriptive name would be better. That way, if the new variable is too small, the approach that uses the old variable can still be used. This would also make it possible to declare both theta and the new variable as const.
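A sketch of that decoupled form, with both quantities as separate read-only values (the threshold names and values here are illustrative, not the patch's actual constants):

```python
SAFE_THRESHOLD = 1e-5      # hypothetical; stands in for the "safe" theta threshold
CRITICAL_THRESHOLD = 1e-9  # hypothetical; stands in for THETA_THRESHOLD_NEGY_CLOSE

def choose_branch(x, y, z):
    theta = 1.0 + y                # ~0 only when nor is close to -Y
    theta_alt = x * x + z * z      # squared distance from the Y axis in the XZ plane
    if theta > SAFE_THRESHOLD or theta_alt > CRITICAL_THRESHOLD:
        return "general"           # safe to use the division-based formula
    return "negative_y"            # treat nor as exactly -Y, no division needed
```

With theta_alt as its own named value, the check and the fallback read independently, and both variables can be const in the C code.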

Remark: I was once able to create a three-bone situation by hand which shows the described failure, but I could not reproduce the situation since then. I guess the problem is mathematical imprecision. However, I see this issue very frequently when using a rig generated with Python, where the bones are combined with a rather complicated set of constraints. Depending on the generator parameters, the rig occasionally implodes due to this issue.

This makes it even more important to write some automated tests. These will just use a vector and a float as input, instead of requiring an entire rig.

This seems to make sense yes, but indeed would be nice to have some regression tests on this now...

refactor vec_roll_to_mat3_normalized() for easier reading.

in particular:

  • Added three const floats x, y, z to replace usage of nor[0], nor[1], nor[2].
  • Replaced the #defines with static const float values and renamed the variables with more descriptive names.

Below is a Python script that generates a rig which suffers from this bug. Run the script, then switch from edit mode to pose mode and see how "bad bone" disappears, because its pose bone matrix is completely broken. The problem originates in the very small bone roll values, which finally trigger the bug:

import bpy

bone_data = [
    ["good bone", (0.0, 0.0, 1.2038891315460205), (0.0, 0.01536799967288971, 1.3546786308288574), -4.1202467175263267e-16],
    ["bad bone", (0.0, 0.01536799967288971, 1.3546786308288574), (0.0, 0.0, 1.2038891315460205), -1.570796257510665e-07],
]

def make_armature(ctx, name):
    data = bpy.data.armatures.new(name)
    obj  = bpy.data.objects.new(name, data)
    ctx.collection.objects.link(obj)
    ctx.view_layer.objects.active = obj
    bpy.ops.object.mode_set(mode='EDIT')
    parent = None
    for bname, head, tail, roll in bone_data:
        bone = obj.data.edit_bones.new(bname)
        if parent:
            bone.parent = parent
        parent = bone
        bone.head = head
        bone.tail = tail
        bone.roll = roll
    return obj

armobj = make_armature(bpy.context, "armature")

I added a simple workaround to our add-on that limits the precision of the bone roll to 6 digits. This is enough to keep Blender from doing bad stuff. The correct fix, though, is to apply this differential to master :)
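A sketch of that workaround (the helper name is made up for illustration; it is not part of the add-on's real API):

```python
def limit_roll_precision(roll, digits=6):
    # Rounding to 6 decimal places turns near-zero rolls (the values that
    # trigger the broken matrix) into exactly 0.0 before they reach Blender.
    return round(roll, digits)

# The "bad bone" roll from the script above collapses to zero:
assert limit_roll_precision(-1.570796257510665e-07) == 0.0
```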

  • Refactor function for easier reading.
  • revert fix and only keep the refactored code with no functional changes
Sybren A. Stüvel (sybren) requested changes to this revision.Nov 5 2020, 3:43 PM
Sybren A. Stüvel (sybren) added inline comments.
source/blender/blenkernel/intern/armature.c
2159–2160

Any particular reason this is static? The good thing about #define is that it's all handled by the preprocessor.

2159–2160

Comments in C files should be /* */ style.

2159–2160

Comments should be a full sentence, so start with capital and end with period.

2160

I think the change in name to "safe" and "critical" is really nice.

2171–2172

These don't change value any more, so you can declare them as const:

const float theta = 1.0f + y;
const float theta_alt = x * x + z * z;
2172

Calculating theta_alt can be deferred to the else clause where it's actually used. Might save a few CPU ticks (although this might be optimized by the compiler as well).

2173

Instead of

(due to float precision errors, we can have nor = (0.0, -0.99999994, 0.0)...).

I think it's nicer to remove the "..." and explain why this is a problem:

Due to float precision errors, nor can be (0.0, -0.99999994, 0.0) which results in theta being almost zero.
This will cause problems when used as divisor.

2177–2178

Instead of -Y I think not (0, -1, 0) is clearer. Or "the -Y vector", whichever you prefer.

2202

"very close to" includes "at zero distance", so "nor is very close to -Y" is enough.

This revision now requires changes to proceed.Nov 5 2020, 3:43 PM
Gaia Clary (gaiaclary) marked 8 inline comments as done.
  • revert fix and only keep the refactored code with no functional changes
  • Added proposed changes as described in Differential comments
source/blender/blenkernel/intern/armature.c
2159–2160

Any particular reason this is static? The good thing about #define is that it's all handled by the preprocessor.

Maybe it was a misunderstanding of a comment from Sergey. I removed the static. I still prefer to have it as a regular variable.

2172

Calculating theta_alt can be deferred to the else clause where it's actually used. Might save a few CPU ticks (although this might be optimized by the compiler as well).

This is true for now, but once the bug fix is applied, we will need theta_alt already in the if clause.

Gaia Clary (gaiaclary) retitled this revision from The conversion of roll to matrix breaks in some cases to refactor vec_roll_to_mat3_normalized() for clarity.Nov 6 2020, 10:39 AM
Gaia Clary (gaiaclary) edited the summary of this revision. (Show Details)
Gaia Clary (gaiaclary) edited the summary of this revision. (Show Details)Nov 6 2020, 10:43 AM

Made the comments for the THETA constants clearer.

This revision is now accepted and ready to land.Nov 13 2020, 9:32 AM
  • Refactor function for easier reading.
  • take care that no functional changes are made during refactoring.