Page MenuHome

BLI: Refactor matrix types & functions to use templates
ClosedPublic

Authored by Clément Foucault (fclem) on Nov 26 2022, 6:47 PM.

Details

Summary

This patch implements the matrix types (i.e:float4x4) by making heavy
usage of templating. All matrix functions are now outside of the vector
classes (inside the blender::math namespace) and are not vector size
dependent for the most part.

Motivations

The goal/motivations of this rewrite are the same as the Vector C++ API (D13791):

  • Template everything for making it work with any types and avoid code duplication.
  • Use functional style instead of Object Oriented function call to allow a simple compatibility layer with GLSL syntax (see T103026 for more details).
  • Allow most convenient constructor syntax and accessors (array subscript matrix[c][r], or component alias matrix.y.z).
  • Make it cover all features the current C API supports for adoption.
  • Keep compilation time and debug performance somehow acceptable.

Consideration:

  • The new MatView class can be generated by my_float.view<NumCol, NumRow, StartCol, StartRow>() (with the last 2 being optionnal). This one allows modifying parts of the source matrix in place. It isn't pretty and duplicates a lot of code, but it is needed mainly to replace normalize_m4. At least I think it is a good starting point that can refined further.
  • An exhaustive list of missing BLI_math_matrix.h functions from the new API can be found here P3373.
  • This adds new Rotation types in order to have a clean API. This will be extended when we port the full Rotation API. The types are made so that they don't allow implicit down-casting to their vector representation.
  • Some functions make direct use of the Eigen library, bypassing the Eigen C API defined in intern/eigen. Its use is contained inside math_matrix.cc. There is conflicting opinion wether we should use it more so I contained its usage to almost the tasks as in the C API for now.

Performance, SSE & Eigen:

I implemented the SSE version of float3x3 multiplication just for the sake of it. I then tried to use Eigen to see if it would be a valid contender.

ImplementationOperationDebug BuildRelease Build
Baselinefloat_3x3_mul0.532984 sec0.017547 sec
Eigenfloat_3x3_mul4.121274 sec (~7.7x)0.023659 sec (~1.3x)
SSE2float_3x3_mul0.168365 sec (~0.3x)0.016753 sec (~1.0x)
ImplementationOperationDebug BuildRelease Build
Baselinefloat_4x4_mul1.057817 sec0.036605 sec
Eigenfloat_4x4_mul1.041448 sec (~1.0x)0.010853 sec (~0.3x)
SSE2float_4x4_mul0.204016 sec (~0.2x)0.010507 sec (~0.3x)

The debug build performance is clearly suboptimal with Eigen if not equal to the baseline. But it isn't even as fast as naive implementation in release build. The SSE code make it faster while not improving the release build perf.
Note that this needs more testing as this was only tested on M1 mac with Neon intrinsic which could have less problem with unaligned loads.

The code used for perf testing is P3348.

Diff Detail

Repository
rB Blender

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
source/blender/blenlib/BLI_math_rotation_types.hh
121

I don't really like the fact that we store the cos and sin here as well. It can help sometimes and be worse in others (when it is computed even though it's not needed). I think this should be passed to functions separately or there should be something like struct AxisAngleWithSinCos that contains everything.

source/blender/blenlib/intern/math_matrix.cc
18

I think these using statements should be removed. At least to me they make reading the code below harder, because Map, Matrix and Stride are very generic. With the Eigen:: prefix I would have much more context for what is meant.

This revision now requires changes to proceed.Dec 15 2022, 5:27 PM
Clément Foucault (fclem) marked 8 inline comments as done.
  • Add missing explicit template instanciations
  • Replace for loops with unroll template where it make sense
  • Implement correct non squared matrix multiplication
  • Address review comments
  • Merge branch 'master' into bli-matrix-template
source/blender/blenlib/BLI_math_matrix_types.hh
484

Your example does not compile. (MacOS + Clang 13.1.6)

You should get something like

/Users/clement/Blender/blender/source/blender/blenlib/tests/BLI_math_matrix_types_test.cc:298:32: error: no viable overloaded '+='
  mat_const.view<2, 2, 1, 1>() += 5;
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~
[...]
/Users/clement/Blender/blender/source/blender/blenlib/BLI_math_matrix_types.hh:566:12: note: candidate function not viable: 'this' argument has type 'const MatView<float, 2, 2, 4, 4, 1, 1>', but method is not marked const
  MatView &operator+=(T b)

This is because the view() method return either a const MatView or a MatView depending on the object const type.

So I am a bit hesitant to make the code way less understandable for something that should work.

source/blender/blenlib/BLI_math_rotation_types.hh
121

I am not against changing it but I would like to express my opinion.

  • I could not find a single use through the whole old rotation API that uses the angle as is without converting them to cos/sin representation.
  • This representation is also not suited for interpolation so it is likely to be a corner case than the main usage of it.
  • I doubt this is also a type to consider for storage. But a case could be made that that's always more semantically correct to store them as such. Also it might be better for data exchange with the GPU to use 16 bytes struct.
  • Having more types make the API less trivial to understand and use.

These being protected members, makes it implementation detail and allows for changes to be made later if the case arise that we really need these.
But to be fair, the rotation API wasn't the goal of this patch and is going to be addressed in a separate differential.

source/blender/blenlib/BLI_math_matrix_types.hh
484

There is an important difference between:

mat_const.view<2, 2, 1, 1>() += 5;

and

auto view = x.view<4, 4>();
view += 5;

The example you provided indeed does not work, but my example does and it has the const correctness problem.
The problem is that a const MatView can just be copied (it's just a pointer after all) and then it is not const anymore. This is what happens in my example. x.view() returns a const MatView, but view is just a MatView.

486

Does not look like this has been addressed.

  • Fix missing assert
  • Add MutableMatView for more const correctness

Generally seems ok now. I mean I don't necessarily agree with every decision, but this is something we can work with and is significantly better than what we had before :)

source/blender/blenlib/BLI_math_matrix.hh
373

Call this something like is_equal_approximate or so. It's never clear to me what a compare function returns.

source/blender/blenlib/BLI_math_matrix_types.hh
330

Think it's better to define the binary operator functions as a friend function, or it doesn't even have to be inside the class.

493

These checks are not necessary since they are included in the checks below.

541

Assert that the other matrix has the same size as this matrix.
Also this should probably be implemented outside of the class. Then only one num-row and num-col template argument should be neccesary (but two start-col/row arguments).

source/blender/blenlib/intern/math_matrix.cc
180

Eigen only by default, not in general

Clément Foucault (fclem) marked 7 inline comments as done.
  • Address review comments
source/blender/blenlib/BLI_math_matrix_types.hh
330

This is in order to provide template specialization for common types. I couldn't find a nice way for friend functions to work with this.

541

This is already the case here. You were just confused by the long list of arguments.

This is looking like a great improvement! Almost ready to accept this, just want to see the response to a couple of these comments. Most are trivial.

source/blender/blenlib/BLI_math_matrix.hh
29
35–41

Grammar

71–72

Wouldn't it make more sense to be specific about the rotation type here?
Or did you do it this way to support more rotation types in the future?

160

What do you think about adding a tidbit about when you would use pseudo_invert to the comment? Maybe others know about it, but it would be helpful to me anyway.

214–221

What do you think about the from_normalized_axes name? I've always wished I used that name since it's shorter and _data at the end is pretty useless

275
326–347

What about adding the from_ prefix to these, since they build matrices like the other functions above? Just a thought, totally fine if you don't think it's a good idea

388

Grammar. Appears elsewhere in this file too

source/blender/blenlib/BLI_math_matrix_types.hh
11

I understand you might not have the energy for this at this point, but I think it would be great to see a description of the overall design in a comment at the top of the file like BLI_vector.hh.
This file will be looked at a lot, and that would probably save people lots of time in the long run

11

Could the "this only works for tightly packed T" be checked with a static assert instead?

733–738

Grammar

did we ever look into why clang was ~3x faster than both msvc and gcc in the perf tests? is the performance disparity still there?

Clément Foucault (fclem) marked 10 inline comments as done.
  • Address review comments
  • Merge branch 'master' into bli-matrix-template
source/blender/blenlib/BLI_math_matrix.hh
71–72

The idea was to keep the declarations generic enough so that specializations is implementation details.

For the moment only the AxisAngle version is implemented.

160

I have no idea... maybe a good one for @Sergey Sharybin (sergey) .

214–221

I am tempted to rename it to from_orthonormal_axes. If the given axes aren't orthonormal, the resulting matrix isn't a proper rotation matrix.

Also tempted to template it for any axis couple.

326–347

I don't think it is a good idea because it isn't another representation of a transform like EulerXYZ or Location. It would make sense if it was from_perspective_distances or something like that, but it is way longer and doesn't convey much more information.

It would be good if @Ray Molenkamp (LazyDodo)'s performance question could be investigated before this is committed, but other than that it looks good to me. Nice job :)

source/blender/blenlib/BLI_math_matrix.hh
214–221

That name sounds fine to me too. I'm not sure about the template thing, but it also sounds reasonable.

tl;dr: Compilers are dumb.

So I investigated the performance issue. And it seems MSVC does not inline the function like the other compilers so the tests are quite biased. Should we try to force noinline in this test for a better comparison?
The C++ template is not as bad as Eigen and within the same ballpark as C BLI (except for MSVC), and since we are only using Eigen for what we already do in BLI (matrix inversion, SVD), it looks like a safe bet. The only new place where we use it is determinant calculation.

However, given the BlenLibTemplate score under MSVC, I am tempted to add a manually unrolled version for the float3x3 case to remove any possible regression with this particular type.

This was tested through Compiler Explorer online using P3409 (https://godbolt.org/z/dEjz147P6). So it is only meaningful to compare per compiler results as execution time between runs is quite variable (I would say +-20% variation).

x64 MSVC 19.30 /std:c++17 /O2
BlenLibTemplate : 1.839296899999999901 seconds.
BlenLibTemplate : 0.760307500000000047 seconds. (Everything unrolled manually, without += )
BlenLib         : 1.094654999999999934 seconds.
Eigen           : 5.275484099999999899 seconds.

x64 Clang 14.00 -03
BlenLibTemplate : 0.331119659000000011 seconds.
BlenLib         : 0.777056591000000019 seconds. (Highly variable, can go down to 0.28 sec)
Eigen           : 1.978155263000000108 seconds.

x64 GCC 12.2 -03
BlenLibTemplate : 1.108450895000000047 seconds.
BlenLib         : 1.253889406000000095 seconds.
Eigen           : 1.835087734000000026 seconds.

For MSVC, Eigen is 5x slower and new template matrix is 1.5x slower. Manual unrolling seems to remove this slowdown.

  • Remove static assert as it doesn't compile
  • Fix wrap some std function to math namespace
  • Merge branch 'master' into bli-matrix-template

Overall seems like a good step forward, I didn't see anything concerning on some quick checks, the other guys did a more thorough review already it seems.

Brecht Van Lommel (brecht) requested changes to this revision.Jan 4 2023, 12:44 PM

I think the performance issue has to be clarified before this is ready, unrolling if necessary.

Also note that we build Blender with -O2and that's how we should be measuring performance, not -O3.

Since @Sergey Sharybin (sergey) is back I'll leave the naming and organization review to him, I don't have strong opinions on that.

This revision now requires changes to proceed.Jan 4 2023, 12:44 PM

The naming is probably ok. Thing is: the already existing C++ code in master diverged from the naming used in C and Python. This patch follows the existing C++ code, which is fine. If we are to cover the naming consistency topic it should be done with the existing C++ code taken into account. Although, with [[nodiscard]] and const-correctness not sure how important that is.

Not sure what you mean by organization review though.

  • Avoid aligning float3x3, only align if comp size is multiple of 4
  • Unroll float3x3 operator* and add reference SSE implementation

I have added the float3x3 SSE implementation for consistency / reference if someone wants to take a closer look. This version is well behaved (doesn't write out of bounds) and is 50 instructions long (vs 80 for the original implementation).
I benchmarked it P3422 with https://quick-bench.com/ (with -O2) but in all cases, it proved to be consistently slower than the naive implementation.

I have no idea what is causing these differences as the BlenLib and BlendLibTemplate here uses exactly the same code. My guess is that it has to do with the resulting matrix copy.

So I am unsure which code is really faster in real life scenario.

The benchmark is a bit biased against the BlenLib implementation. mul_m3_m3m3 has some extra branching in that none of the other implementations have, as they do not account for the side cases where R==A or R==B so it's a bit of an unfair competition, if you call mul_m3_m3m3_uniq rather than mul_m3_m3m3 blenlibs perf seems to fall in line with the others, with the SSE implementation coming in last

Looking at the disassembly, the templated version got inlined, while mul_m3_m3m3_uniq still made a function call hence a branch the other benchmark is not paying for, taking the BLI_NOINLINE off mul_m3_m3m3_uniq yields the following results

@Ray Molenkamp (LazyDodo) Your last graph is misleading. Maybe you forgot to set optimization level to O2. All versions are not inlined as they are all tagged with BLI_NO_INLINE.
See:
call 2138f0 <(anonymous namespace)::MatBase<float, 3, 3>::operator*((anonymous namespace)::MatBase<float, 3, 3> const&) const>
call 213a10 <(anonymous namespace)::MatBase<float, 3, 3>::operator<<((anonymous namespace)::MatBase<float, 3, 3> const&) const>

That said, I agree that a direct comparison against mul_m3_m3m3_uniq is fairer and the number in your first graph are correct.
Making everything inline reveals that the unaligned load / store might be the cause of the slowness of the SSE version. But aligning everything is out of question for now.

Yeah i have no idea what happened there, I closed to page already so i can't validate if i did leave it on -O3 or not, however can't replicate the results of that second graph so best to ignore that result

source/blender/blenlib/BLI_math_matrix.hh
391

Do we need to specify this is the MatBase<T,3,3> specialization?
(Same for orthonormal)

source/blender/blenlib/BLI_math_matrix_types.hh
79

My c++-foo is weak, do we also need ?:

  MatBase(MatBase &&) = default;
  MatBase& operator=(const MatBase &) = default;
`
source/blender/blenlib/BLI_math_matrix_types.hh
467

Can we make a slightly stronger hash by following FNV-hash and using XOR instead of ADD:

uint64_t hash() const
{
  uint64_t h = 14695981039346656037ULL;
  unroll<NumCol * NumRow>([&](auto i) {
    T value = (static_cast<const T *>(this))[i];
    h *= 1099511628211ULL;
    h ^= *reinterpret_cast<const as_uint_type<T> *>(&value);
  });
  return h;
}
source/blender/blenlib/BLI_math_matrix_types.hh
474

(!OPTIONAL!)

As a personal preference, it's possible to write this slightly cleaner like so:

  friend std::ostream &operator<<(std::ostream &stream, const MatBase &mat)
  {
    const char *col_prefix = "(\n";
    unroll<NumCol>([&](auto i) {
      stream << col_prefix;
      col_prefix = "),\n(";
      const char *row_prefix = "(";
      unroll<NumRow>([&](auto j) {
        /** NOTE: j and i are swapped to follow mathematical convention. */
        stream << row_prefix << mat[j][i];
        row_prefix = ", ";
      });
    });
    stream << ")\n)\n";
    return stream;
  }
};
source/blender/blenlib/BLI_math_matrix.hh
1038

(question?)
Can we remove the negation by using the identity:

-math::cross(forward, up) === math::cross(up, forward)
source/blender/blenlib/BLI_math_matrix_types.hh
355

TODO: The return type doesn't look quite right here, it should be possible to return a non-square matrix.
e.g. a MatBase<2,3> * MatBase<3,4> should return a MatBase<2,4> ?
(Or it's possible I have that transposed.)

441

To support non-square matrices, perhaps std::min(NumCol,NumRow)

This revision is now accepted and ready to land.Jan 5 2023, 3:22 PM
Clément Foucault (fclem) marked 9 inline comments as done.
  • Add c-style matrices constructor for view. Allowing easy wrapping of DNA matrices
  • Remove unecessary negate after cross product.
  • Address review comments
source/blender/blenlib/BLI_math_matrix.hh
391

This is unfortunately not possible to determine if a matrix is 2d affine (projective) or 3d rotation.

For this reason, I would leave this function as is for now. It could become templated with the dimensionality.
However, as it isn't present inside BLI C matrix API, I don't see the incentive to add it now.

source/blender/blenlib/BLI_math_matrix_types.hh
79

That isn't required since none of the copy / move constructor/assignment are defined. Thus these are automatically defined.

355

I did it out of simplicity and avoiding to template this even more. I'll leave that as a TODO.

441

Good catch!

467

This patch uses the same hash as the old float4x4 library for compatibility. If we have to change it, it isn't in this patch.

474

I find that's a little less obvious.

  • Merge branch 'master' into bli-matrix-template
  • Make Sergey happier.