This patch implements the matrix types (e.g. float4x4) by making heavy
use of templates. All matrix functions are now outside of the vector
classes (inside the blender::math namespace) and, for the most part, do
not depend on the vector size.
Motivations
The goals of this rewrite are the same as for the Vector C++ API (D13791):
- Template everything so that it works with any type and avoids code duplication.
- Use a functional style instead of object-oriented method calls to allow a simple compatibility layer with GLSL syntax (see T103026 for more details).
- Allow the most convenient constructor syntax and accessors (array subscript matrix[c][r], or component alias matrix.y.z).
- Cover all features the current C API supports, to ease adoption.
- Keep compilation time and debug performance somewhat acceptable.
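To illustrate the templated, functional style listed above, here is a minimal sketch. It is not the code from this patch: the `api_sketch` namespace and the `Vec`, `Mat` and `transpose` definitions are hypothetical stand-ins, and the component aliases (matrix.y.z) are left out for brevity. It assumes column-major storage, so that `m[c][r]` selects column `c`, then row `r`.

```cpp
#include <cassert>

namespace api_sketch {

// Hypothetical column vector; a stand-in, not Blender's actual vector type.
template<typename T, int Size> struct Vec {
  T data[Size] = {};
  T &operator[](int i) { assert(i >= 0 && i < Size); return data[i]; }
  const T &operator[](int i) const { assert(i >= 0 && i < Size); return data[i]; }
};

// Column-major matrix: `m[c][r]` selects column c, then row r.
template<typename T, int NumCol, int NumRow> struct Mat {
  Vec<T, NumRow> cols[NumCol] = {};

  Vec<T, NumRow> &operator[](int c) { assert(c >= 0 && c < NumCol); return cols[c]; }
  const Vec<T, NumRow> &operator[](int c) const { assert(c >= 0 && c < NumCol); return cols[c]; }

  // Ones on the main diagonal, zeros elsewhere.
  static Mat identity() {
    Mat result;
    for (int i = 0; i < NumCol && i < NumRow; i++) {
      result[i][i] = T(1);
    }
    return result;
  }
};

using float3x3 = Mat<float, 3, 3>;
using float4x4 = Mat<float, 4, 4>;

// Free function in GLSL style: `transpose(m)` rather than a method on the matrix.
// One template covers every element type and every matrix size.
template<typename T, int NumCol, int NumRow>
Mat<T, NumRow, NumCol> transpose(const Mat<T, NumCol, NumRow> &m) {
  Mat<T, NumRow, NumCol> result;
  for (int c = 0; c < NumCol; c++) {
    for (int r = 0; r < NumRow; r++) {
      result[r][c] = m[c][r];
    }
  }
  return result;
}

}  // namespace api_sketch
```

With these definitions, `api_sketch::transpose(m)` works unchanged for `float3x3`, `float4x4`, or a non-square `Mat<double, 2, 3>`.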
Considerations:
- The new MatView class can be generated by my_float.view<NumCol, NumRow, StartCol, StartRow>() (with the last 2 being optional). This one allows modifying parts of the source matrix in place. It isn't pretty and duplicates a lot of code, but it is needed mainly to replace normalize_m4. At least I think it is a good starting point that can be refined further.
- An exhaustive list of the BLI_math_matrix.h functions missing from the new API can be found in P3373.
- This adds new Rotation types in order to have a clean API. This will be extended when we port the full Rotation API. The types are made so that they don't allow implicit down-casting to their vector representation.
- Some functions make direct use of the Eigen library, bypassing the Eigen C API defined in intern/eigen. Its use is contained inside math_matrix.cc. There are conflicting opinions on whether we should use it more, so for now I limited its usage to almost the same tasks as in the C API.
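The in-place view described in the first item above can be sketched as follows. This is an assumption-laden illustration, not the MatView implementation from this patch: the `view_sketch` namespace, `MutableMatView`, its `at()` accessor and the `scale` helper are all hypothetical names. Only the `view<NumCol, NumRow, StartCol, StartRow>()` template parameter order, with the last two defaulting to 0, follows the description.

```cpp
#include <cassert>

namespace view_sketch {

template<typename T, int NumCol, int NumRow> struct Mat;

// Hypothetical view over a NumCol x NumRow block of a source matrix, starting at
// (StartCol, StartRow). It stores only a reference, so writes land in the source.
template<typename T, int NumCol, int NumRow, int SrcNumCol, int SrcNumRow, int StartCol, int StartRow>
struct MutableMatView {
  Mat<T, SrcNumCol, SrcNumRow> &src;

  T &at(int c, int r) {
    assert(c >= 0 && c < NumCol && r >= 0 && r < NumRow);
    return src.cols[StartCol + c][StartRow + r];
  }
};

// Column-major matrix: `m[c][r]` selects column c, then row r.
template<typename T, int NumCol, int NumRow> struct Mat {
  T cols[NumCol][NumRow] = {};

  T *operator[](int c) { return cols[c]; }

  // The last two parameters are optional and default to the top-left corner.
  template<int ViewNumCol, int ViewNumRow, int StartCol = 0, int StartRow = 0>
  MutableMatView<T, ViewNumCol, ViewNumRow, NumCol, NumRow, StartCol, StartRow> view() {
    static_assert(StartCol + ViewNumCol <= NumCol, "View exceeds the source columns");
    static_assert(StartRow + ViewNumRow <= NumRow, "View exceeds the source rows");
    return {*this};
  }
};

// Multiplies every element covered by the view; the rest of the source is untouched.
template<typename T, int C, int R, int SrcC, int SrcR, int StartC, int StartR>
void scale(MutableMatView<T, C, R, SrcC, SrcR, StartC, StartR> view, T factor) {
  for (int c = 0; c < C; c++) {
    for (int r = 0; r < R; r++) {
      view.at(c, r) *= factor;
    }
  }
}

}  // namespace view_sketch
```

For example, `view_sketch::scale(m.view<3, 3>(), 2.0f)` on a 4x4 matrix changes only the upper-left 3x3 block, which is the kind of partial update a normalize_m4 replacement needs.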
Performance, SSE & Eigen:
I implemented the SSE version of float3x3 multiplication just for the sake of it. I then tried to use Eigen to see if it would be a valid contender.
| Implementation | Operation | Debug Build | Release Build |
| --- | --- | --- | --- |
| Baseline | float_3x3_mul | 0.532984 sec | 0.017547 sec |
| Eigen | float_3x3_mul | 4.121274 sec (~7.7x) | 0.023659 sec (~1.3x) |
| SSE2 | float_3x3_mul | 0.168365 sec (~0.3x) | 0.016753 sec (~1.0x) |

| Implementation | Operation | Debug Build | Release Build |
| --- | --- | --- | --- |
| Baseline | float_4x4_mul | 1.057817 sec | 0.036605 sec |
| Eigen | float_4x4_mul | 1.041448 sec (~1.0x) | 0.010853 sec (~0.3x) |
| SSE2 | float_4x4_mul | 0.204016 sec (~0.2x) | 0.010507 sec (~0.3x) |
With Eigen, debug build performance is clearly worse than the baseline for float3x3 and only on par with it for float4x4. In release builds, Eigen is slower than the naive implementation for float3x3, although it is faster for float4x4. The SSE code makes debug builds faster; in release builds it brings no improvement for float3x3 and performs about the same as Eigen for float4x4.
Note that this needs more testing, as it was only tested on an M1 Mac with Neon intrinsics, which could have fewer problems with unaligned loads.
The code used for perf testing is P3348.
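For readers unfamiliar with the approach, below is a rough sketch of a column-major 3x3 multiplication written with SSE intrinsics next to a naive version. It is neither the code from this patch nor the benchmark in P3348: the `simd_sketch` namespace, the `float3x3` struct and both function names are hypothetical. The sketch builds each column with `_mm_set_ps` instead of a 16-byte load, which sidesteps reading past a 12-byte column, and falls back to the naive path when SSE2 is not available.

```cpp
#include <cassert>
#if defined(__SSE2__) || defined(_M_X64)
#  include <emmintrin.h>
#  define SIMD_SKETCH_USE_SSE 1
#endif

namespace simd_sketch {

// Column-major 3x3 matrix: m[c][r] is column c, row r.
struct float3x3 {
  float m[3][3];
};

// Naive reference: (a * b)[c][r] = sum over k of a[k][r] * b[c][k].
inline float3x3 mul_naive(const float3x3 &a, const float3x3 &b) {
  float3x3 result;
  for (int c = 0; c < 3; c++) {
    for (int r = 0; r < 3; r++) {
      result.m[c][r] = a.m[0][r] * b.m[c][0] + a.m[1][r] * b.m[c][1] + a.m[2][r] * b.m[c][2];
    }
  }
  return result;
}

inline float3x3 mul_simd(const float3x3 &a, const float3x3 &b) {
#ifdef SIMD_SKETCH_USE_SSE
  // Each column of `a` goes into one register, with a zero in the unused 4th lane.
  // _mm_set_ps takes its arguments from the highest lane down to the lowest.
  const __m128 a0 = _mm_set_ps(0.0f, a.m[0][2], a.m[0][1], a.m[0][0]);
  const __m128 a1 = _mm_set_ps(0.0f, a.m[1][2], a.m[1][1], a.m[1][0]);
  const __m128 a2 = _mm_set_ps(0.0f, a.m[2][2], a.m[2][1], a.m[2][0]);
  float3x3 result;
  for (int c = 0; c < 3; c++) {
    // Result column c is a linear combination of the columns of `a`,
    // weighted by the entries of column c of `b`.
    __m128 col = _mm_mul_ps(a0, _mm_set1_ps(b.m[c][0]));
    col = _mm_add_ps(col, _mm_mul_ps(a1, _mm_set1_ps(b.m[c][1])));
    col = _mm_add_ps(col, _mm_mul_ps(a2, _mm_set1_ps(b.m[c][2])));
    float tmp[4];
    _mm_storeu_ps(tmp, col);
    result.m[c][0] = tmp[0];
    result.m[c][1] = tmp[1];
    result.m[c][2] = tmp[2];
  }
  return result;
#else
  // No SSE2 on this target: use the scalar path.
  return mul_naive(a, b);
#endif
}

}  // namespace simd_sketch
```

Going through a temporary on store keeps the sketch free of out-of-bounds writes at the cost of some speed; a tuned implementation would likely handle the last column differently.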


