This patch significantly speeds up attribute nodes when multiple
threads are available (roughly 3x to 9x for most of the nodes tested below),
especially in linear node trees, where parallelism cannot be achieved elsewhere.
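The basic idea of parallelizing a per-element attribute operation can be sketched as follows. This is a hypothetical standalone illustration, not code from the patch: the patch uses Blender's `parallel_for`, while this sketch uses plain `std::thread` so it compiles on its own. `parallel_add` is an illustrative name, and the "Add" operation over two contiguous float arrays stands in for any element-wise node operation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: apply an "Add" operation element-wise to two contiguous
// float arrays, splitting the index range into at most one chunk per hardware
// thread. Each thread writes a disjoint slice of `result`, so no locking is
// needed beyond joining the threads at the end.
void parallel_add(const std::vector<float> &a,
                  const std::vector<float> &b,
                  std::vector<float> &result)
{
  assert(a.size() == b.size() && a.size() == result.size());
  const std::size_t size = a.size();
  const std::size_t threads = std::max<std::size_t>(1, std::thread::hardware_concurrency());
  // Ceiling division so the chunks cover the whole range.
  const std::size_t chunk = (size + threads - 1) / threads;

  std::vector<std::thread> workers;
  for (std::size_t begin = 0; begin < size; begin += chunk) {
    const std::size_t end = std::min(size, begin + chunk);
    workers.emplace_back([&, begin, end]() {
      for (std::size_t i = begin; i < end; i++) {
        result[i] = a[i] + b[i];
      }
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}
```

Spawning threads on every call keeps the sketch short; a real task scheduler reuses a pool of worker threads and balances chunks between them, which is why the patch goes through `parallel_for` instead.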
I tested on a mesh with 4 million vertices on a Ryzen 7 3700X (8 cores, 16 threads).
All tests used "plain" attributes, where no conversion
is necessary and the data is stored in contiguous arrays.
The results are averaged over ~30 runs.
| Node | Before (ms) | After (ms) | Speedup (×) | Notes |
| ---- | ----------- | ---------- | ------------------------ | ----- |
| Align Rotation to Vector | 651 | 70.8 | **9.19** | "Auto" pivot mode |
| Attribute Math | 60.1 | 10.1 | **5.95** | "Hyperbolic Sine" operation |
| Attribute Color Ramp | 55.2 | 11.1 | **4.97** | |
| Attribute Mix | 36.2 | 7.74 | **4.67** | Color data type |
| Attribute Sample Texture | 430 | 92 | **4.66** | "Clouds" texture |
| Attribute Math | 10.6 | 2.60 | **4.12** | "Add" operation |
| Attribute Randomize | 38.7 | 12.0 | **3.21** | Vector data type |
| Attribute Vector Math | 58.8 | 18.5 | **3.18** | "Refract" operation |
| Attribute Map Range | 28.2 | 28.7 | **0.99** | No speedup; I will remove these changes or investigate further |
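Averaging timings over repeated runs, as was done for the table above, can be sketched with a small helper. This is a hypothetical harness, not the one used for these measurements: `average_ms` is an illustrative name and is not part of Blender or the patch. It measures wall-clock time with `std::chrono::steady_clock`, which is monotonic and unaffected by system clock changes.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical benchmark helper: run `fn` `runs` times and return the mean
// wall-clock time per run in milliseconds.
double average_ms(const std::function<void()> &fn, int runs)
{
  assert(runs > 0);
  using Clock = std::chrono::steady_clock;
  double total_ms = 0.0;
  for (int i = 0; i < runs; i++) {
    const Clock::time_point start = Clock::now();
    fn();
    const Clock::time_point end = Clock::now();
    total_ms += std::chrono::duration<double, std::milli>(end - start).count();
  }
  return total_ms / runs;
}
```

For example, `average_ms([&]() { /* evaluate the node */ }, 30)` returns the mean of 30 timings; the lambda body is a placeholder for whatever work is being measured.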
The changes are not exhaustive; other nodes could still be parallelized
in the future. The grain size passed to `parallel_for` could also be tuned
further, but I'd rather make sure that it isn't too small (very small
grains add task-scheduling overhead that can outweigh the benefit for
cheap per-element work). I tested a few different values but also relied
on intuition: larger grain sizes for less complex operations, and smaller
ones for more complex operations.
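The rule of thumb above (larger grains for cheaper operations) can be made concrete with a simple cost model. This is a hypothetical sketch, not code from the patch: `choose_grain_size`, `num_tasks`, and the abstract "cost units" are illustrative assumptions, and `num_tasks` assumes every task receives exactly `grain_size` elements, which a real scheduler does not guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical heuristic: choose a grain size so that each task performs
// roughly `target_task_cost` units of work. Cheap per-element operations get a
// larger grain size, expensive ones a smaller one, but never below
// `min_grain_size`, so that scheduling overhead does not dominate.
int64_t choose_grain_size(double cost_per_element,
                          double target_task_cost,
                          int64_t min_grain_size)
{
  assert(cost_per_element > 0.0 && target_task_cost > 0.0 && min_grain_size > 0);
  const int64_t grain = static_cast<int64_t>(target_task_cost / cost_per_element);
  return std::max(grain, min_grain_size);
}

// Number of tasks a range of `size` elements is split into if every task gets
// exactly `grain_size` elements (ceiling division).
int64_t num_tasks(int64_t size, int64_t grain_size)
{
  assert(size >= 0 && grain_size > 0);
  return (size + grain_size - 1) / grain_size;
}
```

With an assumed budget of 8192 cost units per task and a minimum grain size of 64, a cheap operation costing 1 unit per element gets a grain size of 8192, splitting 4,000,000 elements into 489 tasks, while an operation costing 32 units per element gets a grain size of 256, or 15625 tasks.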