I did some investigation into the performance of the mesh to points node.
There are fancier possible improvements, like taking ownership
of existing arrays in some cases, but this patch takes a simpler
brute-force approach for now.
The first change is to move from the previous loop to using the new
`materialize_compressed_to_uninitialized` method on virtual arrays,
which adds only the selected values to the output. That is a nice
improvement in some cases, corresponding to the "Without Threading"
column in the chart below.
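
As a rough illustration of what a "compressed" materialize does, here is a minimal standalone sketch. It is not the actual virtual array implementation: `gather_to_uninitialized` is a hypothetical helper, and plain `std::vector` inputs stand in for the virtual array and the selection mask. The point is only that element `i` of the output is constructed from the `i`-th selected input value, so unselected values are never touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical helper (not the real API): copy-construct only the selected
// values of `src` into the uninitialized, densely packed buffer `dst`.
// `dst` must have room for `indices.size()` elements.
template<typename T>
void gather_to_uninitialized(const std::vector<T> &src,
                             const std::vector<std::int64_t> &indices,
                             T *dst)
{
  for (std::size_t i = 0; i < indices.size(); i++) {
    assert(indices[i] >= 0 && std::size_t(indices[i]) < src.size());
    // Placement new, because `dst` is assumed to hold no constructed objects yet.
    new (dst + i) T(src[std::size_t(indices[i])]);
  }
}
```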
The next change is to call that function in parallel on slices of
the output. To avoid generating too much code, this is done without
templating on the attribute type and without fully devirtualizing.
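
The idea of filling slices of the output in parallel without per-type templates can be sketched as below. This is an assumption-laden illustration, not the patch itself: `gather_bytes_parallel` is a hypothetical name, `std::thread` stands in for whatever task scheduler the real code uses, and elements are assumed to be trivially copyable so they can be moved with `memcpy` given only their size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical type-erased gather: element `i` of `dst` receives element
// `indices[i]` of `src`. Only the element size is known, so a single
// non-templated function serves every trivially copyable type.
void gather_bytes_parallel(const void *src,
                           std::size_t elem_size,
                           const std::vector<std::int64_t> &indices,
                           void *dst,
                           std::size_t num_threads)
{
  assert(elem_size > 0 && num_threads > 0);
  const auto *src_bytes = static_cast<const unsigned char *>(src);
  auto *dst_bytes = static_cast<unsigned char *>(dst);
  const std::size_t total = indices.size();
  // Size of each output slice, rounded up so all elements are covered.
  const std::size_t chunk = (total + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  for (std::size_t start = 0; start < total; start += chunk) {
    const std::size_t end = std::min(start + chunk, total);
    // Each worker writes a disjoint slice [start, end) of the output.
    workers.emplace_back([=, &indices]() {
      for (std::size_t i = start; i < end; i++) {
        std::memcpy(dst_bytes + i * elem_size,
                    src_bytes + std::size_t(indices[i]) * elem_size,
                    elem_size);
      }
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}
```

Because the slices of the output do not overlap, the workers need no synchronization beyond the final join.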
The test input is a 4 million point grid, generated by the grid primitive node.
Color and 2D vector attributes were also transferred to the points.
| Test | Before | Without Threading | After | Change |
| -------------------- | ------ | ----------------- | ------ | ------ |
| All Verts | 209 ms | 186 ms | 170 ms | 0.8x |
| 0.01 Selection | 148 ms | 143 ms | 133 ms | 0.9x |
| All Faces | 326 ms | 303 ms | 87 ms | 0.27x |
| 0.01 Selection Faces | 70 ms | 68 ms | 34 ms | 0.49x |
One curious result is that converting from faces ended up being faster
than converting from vertices. I don't understand why that is yet.