- Use threading::parallel_for for multi-threading, which gives a simpler API and more readable, concise code.
- Use Span and Array (only internally, the public API is still C) for safer, more automatic memory management.
- Since the code is much less verbose, combine the callbacks into the main function. Note that the accumulation code could be more concise with float3, but I wanted to keep these changes minimal.
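
For readers unfamiliar with the pattern, here is a minimal standalone sketch of the idea, not the code from this patch. `parallel_for_sketch` and `scaled_copy` are hypothetical names, `std::thread` stands in for the real task scheduler, and `std::vector` stands in for Array. The point is only that the loop body becomes an inline lambda instead of a separate callback with a userdata struct, and that the output buffer frees itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

/* Hypothetical stand-in for a parallel_for helper: splits [0, size) into chunks of at most
 * grain_size elements and calls fn(begin, end) for each chunk on its own thread.
 * A real implementation would hand the chunks to a task scheduler instead. */
template<typename Function>
void parallel_for_sketch(const std::size_t size, const std::size_t grain_size, const Function &fn)
{
  assert(grain_size > 0);
  std::vector<std::thread> threads;
  for (std::size_t begin = 0; begin < size; begin += grain_size) {
    const std::size_t end = std::min(begin + grain_size, size);
    threads.emplace_back([&fn, begin, end]() { fn(begin, end); });
  }
  /* Wait for every chunk before returning, so callers can use the results right away. */
  for (std::thread &thread : threads) {
    thread.join();
  }
}

/* Illustrative task (an assumption, unrelated to the patch): multiply every value by a factor.
 * The per-chunk work is written inline, with no callback function or userdata struct. */
std::vector<float> scaled_copy(const std::vector<float> &src, const float factor)
{
  /* The vector owns its memory and releases it automatically. */
  std::vector<float> dst(src.size());
  parallel_for_sketch(src.size(), 1024, [&](const std::size_t begin, const std::size_t end) {
    /* Each chunk writes to a disjoint index range, so no locking is needed. */
    for (std::size_t i = begin; i < end; i++) {
      dst[i] = src[i] * factor;
    }
  });
  return dst;
}
```

Calling `scaled_copy(values, 3.0f)` returns a new vector with every element tripled. The grain size of 1024 is an arbitrary choice for the sketch.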
@Campbell Barton (campbellbarton), do you have any of the test files you used from D11993?
I'd check those to make sure this doesn't have a performance impact (I wouldn't expect it to).