Previously, primitive storage was optimized to avoid indirections in
BVH2 traversal. This helped BVH2 performance a bit, but made performance
and memory usage of Embree and OptiX BVHs a bit worse. It also added
complexity in other parts of the code.

Now decouple triangle and curve primitive storage from BVH2:
- Reduced peak memory usage on all devices
- Slightly better performance for OptiX and Embree
- Slightly worse performance for CUDA
- Simplified code:
  - Intersection.prim/object now matches ShaderData.prim/object
  - No more offset manipulation for mesh displacement before a BVH is built
  - Removed primitive packing code and flags for Embree and OptiX
  - Curve segments are now stored in a KernelCurve struct
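
As a rough picture of the trade-off described above (not the actual
Cycles code; every type and function name here is hypothetical), packed
storage keeps a BVH-ordered copy of the triangle vertices, while
decoupled storage keeps only a primitive index per BVH leaf slot and
reads the geometry from the mesh's own arrays:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Float3 {
  float x, y, z;
};

struct Triangle {
  std::array<int, 3> v;  // indices into the shared vertex array
};

// Packed: vertices are duplicated in BVH leaf order, three per BVH
// primitive. No indirection at traversal time, but a second copy of
// the geometry has to be built and kept in memory.
struct PackedStorage {
  std::vector<Float3> tri_verts;  // 3 * number of BVH primitives

  std::array<Float3, 3> fetch(std::size_t bvh_prim) const {
    return {{tri_verts[3 * bvh_prim + 0],
             tri_verts[3 * bvh_prim + 1],
             tri_verts[3 * bvh_prim + 2]}};
  }
};

// Decoupled: the BVH side stores only a primitive index. Geometry is
// stored once and is shared regardless of which BVH layout is used.
struct DecoupledStorage {
  std::vector<int> prim_index;      // BVH leaf slot -> triangle index
  std::vector<Triangle> triangles;  // stored once, in mesh order
  std::vector<Float3> verts;        // stored once, in mesh order

  std::array<Float3, 3> fetch(std::size_t bvh_prim) const {
    const Triangle &t = triangles[prim_index[bvh_prim]];
    return {{verts[t.v[0]], verts[t.v[1]], verts[t.v[2]]}};
  }
};
```

In this sketch the decoupled variant pays one extra index lookup per
leaf primitive, which would be consistent with the slightly worse CUDA
numbers, and in exchange drops the BVH-ordered copy of the geometry.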
