- Perform closure merging earlier, as part of shader evaluation
- Don't store unnecessary closures for emission/shadow shader evaluation
- Reduce number of closures for emission/shadow evaluation to 2
- Use same ShaderData for volumes and surfaces
Before:
ptxas info : Function properties for kernel_cuda_path_trace
49232 bytes stack frame, 1992 bytes spill stores, 3588 bytes spill loads
ptxas info : Used 40 registers, 364 bytes cmem[0], 1100 bytes cmem[2]
ptxas info : Function properties for kernel_cuda_branched_path_trace
68144 bytes stack frame, 1344 bytes spill stores, 3620 bytes spill loads
ptxas info : Used 64 registers, 364 bytes cmem[0], 1180 bytes cmem[2]
After:
ptxas info : Function properties for kernel_cuda_path_trace
19104 bytes stack frame, 1908 bytes spill stores, 3628 bytes spill loads
ptxas info : Used 40 registers, 364 bytes cmem[0], 1116 bytes cmem[2]
ptxas info : Function properties for kernel_cuda_branched_path_trace
28992 bytes stack frame, 1360 bytes spill stores, 3612 bytes spill loads
ptxas info : Used 64 registers, 364 bytes cmem[0], 1196 bytes cmem[2]

So that's roughly a 60% reduction in stack frame size for both kernels.
To see how much closure overhead is left, this is what we get with max closures set to 1:
ptxas info : Function properties for kernel_cuda_path_trace
15008 bytes stack frame, 1920 bytes spill stores, 3624 bytes spill loads
ptxas info : Used 40 registers, 364 bytes cmem[0], 1116 bytes cmem[2]
ptxas info : Function properties for kernel_cuda_branched_path_trace
17232 bytes stack frame, 1368 bytes spill stores, 3548 bytes spill loads
ptxas info : Used 64 registers, 364 bytes cmem[0], 1196 bytes cmem[2]

Performance and correctness still need more testing, but in principle the reduced storage requirements should not affect existing scenes, except perhaps by making rendering slightly slower. Now that merging happens earlier, we could consider lowering the maximum number of closures further, assuming that existing scenes mostly run out of closures because of duplicate BSDFs. However, some scenes may genuinely use 64 different closures.
