This is done based on the render sample count so that it doesn't impact sampling quality. It's similar in spirit to the adaptive table size in D16561, but in this case for performance rather than memory usage.
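To make the idea concrete, here's a minimal standalone sketch of limiting a hash-based Owen index shuffle to the render's sample count. This is illustrative, not the actual Cycles code: the helper names and the Laine-Karras hash constants (taken from Burley's 2020 paper) are stand-ins, and Cycles' own shuffle hash differs.

```cpp
#include <cstdint>

/* Reverse the bits of a 32-bit integer. */
static inline uint32_t reverse_integer_bits(uint32_t x)
{
  x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
  x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
  x = ((x & 0x0f0f0f0fu) << 4) | ((x >> 4) & 0x0f0f0f0fu);
  x = ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
  return (x << 16) | (x >> 16);
}

/* Hash-based Owen shuffle on a bit-reversed index (constants from
 * Burley 2020; a stand-in for Cycles' own shuffle hash). */
static inline uint32_t laine_karras_permutation(uint32_t x, uint32_t seed)
{
  x += seed;
  x ^= x * 0x6c50b47cu;
  x ^= x * 0xb82f1e52u;
  x ^= x * 0xc7afe638u;
  x ^= x * 0x8d22f6e6u;
  return x;
}

/* Shuffle `index`, limited to the smallest power-of-two range that
 * covers `sample_count`. Masking keeps the shuffle a bijection over
 * [0, next_pow2(sample_count)), so stratification (and thus sampling
 * quality) is preserved, while downstream per-bit work on the index
 * only has to touch log2(n) bits instead of all 32. */
static inline uint32_t shuffled_sample_index(uint32_t index,
                                             uint32_t sample_count,
                                             uint32_t seed)
{
  /* Next power of two >= sample_count. */
  uint32_t n = 1;
  while (n < sample_count) {
    n <<= 1;
  }
  /* In bit-reversed space the varying bits sit at the top, so the mask
   * is the bit-reversal of (n - 1). */
  const uint32_t rev_mask = reverse_integer_bits(n - 1);

  uint32_t x = laine_karras_permutation(reverse_integer_bits(index), seed);
  x &= rev_mask;                 /* Keep only the bits that can vary. */
  return reverse_integer_bits(x); /* Back to normal bit order, in [0, n). */
}
```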
Diff Detail
- Repository: rB Blender
- Branch: limited_sobol_burley (branched from master)
Event Timeline
In a simple test scene (only Lambert shaders and a very low poly count, so sample generation should account for a relatively large share of the run time) on an x86-64 CPU, this brought a 1024-sample render from a 1:15.69 render time down to 1:13.65 (roughly a 2.7% improvement). For comparison, rendering the same scene with the PMJ sampler took 1:11.29.
The performance improvement is somewhat dependent on sample count (lower sample counts see a greater relative improvement), but it scales logarithmically, so going to e.g. 4096 samples should give only slightly less relative improvement than 1024, and 256 only slightly more.
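To put rough numbers on that (my own back-of-the-envelope illustration, assuming the per-sample cost of the index shuffle and Sobol evaluation grows with the number of index bits that can vary): a full 32-bit index has up to 32 bits to process, while the limited sequence only needs the bits covering the sample count, i.e. 8 bits at 256 samples, 10 at 1024, and 12 at 4096. Each 4x increase in sample count adds just two bits, which is why the relative improvement shrinks so slowly.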
I haven't tested on GPU, so I don't know what the perf numbers look like there.
Btw, I'm happy for this to land either before or after D16443. If there are any conflicts, they should be very minor and easy to rebase either way.
I'll commit this with the additional optimization of computing reverse_integer_bits(aa_samples_next_ge_power_of_two - 1) in advance, as sketched below.
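For concreteness, here's a sketch of what that precomputation could look like (hypothetical struct and function names; where this state actually lives in the Cycles integrator will differ), reusing reverse_integer_bits from the sketch above:

```cpp
/* Hypothetical sketch: compute the shuffle mask once when the render
 * sample count is known, instead of re-deriving it for every sample. */
struct SobolBurleyState {
  uint32_t shuffle_mask; /* reverse_integer_bits(next_pow2(aa_samples) - 1) */
};

static inline void sobol_burley_init(SobolBurleyState &state, uint32_t aa_samples)
{
  /* Next power of two >= aa_samples. */
  uint32_t n = 1;
  while (n < aa_samples) {
    n <<= 1;
  }
  state.shuffle_mask = reverse_integer_bits(n - 1);
}
```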