More specifically, the state size is now 16x the max number of threads
on all multiprocessors, with a minimum of 1048576.
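
As a rough sketch of the rule above (not the actual implementation): the
function name, parameter names, and the device figures in the usage lines
are illustrative assumptions; only the 16x factor and the 1048576 minimum
come from this description.

```python
# Minimal sketch of the sizing rule, assuming the device reports its
# multiprocessor count and the max resident threads per multiprocessor.
# All names here are hypothetical, chosen for illustration only.

MIN_NUM_STATES = 1048576   # minimum stated in this description
THREADS_FACTOR = 16        # "16x the max number of threads"


def num_concurrent_states(num_multiprocessors, max_threads_per_multiprocessor):
    """Return the state size for a device with the given thread capacity."""
    max_threads = num_multiprocessors * max_threads_per_multiprocessor
    return max(THREADS_FACTOR * max_threads, MIN_NUM_STATES)


# Illustrative device figures (assumptions, not measurements):
large_device = num_concurrent_states(84, 1536)  # exceeds the minimum
small_device = num_concurrent_states(20, 1024)  # clamped to the minimum
```

With these illustrative figures the large device ends up at roughly twice
the minimum, while the small device stays at 1048576.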
What this effectively does is double the state size on the highest-end
GPUs like the RTX A6000 and RTX 3080, while leaving the size unchanged
for others. On the RTX A6000 there are 2-10% render time reductions on
our benchmark scenes. The biggest reduction is on the barbershop
interior, as scenes with more objects and shaders are more likely to
benefit from improved coherence.
This also adds an environment variable for developers to test different
sizes, and debug logging about the size and memory usage.
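
A developer override of this kind could look roughly like the sketch
below. The variable name EXAMPLE_NUM_STATES, the function names, and the
per-state byte count are all hypothetical, since this description does not
name the actual variable or say how it is interpreted.

```python
import os

# HYPOTHETICAL variable name, used only for illustration.
ENV_VAR = "EXAMPLE_NUM_STATES"


def resolve_num_states(default_num_states, environ=os.environ):
    """Return the override from the environment, or the default if the
    variable is unset, not an integer, or not positive."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return default_num_states
    try:
        value = int(raw)
    except ValueError:
        return default_num_states
    return value if value > 0 else default_num_states


def describe_state_memory(num_states, bytes_per_state):
    """Format a debug line with the state size and its memory usage.
    bytes_per_state is an assumed input, not a value from the source."""
    mib = num_states * bytes_per_state / (1024 * 1024)
    return f"Using {num_states} states, {mib:.1f} MiB"
```

Falling back to the default on invalid input keeps a mistyped value from
breaking a render, which seems a reasonable choice for a debugging knob.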

