I noticed that the built-in tricubic sampler in NanoVDB gave results very different from the tricubic sampling used for dense volumes. It was also not implemented for OpenCL at all.
This patch fixes both problems by applying the same tricubic algorithm as before, now on top of NanoVDB lookups. It also fixes some remaining offset issues, along with a few minor things that broke OpenCL kernel compilation on NVIDIA.
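For reference, here is a minimal sketch of the general idea: a tricubic filter written against a generic voxel lookup, so the same weights can be used whether the data comes from a dense array or a sparse grid accessor. It assumes cubic B-spline weights and samples in index space with values located at integer coordinates. The names `cubic_bspline_weights`, `sample_tricubic` and the `lookup` callable are hypothetical and are not taken from the patch; the actual kernel code and its offset handling may differ.

```cpp
#include <cmath>

// Cubic B-spline weights for a fractional offset t in [0, 1).
// The four weights always sum to 1.
static void cubic_bspline_weights(float t, float w[4])
{
  const float t2 = t * t;
  const float t3 = t2 * t;
  w[0] = (1.0f / 6.0f) * (1.0f - 3.0f * t + 3.0f * t2 - t3);
  w[1] = (1.0f / 6.0f) * (4.0f - 6.0f * t2 + 3.0f * t3);
  w[2] = (1.0f / 6.0f) * (1.0f + 3.0f * t + 3.0f * t2 - 3.0f * t3);
  w[3] = (1.0f / 6.0f) * t3;
}

// Tricubic sample at index-space position (x, y, z).
// `lookup(i, j, k)` is a hypothetical callable returning the voxel value at
// integer coordinates; it could wrap a dense array or a sparse grid accessor.
template<typename Lookup>
float sample_tricubic(const Lookup &lookup, float x, float y, float z)
{
  const float fx = std::floor(x);
  const float fy = std::floor(y);
  const float fz = std::floor(z);
  const int ix = static_cast<int>(fx);
  const int iy = static_cast<int>(fy);
  const int iz = static_cast<int>(fz);

  float wx[4], wy[4], wz[4];
  cubic_bspline_weights(x - fx, wx);
  cubic_bspline_weights(y - fy, wy);
  cubic_bspline_weights(z - fz, wz);

  // Accumulate the 4x4x4 neighbourhood around the sample position.
  float result = 0.0f;
  for (int k = 0; k < 4; k++) {
    for (int j = 0; j < 4; j++) {
      for (int i = 0; i < 4; i++) {
        result += wz[k] * wy[j] * wx[i] * lookup(ix + i - 1, iy + j - 1, iz + k - 1);
      }
    }
  }
  return result;
}
```

Keeping the filter independent of the storage format is what makes it possible for dense and sparse rendering to agree: only the `lookup` changes, while the weights and the neighbourhood stay identical.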
I tested the smoke scene from the regression tests and the bunny sample OpenVDB volume on CPU, CUDA, OptiX and OpenCL, comparing dense and NanoVDB renders. The results are now approximately the same: they are no longer distinguishable by eye, and diffing the data shows only small pixel differences.
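A comparison of this kind can be sketched as follows, assuming both renders are available as flat float buffers of equal size. The `RenderDiff` struct, the `compare_renders` function and the tolerance parameter are hypothetical and only illustrate the idea; this is not the tool that was used for the tests above.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Summary of the difference between two renders.
struct RenderDiff {
  float max_abs_diff;       // Largest absolute per-value difference.
  std::size_t num_differing;  // Number of values differing by more than the tolerance.
};

// Compare two pixel buffers value by value.
// Throws if the buffers do not have the same number of values.
static RenderDiff compare_renders(const std::vector<float> &a,
                                  const std::vector<float> &b,
                                  float tolerance)
{
  if (a.size() != b.size()) {
    throw std::invalid_argument("render buffers differ in size");
  }

  RenderDiff diff = {0.0f, 0};
  for (std::size_t i = 0; i < a.size(); i++) {
    const float d = std::fabs(a[i] - b[i]);
    if (d > diff.max_abs_diff) {
      diff.max_abs_diff = d;
    }
    if (d > tolerance) {
      diff.num_differing++;
    }
  }
  return diff;
}
```

A small maximum difference with few values above the tolerance corresponds to the "approximately the same" outcome described above.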