Trying to render the Blender 2.92 splash screen (https://cloud.blender.org/p/gallery/60337d495677e942564cce76) with OptiX on a GPU with limited VRAM fails with an "OPTIX_ERROR_INVALID_VALUE in optixAccelBuild..." error, which is rather cryptic.
What's actually going on is that Cycles is trying to build an OptiX acceleration structure in host memory (allocated with cuMemHostAlloc), which is not allowed: it has to be in device memory (from cuMemAlloc), hence the error. This patch addresses that by changing the MEM_DEVICE_ONLY type so that it really allocates only on the device, and fails if that is no longer possible because the device is out of memory. In that case Cycles now returns a "System is out of GPU memory" error message, which is much easier for end users to understand.
This should not be a problem, since MEM_DEVICE_ONLY was seldom used before anyway. I changed some of those instances to other memory types where they do not have this restriction.
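To illustrate the intended allocation policy, here is a minimal sketch. It is not the code from the patch: ToyDevice, MemLocation, MEM_READ_ONLY as the example of a non-restricted type, and the byte budgets are all assumptions made up for this example, and a simple counter stands in for the real cuMemAlloc / cuMemHostAlloc calls.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical memory types, loosely following the names in the description.
enum MemType { MEM_READ_ONLY, MEM_DEVICE_ONLY };

// Where an allocation ended up in this toy model.
enum MemLocation { IN_DEVICE, IN_HOST };

// Toy allocator that tracks byte budgets instead of calling the CUDA driver API.
struct ToyDevice {
  std::size_t device_free;  // stands in for free VRAM
  std::size_t host_free;    // stands in for host memory that can be mapped for the GPU

  MemLocation alloc(MemType type, std::size_t size)
  {
    assert(size > 0);

    // Prefer device memory whenever it fits.
    if (size <= device_free) {
      device_free -= size;
      return IN_DEVICE;
    }

    // Device-only buffers (such as acceleration structures) must never fall
    // back to host memory, so report a clear error instead.
    if (type == MEM_DEVICE_ONLY) {
      throw std::runtime_error("System is out of GPU memory");
    }

    // Other memory types may still fall back to host memory.
    if (size <= host_free) {
      host_free -= size;
      return IN_HOST;
    }

    // Illustrative message only, not taken from Cycles.
    throw std::runtime_error("Out of device and host memory");
  }
};
```

With a ToyDevice that has 100 bytes of device memory, a 60-byte MEM_DEVICE_ONLY request succeeds on the device, a following 60-byte MEM_READ_ONLY request lands in host memory, and a second 60-byte MEM_DEVICE_ONLY request throws instead of silently moving to the host.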
There is another problem here that I haven't addressed yet though:
The aforementioned scene shouldn't need as much memory as it does. I get a 16GB peak requirement, which requires a rather beefy GPU to meet. But during actual rendering, after everything was built and compacted, it actually sits around only 8GB.
The problem here is that Cycles is building all the bottom-level acceleration structures for OptiX in parallel (geometry.cpp line 1933). For each acceleration structure it has to allocate some temporary memory on the GPU (for vertices, etc.). Since the builds run in parallel, these temporary allocations add up to a huge amount of memory, and that is where the peak comes from. If I instead force all bottom-level builds to run serialized, the problem goes away and I see a peak of only 9GB, which more consumer GPUs can handle.
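The difference between the two peaks can be shown with a simplified model. This is only a sketch under assumptions: BuildCost, peak_parallel and peak_serial are hypothetical names, the model assumes the worst case where every parallel build is in flight at once, and it assumes each finished build shrinks to its compacted size before the next one starts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-build memory footprint in arbitrary units (not measured values).
struct BuildCost {
  std::size_t scratch;    // held while the build is in flight (temporaries + uncompacted output)
  std::size_t compacted;  // what remains once the build has finished and been compacted
};

// Worst case for parallel builds: every build holds its scratch memory at once.
std::size_t peak_parallel(const std::vector<BuildCost> &builds)
{
  std::size_t peak = 0;
  for (const BuildCost &b : builds) {
    assert(b.compacted <= b.scratch);
    peak += b.scratch;
  }
  return peak;
}

// Serialized builds: only one build holds scratch memory at a time, on top of
// the compacted results of the builds that finished before it.
std::size_t peak_serial(const std::vector<BuildCost> &builds)
{
  std::size_t resident = 0;  // compacted memory of finished builds
  std::size_t peak = 0;
  for (const BuildCost &b : builds) {
    assert(b.compacted <= b.scratch);
    peak = std::max(peak, resident + b.scratch);
    resident += b.compacted;
  }
  return peak;
}
```

For four made-up builds that each need 4 units while building and 2 units afterwards, peak_parallel gives 16 and peak_serial gives 10, while the final resident size is 8 in both cases. The numbers are invented for illustration, but the shape matches what I see: the serialized peak stays close to the final footprint.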
I'm not sure how to expose this to users though. Ideally Cycles could automatically decide whether it makes more sense to run the builds in parallel or not, but that's probably difficult to predict. So maybe just add an option to choose? Or always run serialized for OptiX? (This problem happens with CUDA too, except that there it is system RAM that gets exhausted during the BVH2 build rather than VRAM.)
Just noticed that this exact issue has also been reported here: T85985