Cycles X broke OptiX memory pooling via NVLink
Closed, Resolved, Public

Description

System Information
Operating system: Windows-10-10.0.19042-SP0 64 Bits
Graphics card: NVIDIA GeForce RTX 2080 Ti/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 472.47

Blender Version
Broken: version: 3.0.0, branch: master, commit date: 2021-12-02 18:35, hash: rBf1cca3055776
Worked: 2.93

Short description of error
"Distribute memory across devices" (OptiX) is broken in Blender 3.0.

Exact steps for others to reproduce the error

  • Open the attached blend file in Blender 3.0.
  • Ensure that you have 2x RTX 2080 Ti connected via NVLink, and enable "Distribute memory across devices" for OptiX

  • Click on render
  • Render will crash with this error:
Failed to build OptiX acceleration structure

OPTIX_ERROR_INVALID_VALUE in optixAccelBuild(context, 0, &options, &build_input, 1, temp_mem.device_pointer, sizes.tempSizeInBytes, out_data.device_pointer, sizes.outputSizeInBytes, &out_handle, use_fast_trace_bvh ? &compacted_size_prop : 0, use_fast_trace_bvh ? 1 : 0) (C:\Users\blender\git\blender-v300\blender.git\intern\cycles\device\optix\device_impl.cpp:1066)

Additional information

  • "Distribute memory across devices" works in CUDA mode (Blender 3.0).
  • "Distribute memory across devices" in OptiX works in Blender 2.93: you can open the attached file in 2.93 and see that it renders.
  • The attached file requires 17GB of VRAM with OptiX in Blender 2.93. If you have 2x RTX 3090, this bug can probably also be reproduced by increasing the required VRAM to over 24GB; just duplicate some Suzannes. I was only able to test it on 2x RTX 2080 Ti because of my lack of 3090s ;)

Thank you for your help

Event Timeline

@Patrick Mours (pmoursnv)
Maybe you can take a look at this issue?

@Patrick Mours (pmoursnv)

Hey,

thank you for the quick PR for this issue. But does your PR really fix it?
The problem is not running out of memory; the problem is that sharing memory between two cards via NVLink is not working in OptiX. The memory size should be fine if memory pooling worked as it does in 2.93.

The OPTIX_ERROR_INVALID_VALUE you are seeing is caused by running out of memory (you can verify this in the log, with "--debug-cycles"). This is likely because BVH builds are happening in parallel, which quickly exhausts available memory due to the temporary build memory required (and has some additional known quirks when memory pooling is active), whereas serialized builds do not suffer from that problem. That was fixed before (and the fix is in 2.93), but it got lost in the Cycles X merge (and thus is not in 3.0), which is why 3.0 behaves differently. The rest of the pooled memory implementation has not changed.
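To illustrate why parallel builds can run out of memory where serialized builds do not, here is a minimal sketch of a peak-memory model. It is not the Cycles implementation: `BuildRequest`, `peak_memory`, and the batch model (all builds in a batch hold their temporary and output buffers at once, and temporaries are released before the next batch starts) are assumptions made up for this example, and the sizes are arbitrary units rather than measurements from the attached scene.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one acceleration-structure build (illustrative only).
struct BuildRequest {
  std::uint64_t temp_bytes;    // scratch memory, released when the build finishes
  std::uint64_t output_bytes;  // resulting BVH, kept resident for the render
};

// Peak memory when builds run in batches of `max_concurrent`.
// Assumed model: within a batch, every temp and output buffer is live at the
// same time; temp buffers are freed before the next batch starts.
std::uint64_t peak_memory(const std::vector<BuildRequest> &builds, std::size_t max_concurrent)
{
  // Treat a concurrency of 0 as 1 so the loop always advances.
  max_concurrent = std::max<std::size_t>(max_concurrent, 1);

  std::uint64_t resident = 0;  // outputs kept from earlier batches
  std::uint64_t peak = 0;
  for (std::size_t begin = 0; begin < builds.size(); begin += max_concurrent) {
    const std::size_t end = std::min(builds.size(), begin + max_concurrent);
    std::uint64_t batch_temp = 0;
    std::uint64_t batch_output = 0;
    for (std::size_t i = begin; i < end; ++i) {
      batch_temp += builds[i].temp_bytes;
      batch_output += builds[i].output_bytes;
    }
    peak = std::max(peak, resident + batch_temp + batch_output);
    resident += batch_output;  // temps are gone, outputs stay
  }
  return peak;
}
```

With eight hypothetical builds of 2 units of scratch memory and 1 unit of output each, `peak_memory(builds, 8)` (fully parallel) gives 24 units, while `peak_memory(builds, 1)` (serialized) gives 10 units, because only one scratch buffer is live at a time.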

Jesse Yurkovich (deadpin) changed the task status from Needs Triage to Needs Information from User.Dec 24 2021, 3:26 AM

@Rincewind (Rincewind3D) Are you able to try out a 3.1 build to double-check that the issue is fixed?

@Jesse Yurkovich (deadpin)

Yes, it is working fine in 3.1.

Tested in:

Blender 3.1.0 - Alpha
December 24, 02:24:18 - 35bd6fe993a1