Cycles: Improve OptiX viewport denoising performance with CUDA rendering
Closed, Public

Authored by Patrick Mours (pmoursnv) on Jun 9 2020, 7:56 PM.

Details

Summary

Now that OptiX denoising can be used on non-RTX GPUs, it will become more common for users to render with CUDA but denoise with OptiX. Previously this was very slow in the viewport, especially with multiple GPUs, because the whole buffer was copied four times before each denoising step: from the CUDA devices to the host, from the host to the OptiX device, from the OptiX device back to the host, and finally back to the CUDA devices.

This patch addresses that by recognizing when a logical OptiX device and a logical CUDA device represent the same physical GPU, and attempting to eliminate those copies when that is the case for all active devices (similar to what happens when OptiX is used for both rendering and denoising). In addition, denoising is no longer performed only on the first available OptiX device. Instead, the patch tries to match CUDA rendering devices and OptiX denoising devices exactly where possible, to maximize utilization.
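The device matching described above can be illustrated with a minimal sketch. It is not the code from this patch: the `LogicalDevice` struct, the `pci_id` field used as the physical-GPU identifier, and both function names are assumptions made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a logical compute device (not the Cycles DeviceInfo).
enum class Backend { CUDA, OPTIX };

struct LogicalDevice {
  Backend backend;
  std::string pci_id;  // Assumed identifier of the physical GPU, e.g. "0000:01:00.0".
};

// For each CUDA render device, find an unused OptiX denoising device on the
// same physical GPU. Returns the index into `denoise`, or -1 if none matches.
std::vector<int> match_denoisers(const std::vector<LogicalDevice> &render,
                                 const std::vector<LogicalDevice> &denoise)
{
  std::vector<int> matches(render.size(), -1);
  std::vector<bool> used(denoise.size(), false);

  for (std::size_t i = 0; i < render.size(); i++) {
    assert(render[i].backend == Backend::CUDA);
    for (std::size_t j = 0; j < denoise.size(); j++) {
      if (!used[j] && denoise[j].backend == Backend::OPTIX &&
          denoise[j].pci_id == render[i].pci_id) {
        matches[i] = static_cast<int>(j);
        used[j] = true;
        break;
      }
    }
  }
  return matches;
}

// The host round trips can only be skipped if every render device found a
// denoising device on the same physical GPU.
bool can_skip_host_copies(const std::vector<int> &matches)
{
  if (matches.empty()) {
    return false;
  }
  for (int match : matches) {
    if (match < 0) {
      return false;
    }
  }
  return true;
}
```

In this sketch a single unmatched render device is enough to fall back to the copy path, which mirrors the "for all active devices" condition in the summary.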

This also fixes T75289 and T77593 (through the changes to session.cpp), as well as a race condition when denoising with multiple GPUs (map_neighbor_tiles is not thread-safe).
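One common way to guard a function that is not thread-safe is to serialize calls to it with a mutex. The sketch below shows that general technique only; whether the patch uses this exact approach is not stated in the summary. The `NeighborTileMap` class and `denoise_on_all_devices` helper are hypothetical, and only the name `map_neighbor_tiles` is borrowed from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for shared tile bookkeeping touched by every device thread.
class NeighborTileMap {
 public:
  // Only one thread at a time may update the shared state.
  void map_neighbor_tiles(int device_index)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    mapped_.push_back(device_index);
  }

  std::size_t num_mapped()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return mapped_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<int> mapped_;
};

// Launch one thread per (hypothetical) device; each maps its tiles concurrently.
std::size_t denoise_on_all_devices(NeighborTileMap &map, int num_devices, int tiles_per_device)
{
  assert(num_devices >= 0 && tiles_per_device >= 0);
  std::vector<std::thread> threads;
  for (int d = 0; d < num_devices; d++) {
    threads.emplace_back([&map, d, tiles_per_device]() {
      for (int t = 0; t < tiles_per_device; t++) {
        map.map_neighbor_tiles(d);
      }
    });
  }
  for (std::thread &thread : threads) {
    thread.join();
  }
  return map.num_mapped();
}
```

Without the lock, concurrent `push_back` calls on the shared vector would be a data race. With it, every call is recorded exactly once regardless of how the threads interleave.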

Diff Detail

Repository
rB Blender
Branch
cycles_fix_cuda_denoising_2 (branched from master)
Build Status
Buildable 8472
Build 8472: arc lint + arc unit