I went with a somewhat simpler approach than originally proposed in T95836 for now. When multiple devices are used for rendering (path_trace_works_.size() > 1), this patch copies the render buffers to the denoiser device once before denoising (into denoiser_buffer_), and output/display is then fed from that single buffer on the denoiser device. That way, usually all but one copy (from all the render devices to the denoiser device) can be eliminated, provided that the denoiser device is also the display device (in which case interop is used to update the display).
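To illustrate the idea of combining the per-device slices once, here is a minimal standalone sketch. It is not the code from this patch: `Slice` and `gather_slices` are hypothetical names, the buffers are plain host vectors rather than device memory, and each slice is assumed to cover full rows of the big tile.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the part of the big tile rendered by one device.
struct Slice {
  int y_offset;               // first row of this slice within the big tile
  int height;                 // number of rows rendered by this device
  std::vector<float> pixels;  // width * height values, row-major
};

// Copy every slice exactly once into a single combined buffer, which in the
// patch would live on the denoiser device. Denoising and display can then
// both read from this one buffer.
std::vector<float> gather_slices(const std::vector<Slice> &slices, int width, int total_height)
{
  std::vector<float> combined(static_cast<std::size_t>(width) * total_height, 0.0f);
  for (const Slice &slice : slices) {
    const std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(slice.y_offset) * width;
    std::copy(slice.pixels.begin(), slice.pixels.end(), combined.begin() + offset);
  }
  return combined;
}
```

With two slices of one and two rows at a width of 2, the result is a 6-element buffer with the rows in big-tile order.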
As such, this patch also adds some logic that tries to ensure the chosen denoiser device is the same as the display device. This is a bit tricky, since it requires the OpenGL context used for display to be current at the time of denoiser device selection, so I had to move a few lines of code around to avoid a deadlock between the scene mutex and the Blender render context mutex.
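The selection preference can be sketched roughly as follows. This is an assumption-laden illustration, not the logic in the patch: `DeviceInfo` and `choose_denoiser_device` are hypothetical names, and the display device is simply passed in as an identifier, whereas the real code has to query it while the OpenGL context is current.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a render device.
struct DeviceInfo {
  std::string id;
  bool supports_denoising;
};

// Return the index of the device to denoise on, or -1 if none qualifies.
// Prefer the device that also drives the display, so the denoised result
// can be shown without another copy; otherwise take the first capable one.
int choose_denoiser_device(const std::vector<DeviceInfo> &devices,
                           const std::string &display_device_id)
{
  int fallback = -1;
  for (std::size_t i = 0; i < devices.size(); i++) {
    if (!devices[i].supports_denoising) {
      continue;
    }
    if (devices[i].id == display_device_id) {
      return static_cast<int>(i);
    }
    if (fallback < 0) {
      fallback = static_cast<int>(i);
    }
  }
  return fallback;
}
```

If no denoising-capable device matches the display device, the sketch falls back to the first capable device, which corresponds to the case where the extra copy to the display cannot be avoided.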
This could be improved further by changing copy_to_render_buffers to support direct GPU->GPU copies where applicable, but even without that, the patch already speeds things up by reducing the GPU->CPU->GPU roundtrips from 3 to just 1.
Details
Diff Detail
- Repository
- rB Blender
Event Timeline
Comment Actions
This looks fine, but can you rename denoise_buffer_ to multi_denoise_work_?
To make clear that it's not the render buffer itself.
Comment Actions
Sure, though maybe big_tile_denoise_work_ to indicate that this is the combination of all slices?
Do you think it makes sense to commit to 3.3 as well (considering that it is a LTS release and this is essentially a performance bug fix)?
Comment Actions
Fine with me.
> Do you think it makes sense to commit to 3.3 as well (considering that it is a LTS release and this is essentially a performance bug fix)?
Yes, I think so.