This change makes it possible to render single frame on multiple GPUs
and/or GPU(s)+CPU. (as configured in the User Preferences).
Work is split equally along the height of the big tile.
In the future this will be looked into to perform better initial guess
based on devices performance, dynamic re-scheduling, and interleaving
scanlines across devices.
The main idea is to move render buffers to per-work basis, so that the
ender buffers are always associated with the device work is being done
by. And then upon access delegate the read/write to the work, so that
it operates with a specific slice in the source/destination,
There are some multiple memory and performance improvements possible,
like:
- Copy render result to GPUDisplay from multiple threads (now when it is clear graphics inetrop can not be mixed in with naive update).
- Avoid denoiser buffer re-allocation.
- Avoid creation of temporary buffers in the denoisers when we know that we have a copy of real buffers.
- Only copy passes needed for denoiser, and results of denoiser.
The current state of the PathTrace::denoise() is not entirely ideal:
it could be split up, and memory usage could be improved. But think it
is good enough for the initial implementation. The further improvements
would require changes in the Denoiser API.