source/blender/draw/engines/compositor/compositor_engine.cc
@@ 21 unchanged lines hidden @@
 #include "DRW_render.h"
 #include "IMB_colormanagement.h"
 #include "COM_context.hh"
 #include "COM_evaluator.hh"
 #include "COM_texture_pool.hh"
+#include "GPU_context.h"
 #include "GPU_texture.h"
 #include "compositor_engine.h" /* Own include. */
 namespace blender::draw::compositor {
 class TexturePool : public realtime_compositor::TexturePool {
  public:
@@ 183 unchanged lines hidden @@ static void compositor_engine_free(void *instance_data)
   delete engine;
 }
 static void compositor_engine_draw(void *data)
 {
   COMPOSITOR_Data *compositor_data = static_cast<COMPOSITOR_Data *>(data);
 #if defined(__APPLE__)
-  blender::StringRef("Viewport compositor not supported on MacOS")
+  if (GPU_backend_get_type() == GPU_BACKEND_METAL) {
+    /* NOTE(Metal): Isolate Compositor compute work in individual command buffer to improve
+     * workload scheduling. When expensive compositor nodes are in the graph, these can stall out
+     * the GPU for extended periods of time and suboptimally schedule work for execution. */
+    GPU_flush();
+  }
+  else {
+    /* Realtime Compositor is not supported on macOS with the OpenGL backend. */
+    blender::StringRef("Viewport compositor is only supported on MacOS with the Metal Backend.")
         .copy(compositor_data->info, GPU_INFO_SIZE);
     return;
+  }
 #endif
+  /* Execute Compositor render commands. */
   compositor_data->instance_data->draw();
+#if defined(__APPLE__)
+  /* NOTE(Metal): Following previous flush to break command stream, with compositor command
+   * buffers potentially being heavy, we avoid issuing subsequent commands until compositor work
+   * has completed. If subsequent work is prematurely queued up, the subsequent command buffers
+   * will be blocked behind compositor work and may trigger a command buffer time-out error. As a
+   * result, we should wait for compositor work to complete.
+   *
+   * This is not an efficient approach for peak performance, but a catch-all to prevent command
+   * buffer failure, until the offending cases can be resolved. */
+  if (GPU_backend_get_type() == GPU_BACKEND_METAL) {
+    GPU_finish();
fclem: Maybe a silly question, but can't we flush instead to also split the command buffer? Or does the time-out error affect all command buffers in the queue?
MichaelPW: Certainly not a silly question. Flushing would be ideal, and it is what I originally tried and hoped would resolve the problem. Theoretically, this issue should not exist in this form, as compute kernels have a very long timeout. The caveat comes from subsequent rendering work that depends on the results of the compute work. Because of how dependency tracking works here, the parts of the render workload that do not depend on the compute work can begin, but rendering the compositor's results does depend on it. It is therefore the stalled graphics work that times out, as graphics has quite a short timeout to promote app responsiveness.

The problem with flushing the command buffer is that it still allows subsequent work to queue up behind it, and when the delta between CPU submission and GPU execution grows too large, the command buffer can be aborted, especially if part of its work has already entered the GPU job queue.

Stalling the pipe is certainly not ideal, but it may take some re-architecting to work around this issue if compositing execution continues to cause these timeouts. Cycles live viewport rendering avoids this because the Cycles compute work is decoupled into a separate command queue, and texture updates occur indirectly as results become available. A similar approach could work for the compositor in future, though if the kernels are optimized and performance is reasonable, this is also a non-issue. I've only been able to reproduce this with the blur kernel, where the nested loop and the large number of texture samples result in ~0.5 FPS on the M1 Pro GPU.

The specific cause of the stall in this case also relates to the Metal events that chain the command buffers submitted by the Metal backend, ensuring that command buffers execute in order (see encodeWaitForEvent and encodeSignalEvent in mtl_command_buffer.mm and mtl_context.mm). Without these, the GPU will not stall, but the viewport will flicker due to possible out-of-order resource updates, i.e. a pass may press on with its GPU work but use an outdated version of a texture. Dependency tracking between passes only synchronises a dependency automatically if the update is already in flight.

Apologies for the long ramble; this is a bit of a brain dump, but I wanted to shed some light on the execution model within the backend.

One other possible solution is to selectively disable the synchronisation between subsequent command buffers, so that the compute work executes in isolation and the viewport displays the old results if new ones are not yet ready. This will likely introduce flickering, though, and could also cause problems with the workload submission pipe if compositor work is decoupled from the render.

It is also worth mentioning that if the GPU_finish call were isolated to known expensive kernels, stalling the CPU would not necessarily degrade performance much, as the system is already heavily GPU-bound in these scenarios anyway.
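The flush-versus-finish trade-off described above can be illustrated with a toy timeline model. This is a minimal sketch under stated assumptions, not Blender or Metal code: `CommandBuffer`, `count_timeouts`, `flush_only` and `flush_then_finish` are hypothetical names; all command buffers are assumed to run on one in-order queue (standing in for the event chaining between command buffers); compute buffers are treated as exempt from the timeout; and the graphics timeout is an arbitrary illustrative parameter, not Metal's actual limit. A graphics buffer counts as timed out when the gap between its CPU submission and the start of its GPU execution exceeds that parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

/* Hypothetical toy model, not Blender or Metal API. All times are in milliseconds. */
struct CommandBuffer {
  double submit_ms;   /* When the CPU commits the buffer to the queue. */
  double duration_ms; /* GPU execution time once the buffer starts. */
  bool is_graphics;   /* Only graphics buffers are subject to the (short) timeout. */
};

/* Simulate one in-order queue: a buffer cannot start before it is submitted, nor before the
 * previous buffer has completed (standing in for the event chaining between command buffers).
 * Returns how many graphics buffers waited longer than `graphics_timeout_ms` to start. */
int count_timeouts(const std::vector<CommandBuffer> &buffers, double graphics_timeout_ms)
{
  assert(graphics_timeout_ms >= 0.0);
  double gpu_free_at_ms = 0.0;
  int timeouts = 0;
  for (const CommandBuffer &cb : buffers) {
    const double start_ms = std::max(cb.submit_ms, gpu_free_at_ms);
    if (cb.is_graphics && start_ms - cb.submit_ms > graphics_timeout_ms) {
      timeouts++;
    }
    gpu_free_at_ms = start_ms + cb.duration_ms;
  }
  return timeouts;
}

/* Flush only: the compute buffer is split off, but the graphics buffer is submitted
 * immediately afterwards and has to wait behind the compute work on the GPU. */
std::vector<CommandBuffer> flush_only(double compute_ms, double graphics_ms)
{
  return {{0.0, compute_ms, false}, {0.0, graphics_ms, true}};
}

/* Flush then finish: the CPU blocks until the compute buffer completes, so the graphics
 * buffer is only submitted once the GPU is free. */
std::vector<CommandBuffer> flush_then_finish(double compute_ms, double graphics_ms)
{
  return {{0.0, compute_ms, false}, {compute_ms, graphics_ms, true}};
}
```

For example, with a compositor buffer taking 2000 ms (on the order of the ~0.5 FPS blur case) and a hypothetical 1000 ms graphics timeout, `count_timeouts(flush_only(2000.0, 5.0), 1000.0)` reports one timed-out graphics buffer, while `flush_then_finish` reports none, because the graphics buffer is not submitted until the compute work has completed. The model deliberately ignores that the finish variant blocks the CPU for the whole compute duration, which is the cost the comment accepts for GPU-bound scenes.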
+  }
+#endif
 }
 static void compositor_engine_update(void *data)
 {
   COMPOSITOR_Data *compositor_data = static_cast<COMPOSITOR_Data *>(data);
   /* Clear any info message that was set in a previous update. */
   compositor_data->info[0] = '\0';
@@ 28 unchanged lines hidden @@