Single-threaded operations always create an intermediate buffer, and the
write buffer operation then samples that buffer one pixel at a time. This
refactor merges the write buffer operation and the single-threaded
operation, so the samples are pushed to the output buffer in one go.
This reduces code complexity, CPU cycles and memory usage.
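A minimal sketch of the two data paths, written in Python for brevity. It models buffers as flat lists of single-channel floats and uses hypothetical names (`render_into`, `old_path`, `new_path`); it is not the compositor's actual C++ API, only an illustration of why merging the two operations saves one buffer and one per-pixel pass.

```python
def render_into(buffer, width, height):
    """Hypothetical stand-in for a single-threaded operation computing its full result."""
    for y in range(height):
        for x in range(width):
            buffer[y * width + x] = float(x + y * width)


def old_path(width, height):
    # 1. The single-threaded operation fills its own intermediate buffer.
    intermediate = [0.0] * (width * height)
    render_into(intermediate, width, height)
    # 2. A separate write buffer step samples it one pixel at a time.
    output = [0.0] * (width * height)
    for y in range(height):
        for x in range(width):
            output[y * width + x] = intermediate[y * width + x]
    return output


def new_path(width, height):
    # Merged operation: the full result is written straight into the output buffer,
    # so there is no intermediate allocation and no per-pixel sampling loop.
    output = [0.0] * (width * height)
    render_into(output, width, height)
    return output
```

Both paths produce identical pixels; the merged path simply skips the extra allocation and copy.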
Benchmark on AMD Ryzen 1700, Linux:
| branch | scheduling mode | scheduling backend | ExecutionSystem.execute time |
|---|---|---|---|
| master | Output to input | pthread_queue | 3.15s |
| temp-compositor-scheduling | Output to input | pthread_queue | 3.09s |
| temp-compositor-single-threaded-operation | Output to input | pthread_queue | 3.03s |
Next steps (after this patch lands):
- Rename single_threaded to write_full_buffer.
- Convert CalculateMeanOperation to use write_full_buffer (good first task).
- Convert FastGaussian*Operation to use write_full_buffer.
- Convert Set*Operation to use write_full_buffer. This would speed up complex operations that have unattached input sockets.
- Add a similar mechanism for write_partial_buffer.