Page MenuHome

Compositor: Full-frame base system
ClosedPublic

Authored by Manuel Castilla (manzanilla) on Apr 28 2021, 5:13 PM.
Tokens
"Love" token, awarded by zazizizou."Love" token, awarded by EAW."Love" token, awarded by franMarz."Love" token, awarded by Schamph."Like" token, awarded by GeorgiaPacific.

Details

Summary

This patch adds the base code needed to make the full-frame system work for both current tiled/per-pixel implementation of operations and full-frame.

Two execution models:

  • Tiled: Current implementation. Renders execution groups in tiles from outputs to input. Not all operations are buffered. Runs the tiled/per-pixel implementation.
  • FullFrame: All operations are buffered. Fully renders operations from inputs to outputs. Runs full-frame implementation of operations if available otherwise the current tiled/per-pixel. Creates output buffers on first read and free them as soon as all its readers have finished, reducing peak memory usage of complex/long trees. Operations are multi-threaded but do not run in parallel as Tiled (will be done in another patch).

This should allow us to convert operations to full-frame in small steps with the system already working and solve the problem of high memory usage.

FullFrame breaking changes respect Tiled system, mainly:

  • Translate, Rotate, Scale, and Transform take effect immediately instead of next buffered operation.
  • Any sampling is always done over inputs instead of last buffered operation.

Benchmarks:
Test 1: A lot of branches and non-buffered operations on Tiled.

Test 2: Few branches and many buffered operations on Tiled.

Intel i7-6700K, Windows 10

Execution ModelThreading ModelTest 1Test 2
TiledQueue0.65s / 211.42MB1.36s / 717.04MB
TiledTask0.66s / 211.42MB1.50s / 717.04MB
FullFrameQueue0.53s / 329.50MB1.26s / 241.76MB
FullFrameTask0.54s / 329.50MB1.25s / 241.76MB

Note: Measured time is for ExecutionSystem::execute. MBs for peak memory.

Diff Detail

Repository
rB Blender
Branch
cmp-full-frame (branched from master)
Build Status
Buildable 14282
Build 14282: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
source/blender/makesrna/intern/rna_userdef.c
626 ↗(On Diff #37069)

user settings should not change scene settings.

  • Rename var to has_outputs
  • Rename execution_model to execution_mode
  • User prefs option only enable panel option now
  • Add a TODO to check operation is constant
  • Create ExecutionModel classes to reduce complexity
  • Use pthread_signals and atomic counter
Manuel Castilla (manzanilla) marked 5 inline comments as done.May 14 2021, 5:02 PM
  • Fix comment formatting
  • Merge branch 'master' into cmp-full-frame
source/blender/compositor/intern/COM_ExecutionSystem.h
139

I thought of creating a separate class which only responsability is creation/recycling/freeing of all kind of buffers. It sets a custom deleter on buffer creation. OperationBufferPool would be a good name.

OutputStore responsability is to store operations output data/buffers so that it may be used by system or any operation. And dispose output buffers once readers don't need it (whether they are freed or recycled is OperationBufferPool responsability). It could be used to store metadata that need to be shared between root and final operations. Still I may better change name to OperationOutputStore?

Seems it is still in development as not all comments have been addressed?

Yes wasn't finished. Now there is nothing left for current review, except OutputStore name.

  • Use getInputOperation instead of socket links
  • Use WorkScheduler::finish() for better performance
  • Fix incorrect output border calculation
Jeroen Bakker (jbakker) requested changes to this revision.May 17 2021, 9:25 AM

Most feedback is on style and naming. I see this patch moving into a direction where it is almost there. The biggest issue is still the naming of OutputStore.

If we are out of ideas we should pick the best and add a TODO to find a better name.

source/blender/compositor/COM_defines.h
24

For multi line comments we should not add anything just behind the comment open tag. (/**<newline>). For single line comments it is fine.

source/blender/compositor/intern/COM_BufferOperation.cc
30

Not for this patch. but we could find a way to shorten this to something like
set_resolution(buffer.get_size())

the question is when should we use resolution and when size. I think that resolution might be an outdated term. size is better, but there are multiple meanings of the term size (allocated size vs dimension.).

source/blender/compositor/intern/COM_ExecutionSystem.h
139

OutputStore m_output_store doesn't tell what is stored. In a way it holds dependancies between operations.
Looking from the execution system point of view it isn't output, but operation buffers.
CPP uses the term shared for ref counting.

What do you think of SharedOperationBuffers active_buffers_?
We might want to tweak the names of the methods a bit to better match the name of the class.

As in style we used to follow MSVC style (m_) but we should move towards new style (private attributes get an underscore at the end.
(we should still go over the sourcecode to change it. but for new code we should try to follow this.

source/blender/compositor/intern/COM_FullFrameExecutionModel.cc
112 ↗(On Diff #37154)

In stead of passing an index, just pass input_op.

176 ↗(On Diff #37154)

Not sure if a mutex is really needed. Current implementation can lead to a dead lock if the work hasn't finished after calling WorkScheduler::finish();

183 ↗(On Diff #37154)

n_ should not be used, use _len or num_. In this case I would go for num_sub_works.

212 ↗(On Diff #37154)

Would do the cast the other way around. the unsigned int is the strange type in this code. so convert the strange type.
or use unsigned int for the local fields where needed.

266 ↗(On Diff #37154)

Could be moved to base class.

source/blender/compositor/intern/COM_MultiThreadedOperation.h
50

multi isn't clear. Would use update_memory_buffer_partial

source/blender/compositor/intern/COM_NodeOperation.cc
244

This method does multiple things.

it ensures that the input buffers have been rendered.
and it returns a list of all input buffers.

Split this in two methods. A method is only supposed to do one thing.

ensuring input buffers isn't a responsibility of a node operation. would move this to full frame execution model.

same for create_output_buffer. could become a responsibility of full frame execution model as well.

source/blender/compositor/intern/COM_NodeOperation.h
351

for new code use snake_case. You can update the other functions in a separate change.

564

get_area_of_interest should be enough. use parameter overloading (input_id for example)

source/blender/compositor/intern/COM_OutputStore.cc
79 ↗(On Diff #37154)

First check then act. move before assignment.

source/blender/compositor/intern/COM_WorkPackage.h
57

execute_fn

62

executed_fn;

We used _callback or _cb, but are migrating to _fn.

source/blender/makesdna/DNA_node_types.h
553 ↗(On Diff #37154)

Keep same name as in typedef enum ('eNodeTreeExecutionMode')

This revision now requires changes to proceed.May 17 2021, 9:25 AM
  • Fix multi line comment open tag
  • Rename new class data members from m_* to *_
  • Rename OutputStore to SharedOperationBuffers
  • Use num_* naming instead of n_*
  • Cast strange type values instead of normal ones.
  • Move is_breaked to base class
  • Rename get_input_area_of_interest to get_area_of_interest
  • Rename multi_update_memory_buffer to update_memory_buffer_partial
  • Rename public getInputOperation to get_input_operation
  • Move non NodeOperation responsabilities to FullFrameExecutionModel
  • Move assert before assignment
  • Rename WorkPackage functions to *_fn
  • Fix eNodeTreeExecutionMode typedef
Manuel Castilla (manzanilla) marked 12 inline comments as done.May 18 2021, 12:14 AM
Manuel Castilla (manzanilla) added inline comments.
source/blender/compositor/intern/COM_BufferOperation.cc
30

As "BLI_rcti_size_x" and "BLI_rcti_size_y" is frequently used, get_size_xy() may be ok?

source/blender/compositor/intern/COM_FullFrameExecutionModel.cc
112 ↗(On Diff #37154)

get_input_area_of_interest is meant to be overriden by subclasses. When implement it, it's easier to think in terms of input socket indexes.
Example code: P2121

source/blender/compositor/intern/COM_NodeOperation.h
564

I renamed it but I don't understand why it's needed input_id overloading. It will only be called in determine_rects_to_render.
And it will be overriden by many operations. Operations don't need to call their inputs get_area_of_interest. Example: P2121
Maybe there are issues with the way I mean to implement it and I'm not seeing it?

Think the code has an correct structure. Do have some concerns (mutexes, render areas), that should be handled/documented but don't see them as stopper at this moment.
the term render is a ambiguous but that could be changed along the way with cleanups.

Challenge for us is to find better naming. It is a good idea to start with a glossary (wiki.blender.org) containing technical terms and build up the naming from there.

source/blender/compositor/intern/COM_BufferOperation.cc
30

would go for Size2d MemoryBuffer::get_size() The suffix leads to more confusion when reading code as it tells something about the return type, readers can question them self: "Hey wait there is a suffix, perhaps it means something different.". In CPP it is better to used a struct as a return type for these cases.

source/blender/compositor/intern/COM_FullFrameExecutionModel.cc
264 ↗(On Diff #37186)

A bit hard to test but atomics can be heavier then a mutex. Reason for this is that atomics are (only) released when the CPU data cache are flushed. Depending on the silicon, and the number of threads this can lead to additional waits.

279 ↗(On Diff #37186)

Would be good to add a code snippet or even better a test case that shows this. As this might be an issue that needs to be tackled. This code adds a workaround for something that might be broken.

source/blender/compositor/intern/COM_NodeOperation.h
564

Could also not fully clear by me. In C you don't have parameter overloading and all names should be unique. in cpp this isn't needed and names can be more condens what is easier to read. I would suggest to use this until it is clear when this isn't possible. If so we need to find a solution.

source/blender/compositor/intern/COM_SharedOperationBuffers.cc
38 ↗(On Diff #37186)

we could see if we want to refactor this to request_area. Current implementation is incomplete: partial overlapping etc this could lead to more rendering than necessary. We could add a TODO for later concern.

Especially when rendering large resolutions this could become a time saver.

  • Add TODO to implement MemoryBuffer get_size()
  • Use mutex lock instead of atomics
  • Add TODO to remove execute_work workaround
  • Add TODO for get_area_of_interest parameter overloading
  • Add TODO to improve is_render_registered
Manuel Castilla (manzanilla) marked 5 inline comments as done.May 19 2021, 12:21 AM
Manuel Castilla (manzanilla) added inline comments.
source/blender/compositor/intern/COM_FullFrameExecutionModel.cc
264 ↗(On Diff #37186)

I changed it to use only lock. I prefer it because it's more clear that notify shouldn't happen before wait condition.

279 ↗(On Diff #37186)

I created some tests D11295. It's a quick implementation, I'll properly do it after this patch.

Manuel Castilla (manzanilla) marked 2 inline comments as done.EditedMay 19 2021, 8:21 AM

I've tested the use of parallel_for_each from BLI_task.hh as alternative and performance went from ~0.55s to ~1.90s, to use a thread/task pool as WorkScheduler does seems a must.
About WorkScheduler::finish issue, it's queue model only waits for works queue getting empty and not last work threads to finish. Tiled implementation may have been relying on this behaviour for a long time, I'm not sure if it was intended on initial implementation.
I have in mind all you've recommended, as this patch is already quite big, I'd prefer we do it in smaller patches after this one.

  • Move execute_work mutex and condition to class scope
  • Improve get_area_of_interest doc comments and signature
  • Improve SharedOperationBuffers methods naming and add doc comments
  • Improve comments and naming
Manuel Castilla (manzanilla) marked an inline comment as done.May 20 2021, 6:25 PM

I've create a glossary at https://wiki.blender.org/wiki/Source/Compositor
Will be adding more, specially terms that may not be clear.
Term "render" in SharedOperationsBuffer was certainly confusing. I've changed it to "area" and use the term in more places as it's related to get_input_area_of_interest.

  • Merge branch 'master' into cmp-full-frame
  • Add get_area_of_interest overloading and pass input_op
Manuel Castilla (manzanilla) marked an inline comment as done.May 22 2021, 4:14 PM
Jeroen Bakker (jbakker) requested changes to this revision.May 25 2021, 11:24 AM

During test I get a assert on

x487c265]
b(_ZN7blender10compositor13NodeOperation6renderEPNS0_12MemoryBufferENS_4SpanI4rctiEENS4_IS3_EERNS0_15ExecutionSystemE+0xa2) [0x487bff8]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x13e) [0x4875a62]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel22ensure_inputs_renderedEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x80) [0x4875768]
b(_ZN7blender10compositor23FullFrameExecutionModel16render_operationEPNS0_13NodeOperationERNS0_15ExecutionSystemE+0x7b) [0x487599f]
b(_ZN7blender10compositor23FullFrameExecutionModel17render_operationsERNS0_15ExecutionSystemE+0x11f) [0x4875c39]
b(_ZN7blender10compositor23FullFrameExecutionModel7executeERNS0_15ExecutionSystemE+0x86) [0x4875546]
b(_ZN7blender10compositor15ExecutionSystem7executeEv+0x48) [0x48727b6]
b(COM_execute+0x27b) [0x4871f97]
b(ntreeCompositExecTree+0x4c) [0x42eed75]
b() [0x4f261ef]
b() [0x3d0aee7]
b() [0x1072c4eb]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9450) [0x7f6187bf0450]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f6187771d53]
BLI_assert failed: source/blender/compositor/intern/COM_MemoryBuffer.h:166, get_elem(), at 'x >= m_rect.xmin && x < m_rect.xmax && y >= m_rect.ymin && y < m_rect.ymax'

Reason is that it generates a buffer with 0 width and height. Would assume that it needs to be at least 1 in height/width. This is a single value buffer.

This revision now requires changes to proceed.May 25 2021, 11:24 AM
  • Merge branch 'master' into cmp-full-frame
  • Fix areas of interest outside operation bounds

The crash has been fixed with D11381, merged from master. Operations should never have 0 resolution.

This revision is now accepted and ready to land.May 31 2021, 3:08 PM
  • Fix macos compiler warning: side effects on typeid
This revision was automatically updated to reflect the committed changes.