Page MenuHome

Multithreading sync portion of Cycles Render
Needs ReviewPublic

Authored by Jagannadhan Ravi (easythrees) on Jul 7 2020, 6:25 AM.

Details

Summary

This is an ongoing process to try and multi-thread the sync portion of the Cycles render process. Putting this up for "review" because I'd like some guidance on how to properly proceed with this work.

July 12th 2020: Based on Sergey's advice, I've scrubbed through and replaced direct references to b_ob where possible. I'm still seeing crashes in grabbing mesh data.

Diff Detail

Repository
rB Blender

Event Timeline

Jagannadhan Ravi (easythrees) requested review of this revision.Jul 7 2020, 6:25 AM
Jagannadhan Ravi (easythrees) created this revision.
intern/cycles/blender/blender_geometry.cpp
29

Just created a new version of this function that takes in the sync object. When running the multi-threaded version, I saw a lot of crashes accessing data members of BL::Object, so I figured I'd store some info on the sync object.

intern/cycles/blender/blender_object.cpp
98

This is the function that should be called in a multi-threaded fashion. I add this to the task pool in sync_objects(...)

658

Task pool here to launch the new sync task function.

intern/cycles/blender/blender_geometry.cpp
29

Avoid access to b_ob and b_ob_instance here. Pass b_ob.data() instead, and all the rest of Object level properties stored in a some sort of struct which is initialized form single-threaded traversal.

intern/cycles/blender/blender_geometry.cpp
29

So do you mean pass in b_ob.data() and b_ob_instance.data() instead?

intern/cycles/blender/blender_geometry.cpp
29

What should I do about this line:

BL::ID b_key_id = (BKE_object_is_modified(b_ob)) ? b_ob_instance : b_ob_data;

I can't avoid accessing b_ob_instance here right? Or should I just use b_ob_instance.data()?

intern/cycles/blender/blender_geometry.cpp
29

Also, what should I do about sync_mesh(...), sync_hair(...) and sync_volume(...)? I'm guessing I need to do the same thing?

intern/cycles/blender/blender_geometry.cpp
51

This is another gotcha I think. material_slots is defined as

COLLECTION_PROPERTY(DefaultCollectionFunctions, MaterialSlot, Object, material_slots, true, true, true)

Is there a "proper" way to grab these slots?

Jagannadhan Ravi (easythrees) marked an inline comment as done.Jul 13 2020, 1:57 AM
Jagannadhan Ravi (easythrees) added inline comments.
intern/cycles/blender/blender_util.h
123

I get a crash here now, because mesh is NULL, meaning object_data is also NULL. Ultimately, this goes back to

curr_object.b_ob_data = curr_object.b_object.data();

which is what I set in the for loop in sync_objects(...). Any ideas as to what I'm doing wrong here? Weirdly, if I check for it being NULL, and run the render in the debugger, it does render but it's an incomplete image. Still crashes on standalone, but not sure why.

This seems to be much more complicated that it should be.

Have a look at P1529.

Basically:

  • Use b_temp_object to access transform and persistent ID and other instance-specific fields.
  • Do not use b_temp_object outside of loop body, cache all the fields of interest in a context of some sort (see the SyncObject from the snipped of code I've sent in blender-chat. Not the best name, but just to show the idea),
  • Pass the SyncObject instead of b_instance to the sync_object().
  • Pass b_accessible_object instead of b_object to the sync_object(),
  • Replace direct call of sync_object() with some threaded pool push.
  • Done.

This seems to be much more complicated that it should be.

Have a look at P1529.

Basically:

  • Use b_temp_object to access transform and persistent ID and other instance-specific fields.
  • Do not use b_temp_object outside of loop body, cache all the fields of interest in a context of some sort (see the SyncObject from the snipped of code I've sent in blender-chat. Not the best name, but just to show the idea),
  • Pass the SyncObject instead of b_instance to the sync_object().
  • Pass b_accessible_object instead of b_object to the sync_object(),
  • Replace direct call of sync_object() with some threaded pool push.
  • Done.

I'm a little confused, I thought I was doing this. In your note on blender_geometry.cpp you mentioned:

Avoid access to b_ob and b_ob_instance here. Pass b_ob.data() instead, and all the rest of Object level properties stored in a some sort of struct which is initialized form single-threaded traversal.

which is how I ended up with all those properties.

Okay, I'll do up this change again, a third time :).

EDIT: Hi Sergey, it looks like I'm going down the same road again. All of the functions that sync_geometry(...) calls ask for a BL::Object and they assume it's something like b_ob because they all do things like looping through material slots, which are on b_ob and not on the data member as far as I can tell.

Please don't scatter discussion everywhere where you see comment box.

What does it mean if b_accessible_object.data() return NULL?

If the object is type Empty it's data is null, In all other cases it should never be null. I was testing this P1531 with the classroom scene (which does have instances), and had no b_accessible_object to have data() of null.

Accessing the data() function in my sync function keeps crashing, which makes me think the data underneath's been invalidated. This makes sense given your comment about how b_accessible_object.data() and b_temp_object.data() both point to the same thing, but then how can I make sure it's not being invalidated?

Please have a look into depsgraph_query_iter.cc where temp_dupli_object is constructed, and into DNA_object_types.h where the object's type is defined.

Roughly, b_iterator owns the memory where b_temp_objectlives. b_accessible_object is assigned to b_temp_object and object matrix of b_temp_object is assigned to the world matrix of the instance. The data is a void*. It is owned by the dependency graph. You can assign pointer anywhere, and it will still be valid.

which is how I ended up with all those properties.

I don't see why you went into all the complexity of dealing with modifiers. Material slots, modifiers and so on are all available via b_accessible_object.


I would suggest you starting from scratch in a simpler way:

  • Replace b_instance argument passed to sync_object with Cycles's side structure, which will hold all information of interest needed for Cycles.
  • Split the current for-loop over instances coming from depsgraph into two loops: first loop will initialize the b_instance replacement form above and store it in a vector together with b_accessible_object. Second loop will iterate over this vector and call sync_object(). NO threading at this point. Just make this deferred case to work reliably.
  • Make the loop body run in parallel, using parallel_for. Solve all the possible issues which are discovered.
  • Look into possibly replacing vector+parallel_for with a task-based approach. Not sure it worth it though: you still need to store some per-task context anyway, so to me it seems using Task will just cause more of small allocations, without really reducing the overall memory footprint.

When doing tests, use small scenes first. Default cube, Default cube with sphere. Cornell box.
Only when those work reliably go to a more complex scenes.

Please don't scatter discussion everywhere where you see comment box.

What does it mean if b_accessible_object.data() return NULL?

If the object is type Empty it's data is null, In all other cases it should never be null. I was testing this P1531 with the classroom scene (which does have instances), and had no b_accessible_object to have data() of null.

Accessing the data() function in my sync function keeps crashing, which makes me think the data underneath's been invalidated. This makes sense given your comment about how b_accessible_object.data() and b_temp_object.data() both point to the same thing, but then how can I make sure it's not being invalidated?

Please have a look into depsgraph_query_iter.cc where temp_dupli_object is constructed, and into DNA_object_types.h where the object's type is defined.

Roughly, b_iterator owns the memory where b_temp_objectlives. b_accessible_object is assigned to b_temp_object and object matrix of b_temp_object is assigned to the world matrix of the instance. The data is a void*. It is owned by the dependency graph. You can assign pointer anywhere, and it will still be valid.

which is how I ended up with all those properties.

I don't see why you went into all the complexity of dealing with modifiers. Material slots, modifiers and so on are all available via b_accessible_object.


I would suggest you starting from scratch in a simpler way:

  • Replace b_instance argument passed to sync_object with Cycles's side structure, which will hold all information of interest needed for Cycles.
  • Split the current for-loop over instances coming from depsgraph into two loops: first loop will initialize the b_instance replacement form above and store it in a vector together with b_accessible_object. Second loop will iterate over this vector and call sync_object(). NO threading at this point. Just make this deferred case to work reliably.
  • Make the loop body run in parallel, using parallel_for. Solve all the possible issues which are discovered.
  • Look into possibly replacing vector+parallel_for with a task-based approach. Not sure it worth it though: you still need to store some per-task context anyway, so to me it seems using Task will just cause more of small allocations, without really reducing the overall memory footprint.

When doing tests, use small scenes first. Default cube, Default cube with sphere. Cornell box.
Only when those work reliably go to a more complex scenes.

Please accept my apologies, I am on a deadline and it looks like I have not estimated the work properly. I will re-do the code change as you suggest and will start a new review/session when I have something I need feedback on.

Hi Sergey, so just to make sure I'm starting things off correctly, I have some specific questions about handling objects in sync_object. Right now, I have this:

BL::Object b_ob = b_instance.object();
BL::Object b_parent = curr_obj.is_instance ? b_instance.parent() : b_instance.object();
BL::Object b_ob_instance = curr_obj.is_instance ? b_instance.instance_object() : b_ob;

For b_ob, can I just use the b_accessible_object that I'll be passing in? This is from b_instance and this object does get used quite a bit, which is why I think I went down the rabbit hole the last time.

I'm guessing b_parent is not a temp object, correct? I'll add it to the SyncObject struct.

Also, later in the sync operation, we have:

if (curr_obj.is_instance) {
    /* Sync possible particle data. */
    sync_dupli_particle(b_parent, b_instance, object);
  }

Which ends up using b_instance.object(). In this case, it looks like it's specifically asking for the temp object, right? Can we get away with using the accessible object here?

For b_ob, can I just use the b_accessible_object that I'll be passing in?

That is correct.
You'd need to replace b_ob.matrix_world() with the matrix stored in the SyncObject though.

I'm guessing b_parent is not a temp object, correct? I'll add it to the SyncObject struct.

From my understanding b_parent is not temp and can be stored as well.

Naming is a bit weak. It is not the parent as in Blender's hierarchy terms, it is an object which instanced the b_ob. So to me the name should be something more like b_instancer_object.

That is correct.
You'd need to replace b_ob.matrix_world() with the matrix stored in the SyncObject though.

Okay, I'll do that.

I'm guessing b_parent is not a temp object, correct? I'll add it to the SyncObject struct.

From my understanding b_parent is not temp and can be stored as well.

I think this is where everything went nuts the last time. I have that dual loop idea of yours working in single threaded fashion. However, if I change the loop to a parallel_for (I'll use a simpler example, but for now I just used the barbershop), I get a crash here:

bool use_holdout = get_boolean(cobject, "is_holdout") ||
                   curr_obj.b_parent.holdout_get(PointerRNA_NULL, b_view_layer);

The cobject part seems fine, since it's ultimately coming from b_accessible_object which I'm storing on SyncObject. However, b_parent.ptr.data() looks to be invalid somehow, and I'm not sure why. Wrapping a mutex around this doesn't help either, so my guess is that I'm either not storing this object correctly. This is how I store it now:

curr_obj.b_parent = curr_obj.is_instance ? b_instance.parent() : b_instance.object();

In the case of the barbershop, almost nothing is an instance, so b_instance.object() (which is from the iterator) should be valid, but it apparently isn't. Maybe instead of b_instance.object() I should just use b_accessible_object? Any thoughts? The stack of the crash looks like:

0 	[Inline Frame] blender.exe!ghash_lookup_entry_ex(GHash *) Line 390	C
1 	[Inline Frame] blender.exe!ghash_lookup_entry(GHash *) Line 429	C
2 	blender.exe!BLI_ghash_lookup(GHash * gh, const void * key) Line 811	C
3 	blender.exe!rna_Object_holdout_get(Object * ob, bContext * C, ViewLayer * view_layer) Line 206	C
4 	[Inline Frame] blender.exe!BL::Object::holdout_get(BL::Context) Line 53558	C++
5 	blender.exe!ccl::BlenderSync::sync_task(BL::Depsgraph & b_depsgraph, BL::ViewLayer & b_view_layer, ccl::SyncObject & curr_obj, float motion_time, bool use_particle_hair, bool show_lights, ccl::BlenderObjectCulling & culling, bool * use_portal) Line 153	C++
6 	[Inline Frame] blender.exe!ccl::BlenderSync::sync_objects::__l2::<lambda_59db69039c91ac5c3c7235c8698af878>::operator()(tbb::blocked_range<int>) Line 710	C++

And Visual Studio being Visual Studio isn't showing me what this resolves to when looking at frame 4.

Quick update, I can do a render of a simple scene with some objects and a spotlight. Had to use a mutex in two spots:

sync_mutex.lock();
if (object_map.add_or_update(&object, b_ob, curr_obj.b_parent, key))
  object_updated = true;
sync_mutex.unlock();

and

/* mesh sync */
sync_mutex.lock();
object->geometry = sync_geometry(b_depsgraph, b_ob, b_ob_instance, object_updated, use_particle_hair);
sync_mutex.unlock();

I actually wonder if it's possible to use TBB::concurrent_unordered_map for the maps here (we'd have to add extra methods to be compliant with id_map).

I still get this crash with the barbershop :(

Yet another update, the wheels of the BMW are missing, so I'm not handling the instance transform correctly I think.

Hi Sergey, I've created a new review as we discussed. It's located here: https://developer.blender.org/D8324 I'm not sure how to 'delete' an existing review though....

There is no "delete", you cam "abandon" or "close" it.