Page MenuHome

Fix T85752: Collection Instance Crash when instancing collections with disabled subcollections
ClosedPublic

Authored by Bastien Montagne (mont29) on Apr 7 2021, 11:03 AM.

Details

Summary

Root of the issue was actually hidden deep in depsgraph itself: it would
not properly update all of its COW IDs using a datablock when depsgraph
decides to evaluate or un-evaluate it.

This would lead to evaluated IDs pointing to either:

  • orig IDs when there was an evaluated version of those (annoying bug, but not a crashing one).
  • old address of previously evaluated IDs that no longer exists in the depsgraph (causing the crash from the report e.g.).

This commit adds an extra step at the end of nodes building, that goes
over all of already existing IDs in the depsgraph to check whether they
do one of the two things above, and tag them for COW update if so.

NOTE: This only affects depsgraph (re-)building, not its evaluation. This remains consistent with the fact that operations that may change the depsgraph content (like Collection exclusion etc.) need to trigger a rebuild.
NOTE: Performances: Worst case scenarii, like (un-)excluding a whole character collection in a production file, lead to 5% to 10% extra processing time in depsgraph building. Most of it comming from extra COW processing (in depsgraph's update in build_step_finalize), the detection loop itself only accounts for 1% to 2% of the whole building time.

Diff Detail

Repository
rB Blender
Branch
arcpatch-D10907 (branched from master)
Build Status
Buildable 14468
Build 14468: arc lint + arc unit

Event Timeline

Bastien Montagne (mont29) requested review of this revision.Apr 7 2021, 11:03 AM
Bastien Montagne (mont29) created this revision.
Bastien Montagne (mont29) added inline comments.
source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
377–381

This call (and the same one below) is the thing am the least confident about... Not clear to me whether we can call that from here, or if there is a better way to do the tagging?

404

Also not sure if this combination of static local wrapper + class function is the best way to do it? Avoids the need to define a specific struct as custom data parameter for BKE_library_foreach_ID_link call below...

Sybren A. Stüvel (sybren) requested changes to this revision.Apr 8 2021, 1:00 PM
Sybren A. Stüvel (sybren) added inline comments.
source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
365

"Utils" is plural, but this comment documents a single function.

370

To me it's not so clear what these parameters are, and how they relate to each other.

372–401

The structure of this code is a bit convoluted, and can be simplified a bit:

{
  if (id->orig_id == NULL) {
    /* `id_cow_self` uses a non-cow ID, if that ID has a COW copy in current depsgraph its owner
     * needs to be remapped, i.e. COW-flushed. */
    const IDNode *id_node = find_id_node(id);
    if (id_node == nullptr || id_node->id_cow == nullptr) {
      return IDWALK_RET_NOP;
    }
  }
  else {
    /* `id_cow_self` uses a COW ID, if that COW copy is removed from current depsgraph its owner
     * needs to be remapped, i.e. COW-flushed. */
    /* NOTE: at that stage, old existing COW copies that are to be removed from current state of
     * evaluated depsgraph are still valid pointers, they are freed later (typically during
     * destruction of the builder itslef). */
    const IDNode *id_node = find_id_node(id->orig_id);
    if (id_node != nullptr) {
      return IDWALK_RET_NOP;
    }
  }

  graph_id_tag_update(bmain_,
                      graph_,
                      id_cow_self->orig_id,
                      ID_RECALC_COPY_ON_WRITE,
                      DEG_UPDATE_SOURCE_RELATIONS);
  return IDWALK_RET_STOP_ITER;
}

It's then also clear that both code paths result in the same call to graph_id_tag_update(), and that their difference is purely dependent on id_node (how it's found, and how it's inspected).

Of course this is not taking into account changes discussed above (re: calling graph_id_tag_update_single_flag())

377–381

graph_id_tag_update does graph->find_id_node(id_cow_self->orig_id). I suspect id_cow_self comes from the id_node->id_cow below, so I think that find_id_node() call will just return the id_node from that function. If we can pass that to this function somehow, it can avoid a potentially large number of graph lookups. In that case graph_id_tag_update_single_flag(..., id_node, ...) can be used directly.

482

typo

This revision now requires changes to proceed.Apr 8 2021, 1:00 PM
Bastien Montagne (mont29) marked an inline comment as done.Apr 8 2021, 3:42 PM
Bastien Montagne (mont29) added inline comments.
source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
365

utils because this also covers the static wrapper function below...

370

This is an (indirect) callback for BKE_library_foreach_ID_link, and as such those names are inspired from the naming in LibraryIDLinkCallbackData struct...

I don't mind adding comments about that if you really think this is needed (or if you have ideas for better names), but I think there is a limit where comments should not substitute to reading the code and general knowledge of Blender code too...

To be more clear, I don't there should be comments here explaining how callbacks for BKE_library_foreach_ID_link are working, and what kind of data they are getting?

TBH I'm much more annoyed/unsure about that combination of two functions (static wrapper callback + public method), as noted in a comment below. But I do not know a better way to do that right now (maybe std::bind could be used here actually...)

372–401

I totally disagree, your suggested code is much less clear to me about the underlying logic of this function. I would read it as 'by default, tag self for update and stop iteration, unless <some specific condition> is met, then keep looping'.

This is the exact opposite of the actual logic, which is `by default, keep looping without doing anything, unless <some specific condition> is met, then tag and return'.

If the double call to graph_id_tag_update annoys you, we could instead use a bool id_self_needs_tagging variable, and check this one at the end to decide if we can tag and break iteration, or not... But in that case I do not really see it as useful (as in, making things more readable and clear) either tbh.

377–381

I really doubt calling graph_id_tag_update_single_flag would be enough? graph_id_tag_update does much more than just finding the id_node and calling graph_id_tag_update_single_flag on it...

Note that the extra call to find_id_node in graph_id_tag_update is extremely unlikely to be done in large number, it would only happen once per ID, and only on those IDs that do need the tagging specifically because of internal depsgraph building decisions to remove/bring back some nodes, which only happens in a few specific cases and should not typically affect that many IDs.

As pointed out in commit message, this whole new check itself is extremely fast, almost all the extra processing time in deg building induced by this patch comes from the extra COW flushing it triggers.

source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
370

Actually, last part about std::bind is stupid, C code will never know what to do with such exotic C++ object...

Minor updates from review.

Sybren A. Stüvel (sybren) requested changes to this revision.Apr 8 2021, 4:22 PM
Sybren A. Stüvel (sybren) added inline comments.
source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
365

In that case I'd add a newline between this comment and the function, and decorate it so that it reads more like a section header and less like the documentation of a single function.

370

This is an (indirect) callback for BKE_library_foreach_ID_link, and as such those names are inspired from the naming in LibraryIDLinkCallbackData struct...

When reading this function, this is not a known fact. It's only known once you start digging into other functions to see how it's used. This is not a good way to design clear, understandable code.

I don't mind adding comments about that if you really think this is needed (or if you have ideas for better names), but I think there is a limit where comments should not substitute to reading the code and general knowledge of Blender code too...

Local understandability of code should not be underestimated. I think that by renaming ID *id to id_pointer you already clarified things a bit more.

To be more clear, I don't there should be comments here explaining how callbacks for BKE_library_foreach_ID_link are working, and what kind of data they are getting?

You don't what? You don't think? Or you don't doubt? In any case it's irrelevant because the reader likely doesn't even know that this function is used as such a callback, because that's not documented either.

When there is a comment that explains what a function does, it should also be clear how the parameters of that function map to whatever was explained about it.

TBH I'm much more annoyed/unsure about that combination of two functions (static wrapper callback + public method), as noted in a comment below.

This is an ok approach, it happens in other places of Blender as well where a C++ method + its this pointer have to be used as a callback function.

372–401

As a personal design rule I try to put the core functionality of a function outside of any conditionals, and put precondition checks before that. The core functionality of this function IMO is not to do nothing, but is to tag IDs that adhere to certain conditions.

The fact that some behaviour is the default for BKE_library_foreach_ID_link doesn't mean it should be the core functionality of this function. That just creates unnecessary coupling IMO.

Either way, the code isn't all that complex, so order it however you please.

391

typo "itslef"

This revision now requires changes to proceed.Apr 8 2021, 4:22 PM
Bastien Montagne (mont29) marked an inline comment as done.Apr 8 2021, 8:49 PM
Bastien Montagne (mont29) added inline comments.
source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
372–401

We agree on the general design rule I think, but to me those condition checks are integral parts of the 'core logic' of that function...

Bastien Montagne (mont29) marked an inline comment as done.Apr 8 2021, 8:49 PM

There is one conceptual thing I don't quite understand. In the code you write:

/* Check for IDs that need to be flushed (COW-updated) because the depsgraph itself created or
 * removed some of their evaluated dependencies.

What's the reason to not just fix the depsgraph in those places where it created or removed some of those evaluated dependencies? What's against taking the appropriate action at that time, instead of always performing this post-build check?

What's the reason to not just fix the depsgraph in those places where it created or removed some of those evaluated dependencies? What's against taking the appropriate action at that time, instead of always performing this post-build check?

Not sure that would even be possible? Isn't graph building also threaded?
And please do remember that the IDs we need to tag are not the IDs 'played with' by the depsgraph (i.e. those which COW-copied/evaluated status may change due to depsgraph internal code), but their users. Would mean you'd need to loop over whole Main every time you hit one of those case, to actually find their users, instead of doing it only once as part of post-process step..
Finally, doing it as a final stage allows to have this at a single place, in a generic way, regardless of whether same behavior is added/removed to/from some ID type in the future.
Not to mention the extra overhead of adding specific case handling for each usecase, and risk to forget to add it to new similar ones in the future.

EDIT: in fact am fairly certain it is not possible to do that before post-process step, since you need all nodes to be properly created. even is building is not threaded, there is a fair chance that not all nodes needing this process/check exist already when a specific one is built and changes its status (i.e. when we build IDnode A, which is used by IDnode B, there is a fair chance that B has not yet been processed by build system, so we cannot even properly check it).

And please do remember that the IDs we need to tag are not the IDs 'played with' by the depsgraph (i.e. those which COW-copied/evaluated status may change due to depsgraph internal code), but their users. Would mean you'd need to loop over whole Main every time you hit one of those case, to actually find their users, instead of doing it only once as part of post-process step..

Yeah, that's a good point.

LGTM on the general approach, but I do think the void DepsgraphNodeBuilder::end_build() function now does too much and should be split up.

source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
365–461

This function can be split up into two, since it now does two different things (its original functionality + this new pass over the nodes).

source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
365–461

How would you call them then?

source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
365–461

Because the original name is extremely generic and vague, so... would totally fit both functionalities.

source/blender/depsgraph/intern/builder/deg_builder_nodes.cc
365–461

I agree, the original name is super vague in terms of what the code actually does. How about something like tag_previously_tagged_nodes() and update_invalid_cow_pointers()?

Split end_build() function.

Note that I kept end_build() itself as a wrapper, because it does not really
make sense to expose the internals in the public API of the class imho. Also,
it maches better with the begin_build initializer call.

Much better, thanks!

This revision is now accepted and ready to land.May 14 2021, 3:47 PM