
WM: Speed up processing of duplicate notifiers
AbandonedPublic

Authored by Hans Goudey (HooglyBoogly) on Jun 5 2022, 6:14 PM.

Details

Summary

For add-ons doing many operations, finding duplicate notifiers can be
a bottleneck. This makes sense, because the runtime was quadratic.
For every new notifier, all previous notifiers were checked in the
worst case. This patch should eliminate that bottleneck by making
the runtime constant (per-notifier) instead, by storing notifiers
in a set as well as a queue.

The simplest way to build the required behavior was by adding
a small class to provide the necessary interface on top of
std::deque and blender::Set. That makes ensuring the
invariants more obvious, and the choice of data structures means
there are no allocations per notification like for ListBase.
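The approach can be sketched with standard containers. A minimal, hypothetical illustration (names are invented; std::unordered_set stands in for blender::Set, and the struct is a simplified stand-in for wmNotifier's identifying fields):

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_set>

/* Simplified stand-in for wmNotifier's identifying fields. */
struct Notifier {
  unsigned int category, data, subtype, action;
  bool operator==(const Notifier &other) const
  {
    return category == other.category && data == other.data && subtype == other.subtype &&
           action == other.action;
  }
};

struct NotifierHash {
  std::size_t operator()(const Notifier &n) const
  {
    /* Any field mix that is consistent with operator== works here. */
    return std::hash<uint64_t>{}((uint64_t(n.category) << 48) ^ (uint64_t(n.data) << 32) ^
                                 (uint64_t(n.subtype) << 16) ^ uint64_t(n.action));
  }
};

/* A queue that rejects duplicates in O(1) per notifier by mirroring the
 * queue's contents in a hash set. */
class NotifierQueue {
  std::deque<Notifier> queue_;
  std::unordered_set<Notifier, NotifierHash> set_;

 public:
  /* Returns false if an equal notifier is already queued. */
  bool push(const Notifier &notifier)
  {
    if (!set_.insert(notifier).second) {
      return false;
    }
    queue_.push_back(notifier);
    return true;
  }

  Notifier pop()
  {
    Notifier notifier = queue_.front();
    queue_.pop_front();
    set_.erase(notifier);
    return notifier;
  }

  bool is_empty() const
  {
    return queue_.empty();
  }
};
```

The key invariant is that the set always mirrors the queue exactly; wrapping both containers in one class means every push and pop maintains it in one place.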

Diff Detail

Repository
rB Blender

Event Timeline

Hans Goudey (HooglyBoogly) requested review of this revision.Jun 5 2022, 6:14 PM
Hans Goudey (HooglyBoogly) created this revision.
Aras Pranckevicius (aras_p) added inline comments.
source/blender/windowmanager/intern/wm_event_system.cc
265

Minor: wouldn't it be cleaner/clearer to use helpers like get_default_hash_4 and similar from BLI_hash.hh?

source/blender/windowmanager/intern/wm_event_system.cc
265

Yes, much clearer! And less arbitrary! Thanks!

Add and use get_default_hash_5

Hans Goudey (HooglyBoogly) marked an inline comment as done.Jun 5 2022, 9:17 PM
This revision is now accepted and ready to land.Jun 6 2022, 11:46 AM
This revision now requires review to proceed.Jun 6 2022, 2:25 PM

The background to this is that I noticed, in the BlenderBIM Add-on (which can import .ifc files), that adding notifiers can create a bottleneck while importing big files (10k+ objects).

Importing a test .ifc file without this patch takes around 50 seconds, of which the Blender processing part is around 20 seconds.
With the patch, the Blender processing part is on average around 7-8 s faster, as the wm_test_duplicate_notifier calls disappear from the list.

Before / After: (profiler output attached to the revision)

Hans Goudey (HooglyBoogly) retitled this revision from WM: Speed up processing of duplicate notifiers (WIP) to WM: Speed up processing of duplicate notifiers.Jun 7 2022, 7:48 AM
source/blender/windowmanager/intern/wm_event_system.cc
254

Is using equality here such a good idea?

It means notifiers will be equal even when they have different window members.

Although this could even be considered a bug/oversight in the existing code that the window wasn't compared.

333–342

This seems unnecessarily complicated; surely there is some convention for filtering items that doesn't require duplicating the list.

source/blender/windowmanager/intern/wm_event_system.cc
265

Unnecessary hashing can be avoided using (value.category | value.data | value.subtype | value.action) for one of the arguments.
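One reading of this suggestion, sketched generically (the struct is a simplified stand-in for wmNotifier; this is not the actual BLI_hash.hh code): because a hash only has to agree for equal values (extra collisions are allowed), several fields can be folded into a single word with bitwise OR before hashing, trading a few collisions for fewer hash calls.

```cpp
#include <cstddef>
#include <functional>

/* Simplified stand-in for wmNotifier's identifying fields. */
struct Notifier {
  unsigned int category, data, subtype, action;
};

/* Hashing all four fields separately costs four hash calls; merging them
 * with bitwise OR first allows hashing a single value instead. Correctness
 * only requires that equal notifiers produce equal hashes. */
inline std::size_t notifier_hash(const Notifier &n)
{
  return std::hash<unsigned int>{}(n.category | n.data | n.subtype | n.action);
}
```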

Hans Goudey (HooglyBoogly) marked an inline comment as done.
Hans Goudey (HooglyBoogly) edited the summary of this revision. (Show Details)
Hans Goudey (HooglyBoogly) added inline comments.
source/blender/windowmanager/intern/wm_event_system.cc
254

Defining an equality is required for using VectorSet (and most other hash data structures).

I tried to keep the same behavior as wm_test_duplicate_notifier, but if you think that behavior isn't intentional, I'd be happy to simplify this.

265

Good idea, thanks.

333–342

The complexity comes from a few constraints of the VectorSet data structure (which also happen to be its strengths).

  • You cannot modify values directly inside of the set, since that changes their hash.
  • You cannot remove items and leave an empty space, because all keys are stored in a contiguous memory block.

In practice I don't think this is so bad, and it's simpler than the existing code anyway.

Using Set would be simpler in some ways, but I wanted the pop() method to avoid larger changes to the notifier processing.

source/blender/windowmanager/intern/wm_event_system.cc
333–342

Correction (courtesy of Jacques ;): It is possible to remove from a VectorSet, if we allow the order of elements to change, by replacing the removed item with the last.
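The swap-with-last removal mentioned here is a standard trick for O(1) removal from a contiguous array; a generic sketch (the function name is invented, and this is not the actual VectorSet code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

/* O(1) removal from a contiguous array: move the last element into the
 * removed slot, then shrink. Element order is not preserved, which is
 * exactly the ordering question raised for notifiers. */
template<typename T> void remove_and_reorder(std::vector<T> &vec, std::size_t index)
{
  if (index != vec.size() - 1) {
    vec[index] = std::move(vec.back());
  }
  vec.pop_back();
}
```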

@Campbell Barton (campbellbarton) Are we allowed to change the order of notifiers? If so, that should simplify this a bit. Conceptually it seems like that should be allowed, but maybe I'm missing something.

Checking over the changes in this patch I'm worried that any functional changes that cause bugs could be quite difficult and time consuming to track down.

And it seems non-trivial for this patch to have absolutely no functional changes; for example, the interaction of WM_main_remove_notifier_reference with wm_event_do_notifiers seems like it could change. As noted, the second loop that runs the listener commands now runs in reverse order.


It seems like VectorSet is fine for keeping a list of de-duplicated items but not so good for handling a queue, especially when items need to be removed from the start while other items may be added to the end.

Perhaps it would be better to keep the current ListBase and use a set just to do the duplicate checks, as this is the main bottleneck and likely gets us the performance benefits of this patch.
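That alternative can be sketched as follows (a hypothetical illustration: std::list stands in for the ListBase queue, std::unordered_set for the GSet, and the field widths are invented):

```cpp
#include <cstdint>
#include <list>
#include <unordered_set>

/* Hypothetical field widths; the real wmNotifier differs. */
struct Notifier {
  uint16_t category, data, subtype, action;
};

/* Pack the identifying fields into one 64-bit key so the set needs no
 * custom hash or equality operator. */
static uint64_t notifier_key(const Notifier &n)
{
  return (uint64_t(n.category) << 48) | (uint64_t(n.data) << 32) | (uint64_t(n.subtype) << 16) |
         uint64_t(n.action);
}

struct NotifierQueue {
  std::list<Notifier> queue;           /* Stand-in for the ListBase queue. */
  std::unordered_set<uint64_t> queued; /* Used only for the duplicate check. */

  /* Returns false if an equal notifier is already queued. */
  bool push(const Notifier &notifier)
  {
    if (!queued.insert(notifier_key(notifier)).second) {
      return false;
    }
    queue.push_back(notifier);
    return true;
  }

  Notifier pop()
  {
    Notifier notifier = queue.front();
    queue.pop_front();
    queued.erase(notifier_key(notifier));
    return notifier;
  }
};
```

The queue itself stays untouched; the set is a pure membership index, which keeps the diff against the existing code small.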

source/blender/windowmanager/intern/wm_event_system.cc
577

This will handle notifiers in the reverse order.

source/blender/windowmanager/intern/wm_event_system.cc
333–342

I don't think running notifiers out of order is worth the risk of introducing bugs.

Even if changing the order of notifiers doesn't cause obvious breakage, there may be subtle bugs which are difficult to track down. These kinds of bugs are more likely to bite us when event handling gets behind and notifiers queue up, which will make it even more difficult to track down problems.

Hans Goudey (HooglyBoogly) edited the summary of this revision. (Show Details)
  • Use a proper queue instead of using VectorSet
  • "Removed" notifiers are now just cleared like before

Those are fair points about keeping the behavior the same.
Conceptually I bet things could be simplified, but in practice, it's possible that more code relied on the old behavior, and it's probably not worth the time to find out.

Hans Goudey (HooglyBoogly) marked 4 inline comments as done.Jun 28 2022, 12:46 AM

Is there a test-case for performance that can be used to validate this patch or compare with alternative solutions?

It seems like a reasonably big change to solve what could be done fairly simply with a set that tracks duplicates.

Is there a test-case for performance that can be used to validate this patch or compare with alternative solutions?

It seems like a reasonably big change to solve what could be done fairly simply with a set that tracks duplicates.

This runs in 6.2 s vs 5.3 s for me (before/after). That ~1 s seems to be split roughly 60/40 between rna_Collection_objects_link and rna_property_update adding notifiers.
(Tested in RelWithDebInfo, with and without debugging.)

import bpy
import time

start_time = time.time()

for i in range(10000):
    mesh = bpy.data.meshes.new(str(i).zfill(6))
    obj = bpy.data.objects.new(str(i).zfill(6), mesh)
    obj.scale[0] = 1.2
    
    bpy.context.collection.objects.link(obj)

print("finished in", time.time() - start_time,"s")

This patch is a smaller change which just uses a set to track duplicates P3034.

Using this script to benchmark: P3035 (running with the CPU governor set to performance, best of 3 runs for both), it's a little faster in my tests (~4%).

This patch is a smaller change which just uses a set to track duplicates P3034.

Using this script to benchmark: P3035 (running with the CPU governor set to performance, best of 3 runs for both), it's a little faster in my tests (~4%).

I guess the script below better shows the problem I've seen.

Summed, the runtime should be less than 0.4 s, as that's the time when the assignments are run on their own, but this runs in 0.75 s.
I think if other kinds of notifications are added in the same loop, this will quickly add up and make the search slower for each notifier that doesn't match the lookup.

With the patch the time is 0.04s so that's ~20x faster.

Run the commented lines the first time to create the objects.

import bpy
import time

#for i in range(10000):
#    mesh = bpy.data.meshes.new(str(i).zfill(6))
#    obj = bpy.data.objects.new(str(i).zfill(6), mesh)
#    bpy.context.collection.objects.link(obj)

objs = [bpy.data.objects[str(i%10000).zfill(6)] for i in range(10000)]
meshes = [bpy.data.meshes[str(i%10000).zfill(6)] for i in range(10000)]

start_time = time.time()

# The comment after each line is its time when run on its own.
for i in range(10000):
    objs[i].scale[0] = 1.2 # 0.19s
    meshes[i].auto_smooth_angle = 1.0 # 0.19s

print("finished in", time.time() - start_time,"s")

Even though the change is larger, I personally prefer the version in this patch, since using a class for the queue helps to ensure some invariants that are harder to keep track of when simply adding a GSet.
With just a GSet, we have to be more watchful for changes that make the two structures inconsistent; when the class manages everything, the invariant is more obvious.
I also like the type safety of avoiding GSet and ListBase and the (theoretical, I admit) performance considerations of avoiding many small allocations and using Set over GSet.

Finding a proper benchmark is hard, since the number of unrelated notifiers in the queue will vary a lot, which requires a complex real-world situation to reproduce.
I'm happy to do any more benchmarking if it's helpful though.

All that said, this is definitely @Campbell Barton (campbellbarton)'s call in the end! Since the reasoning for this patch is a performance improvement, it's reasonable to keep it at that and say that further improvements should be separate.

In general this patch seems like a bigger change than it needs to be to solve the problem that's reported.
I'm not necessarily against the changes it makes; however, I don't see the need to make such large changes either.

One downside is that it becomes much more difficult to track who generated a notifier, which is very handy (perhaps the most important piece of information) whenever I've had to debug problems with notifiers. Of course, notifiers could always be extended to include debug information, but since avoiding individual allocations doesn't seem to have much benefit in this case, this gives us an additional problem to solve.

I'd rather just use the existing code with a GSet to de-duplicate, and if this patch has some compelling reason to be included, that can be reviewed separately (or something in-between, e.g. preferring C++ container types).