Page MenuHome

BLI: Add atomic disjoint set data structure.
ClosedPublic

Authored by Jacques Lucke (JacquesLucke) on Nov 30 2022, 1:10 PM.

Details

Summary

The existing DisjointSet data structure only supports single threaded access, which limits performance severely in some cases.

This patch implements AtomicDisjointSet based on "Wait-free Parallel Algorithms for the Union-Find Problem" by Richard J. Anderson and Heather Woll (paper). I was also inspired by an existing implementation: https://github.com/wjakob/dset/blob/master/dset.h.

I updated the Mesh Islands node to make use of the new data structure. The performance is improvement is quite good:

Island Index:
  few large islands: 138ms -> 38ms
  many small islands: 53ms -> 25ms 
Island Count:
  few large islands: 125ms -> 30ms
  many small islands: 47ms -> 10ms

Diff Detail

Repository
rB Blender
Branch
atomic-disjoint-set (branched from master)
Build Status
Buildable 24867
Build 24867: arc lint + arc unit

Event Timeline

Jacques Lucke (JacquesLucke) requested review of this revision.Nov 30 2022, 1:10 PM
Jacques Lucke (JacquesLucke) created this revision.

I can get 4x speedup for most tests. Sometimes it can be 2x, then it depends on the specifics of the topology and in most cases, it seems, then mainly 4-5x.
And also, I checked it with the old nodes, the new variant keeps the indexing of the roots, so the result is completely identical.

1diff --git a/source/blender/blenlib/BLI_atomic_disjoint_set.hh b/source/blender/blenlib/BLI_atomic_disjoint_set.hh
2new file mode 100644
3index 00000000000..df44225cecd
4--- /dev/null
5+++ b/source/blender/blenlib/BLI_atomic_disjoint_set.hh
6@@ -0,0 +1,142 @@
7+/* SPDX-License-Identifier: GPL-2.0-or-later */
8+
9+#pragma once
10+
11+#include <atomic>
12+
13+#include "BLI_array.hh"
14+#include "BLI_task.hh"
15+
16+namespace blender {
17+
18+/**
19+ * Same as `DisjointSet` but is thread safe (at slightly higher cost for the single threaded case).
20+ *
21+ * The implementation is based on the following paper:
22+ * "Wait-free Parallel Algorithms for the Union-Find Problem"
23+ * by Richard J. Anderson and Heather Woll.
24+ *
25+ * It's also inspired by this implementation: https://github.com/wjakob/dset.
26+ */
27+class AtomicDisjointSet {
28+ private:
29+ /* Can generally used relaxed memory order with this algorithm. */
30+ static constexpr auto relaxed = std::memory_order_relaxed;
31+
32+ struct Item {
33+ int parent;
34+ int rank;
35+ };
36+
37+ /**
38+ * An #Item per element. It's important that the entire item is in a single atomic, so that it
39+ * can be updated atomically. */
40+ Array<std::atomic<Item>> items_;
41+
42+ public:
43+ /**
44+ * Create a new disjoing set with the given set. Initially, every element is in a separate set.
45+ */
46+ AtomicDisjointSet(const int size) : items_(size)
47+ {
48+ threading::parallel_for(IndexRange(size), 4096, [&](const IndexRange range) {
49+ for (const int i : range) {
50+ items_[i].store(Item{i, 0}, relaxed);
51+ }
52+ });
53+ }
54+
55+ /**
56+ * Join the sets containing elements x and y. Nothing happens when they were in the same set
57+ * before.
58+ */
59+ void join(int x, int y)
60+ {
61+ while (true) {
62+ x = this->find_root(x);
63+ y = this->find_root(y);
64+
65+ if (x == y) {
66+ /* They are in the same set already. */
67+ return;
68+ }
69+
70+ Item x_item = items_[x].load(relaxed);
71+ Item y_item = items_[y].load(relaxed);
72+
73+ if (
74+ /* Implement union by rank heuristic. */
75+ x_item.rank > y_item.rank
76+ /* If the rank is the same, make a consistent decision. */
77+ || (x_item.rank == y_item.rank && x < y)) {
78+ std::swap(x_item, y_item);
79+ std::swap(x, y);
80+ }
81+
82+ /* Update parent of item x. */
83+ const Item x_item_new{y, x_item.rank};
84+ if (!items_[x].compare_exchange_strong(x_item, x_item_new, relaxed)) {
85+ /* Another thread has updated item x, start again. */
86+ continue;
87+ }
88+
89+ if (x_item.rank == y_item.rank) {
90+ /* Increase rank of item y. This may fail when another thread has updated item y in the
91+ * meantime. That may lead to worse behavior with the union by rank heurist, but seems to
92+ * be ok in practice. */
93+ const Item y_item_new{y, y_item.rank + 1};
94+ items_[y].compare_exchange_weak(y_item, y_item_new, relaxed);
95+ }
96+ }
97+ }
98+
99+ /**
100+ * Return true when x and y are in the same set.
101+ */
102+ bool in_same_set(int x, int y)
103+ {
104+ while (true) {
105+ x = this->find_root(x);
106+ y = this->find_root(y);
107+ if (x == y) {
108+ return true;
109+ }
110+ if (items_[x].load(relaxed).parent == x) {
111+ return false;
112+ }
113+ }
114+ }
115+
116+ /**
117+ * Find the element that represents the set containing x currently.
118+ */
119+ int find_root(int x)
120+ {
121+ while (true) {
122+ const Item item = items_[x].load(relaxed);
123+ if (x == item.parent) {
124+ return x;
125+ }
126+ const int new_parent = items_[item.parent].load(relaxed).parent;
127+ if (item.parent != new_parent) {
128+ /* This halves the path for faster future lookups. That fail but that does not change
129+ * correctness. */
130+ Item expected = item;
131+ const Item desired{new_parent, item.rank};
132+ items_[x].compare_exchange_weak(expected, desired, relaxed);
133+ }
134+ x = new_parent;
135+ }
136+ }
137+
138+ /**
139+ * True when x represents a set.
140+ */
141+ bool is_root(const int x)
142+ {
143+ const Item item = items_[x].load(relaxed);
144+ return item.parent == x;
145+ }
146+};
147+
148+} // namespace blender
149diff --git a/source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc b/source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc
150index d658b14092a..ec32b7c1951 100644
151--- a/source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc
152+++ b/source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc
153@@ -1,138 +1,322 @@
154 /* SPDX-License-Identifier: GPL-2.0-or-later */
155
156 #include "DNA_mesh_types.h"
157 #include "DNA_meshdata_types.h"
158
159 #include "BKE_mesh.h"
160
161 #include "BLI_disjoint_set.hh"
162
163+#include "BLI_atomic_disjoint_set.hh"
164+#include "BLI_enumerable_thread_specific.hh"
165+#include "BLI_sort.hh"
166+
167 #include "node_geometry_util.hh"
168
169 namespace blender::nodes::node_geo_input_mesh_island_cc {
170
171 static void node_declare(NodeDeclarationBuilder &b)
172 {
173+ b.add_input<decl::Bool>(N_("New"));
174+
175 b.add_output<decl::Int>(N_("Island Index"))
176 .field_source()
177 .description(N_("The index of the each vertex's island. Indices are based on the "
178 "lowest vertex index contained in each island"));
179 b.add_output<decl::Int>(N_("Island Count"))
180 .field_source()
181 .description(N_("The total number of mesh islands"));
182 }
183
184+namespace new_{
185+
186+class IslandFieldInput final : public bke::MeshFieldInput {
187+ public:
188+ IslandFieldInput() : bke::MeshFieldInput(CPPType::get<int>(), "Island Index")
189+ {
190+ category_ = Category::Generated;
191+ }
192+
193+ GVArray get_varray_for_context(const Mesh &mesh,
194+ const eAttrDomain domain,
195+ const IndexMask /*mask*/) const final
196+ {
197+
198+ const Span<MEdge> edges = mesh.edges();
199+
200+ /* Join vertices linked by edges. */
201+ AtomicDisjointSet islands(mesh.totvert);
202+ threading::parallel_for(edges.index_range(), 1024, [&](const IndexRange range) {
203+ for (const MEdge &edge : edges.slice(range)) {
204+ islands.join(edge.v1, edge.v2);
205+ }
206+ });
207+
208+ auto update_first_occurence = [](Map<int, int> &map, const int root, const int index) {
209+ map.add_or_modify(
210+ root,
211+ [&](int *first_occurence) { *first_occurence = index; },
212+ [&](int *first_occurence) {
213+ if (index < *first_occurence) {
214+ *first_occurence = index;
215+ }
216+ });
217+ };
218+
219+ /* Find the root for every vertex. With multi-threading, this root is not deterministic. So
220+ * some postprocessing has to be done to make it deterministic. */
221+ Array<int> output(mesh.totvert);
222+ threading::EnumerableThreadSpecific<Map<int, int>> first_occurence_by_root_per_thread;
223+ threading::parallel_for(IndexRange(mesh.totvert), 1024, [&](const IndexRange range) {
224+ Map<int, int> &first_occurence_by_root = first_occurence_by_root_per_thread.local();
225+ for (const int i : range) {
226+ const int root = islands.find_root(i);
227+ output[i] = root;
228+ update_first_occurence(first_occurence_by_root, root, i);
229+ }
230+ });
231+
232+ /* Build a map that contains the first element index that has a certain root. */
233+ Map<int, int> &combined_map = first_occurence_by_root_per_thread.local();
234+ for (const Map<int, int> &other_map : first_occurence_by_root_per_thread) {
235+ if (&combined_map == &other_map) {
236+ continue;
237+ }
238+ for (const auto item : other_map.items()) {
239+ update_first_occurence(combined_map, item.key, item.value);
240+ }
241+ }
242+
243+ struct Item {
244+ int root;
245+ int first_occurence;
246+ };
247+
248+ /* Sort roots by first occurence. This removes the non-determinism above. */
249+ Vector<Item, 16> items;
250+ items.reserve(combined_map.size());
251+ for (const auto item : combined_map.items()) {
252+ items.append({item.key, item.value});
253+ }
254+ parallel_sort(items.begin(), items.end(), [](const Item &a, const Item &b) {
255+ return a.first_occurence < b.first_occurence;
256+ });
257+
258+ /* Remap original root values with deterministic values. */
259+ Map<int, int> root_map;
260+ root_map.reserve(items.size());
261+ for (const int i : items.index_range()) {
262+ root_map.add_new(items[i].root, i);
263+ }
264+ threading::parallel_for(IndexRange(mesh.totvert), 1024, [&](const IndexRange range) {
265+ for (const int i : range) {
266+ output[i] = root_map.lookup(output[i]);
267+ }
268+ });
269+
270+ return mesh.attributes().adapt_domain<int>(
271+ VArray<int>::ForContainer(std::move(output)), ATTR_DOMAIN_POINT, domain);
272+ }
273+
274+ uint64_t hash() const override
275+ {
276+ /* Some random constant hash. */
277+ return 635467354;
278+ }
279+
280+ bool is_equal_to(const fn::FieldNode &other) const override
281+ {
282+ return dynamic_cast<const IslandFieldInput *>(&other) != nullptr;
283+ }
284+
285+ std::optional<eAttrDomain> preferred_domain(const Mesh & /*mesh*/) const override
286+ {
287+ return ATTR_DOMAIN_POINT;
288+ }
289+};
290+
291+class IslandCountFieldInput final : public bke::MeshFieldInput {
292+ public:
293+ IslandCountFieldInput() : bke::MeshFieldInput(CPPType::get<int>(), "Island Count")
294+ {
295+ category_ = Category::Generated;
296+ }
297+
298+ GVArray get_varray_for_context(const Mesh &mesh,
299+ const eAttrDomain domain,
300+ const IndexMask /*mask*/) const final
301+ {
302+ const Span<MEdge> edges = mesh.edges();
303+
304+ AtomicDisjointSet islands(mesh.totvert);
305+ threading::parallel_for(edges.index_range(), 1024, [&](const IndexRange range) {
306+ for (const MEdge &edge : edges.slice(range)) {
307+ islands.join(edge.v1, edge.v2);
308+ }
309+ });
310+
311+ const int num_islands = threading::parallel_reduce<int>(
312+ IndexRange(mesh.totvert),
313+ 1024,
314+ 0,
315+ [&](const IndexRange range, int count) {
316+ for (const int i : range) {
317+ if (islands.is_root(i)) {
318+ count++;
319+ }
320+ }
321+ return count;
322+ },
323+ [](const int a, const int b) { return a + b; });
324+
325+ return VArray<int>::ForSingle(num_islands, mesh.attributes().domain_size(domain));
326+ }
327+
328+ uint64_t hash() const override
329+ {
330+ /* Some random hash. */
331+ return 45634572457;
332+ }
333+
334+ bool is_equal_to(const fn::FieldNode &other) const override
335+ {
336+ return dynamic_cast<const IslandCountFieldInput *>(&other) != nullptr;
337+ }
338+
339+ std::optional<eAttrDomain> preferred_domain(const Mesh & /*mesh*/) const override
340+ {
341+ return ATTR_DOMAIN_POINT;
342+ }
343+};
344+
345+}
346+
347+namespace old_ {
348+
349 class IslandFieldInput final : public bke::MeshFieldInput {
350 public:
351 IslandFieldInput() : bke::MeshFieldInput(CPPType::get<int>(), "Island Index")
352 {
353 category_ = Category::Generated;
354 }
355
356 GVArray get_varray_for_context(const Mesh &mesh,
357 const eAttrDomain domain,
358 const IndexMask /*mask*/) const final
359 {
360 const Span<MEdge> edges = mesh.edges();
361
362 DisjointSet<int> islands(mesh.totvert);
363 for (const int i : edges.index_range()) {
364 islands.join(edges[i].v1, edges[i].v2);
365 }
366
367 Array<int> output(mesh.totvert);
368 VectorSet<int> ordered_roots;
369 for (const int i : IndexRange(mesh.totvert)) {
370 const int root = islands.find_root(i);
371 output[i] = ordered_roots.index_of_or_add(root);
372 }
373
374 return mesh.attributes().adapt_domain<int>(
375 VArray<int>::ForContainer(std::move(output)), ATTR_DOMAIN_POINT, domain);
376 }
377
378 uint64_t hash() const override
379 {
380 /* Some random constant hash. */
381 return 635467354;
382 }
383
384 bool is_equal_to(const fn::FieldNode &other) const override
385 {
386 return dynamic_cast<const IslandFieldInput *>(&other) != nullptr;
387 }
388
389 std::optional<eAttrDomain> preferred_domain(const Mesh & /*mesh*/) const override
390 {
391 return ATTR_DOMAIN_POINT;
392 }
393 };
394
395 class IslandCountFieldInput final : public bke::MeshFieldInput {
396 public:
397 IslandCountFieldInput() : bke::MeshFieldInput(CPPType::get<int>(), "Island Count")
398 {
399 category_ = Category::Generated;
400 }
401
402 GVArray get_varray_for_context(const Mesh &mesh,
403 const eAttrDomain domain,
404 const IndexMask /*mask*/) const final
405 {
406 const Span<MEdge> edges = mesh.edges();
407
408 DisjointSet<int> islands(mesh.totvert);
409 for (const int i : edges.index_range()) {
410 islands.join(edges[i].v1, edges[i].v2);
411 }
412
413 Set<int> island_list;
414 for (const int i_vert : IndexRange(mesh.totvert)) {
415 const int root = islands.find_root(i_vert);
416 island_list.add(root);
417 }
418
419 return VArray<int>::ForSingle(island_list.size(), mesh.attributes().domain_size(domain));
420 }
421
422 uint64_t hash() const override
423 {
424 /* Some random hash. */
425 return 45634572457;
426 }
427
428 bool is_equal_to(const fn::FieldNode &other) const override
429 {
430 return dynamic_cast<const IslandCountFieldInput *>(&other) != nullptr;
431 }
432
433 std::optional<eAttrDomain> preferred_domain(const Mesh & /*mesh*/) const override
434 {
435 return ATTR_DOMAIN_POINT;
436 }
437 };
438
439+}
440+
441 static void node_geo_exec(GeoNodeExecParams params)
442 {
443- if (params.output_is_required("Island Index")) {
444- Field<int> field{std::make_shared<IslandFieldInput>()};
445- params.set_output("Island Index", std::move(field));
446- }
447- if (params.output_is_required("Island Count")) {
448- Field<int> field{std::make_shared<IslandCountFieldInput>()};
449- params.set_output("Island Count", std::move(field));
450+ if (params.get_input<bool>("New")){
451+ if (params.output_is_required("Island Index")) {
452+ Field<int> field{std::make_shared<new_::IslandFieldInput>()};
453+ params.set_output("Island Index", std::move(field));
454+ }
455+ if (params.output_is_required("Island Count")) {
456+ Field<int> field{std::make_shared<new_::IslandCountFieldInput>()};
457+ params.set_output("Island Count", std::move(field));
458+ }
459+ }else{
460+ if (params.output_is_required("Island Index")) {
461+ Field<int> field{std::make_shared<old_::IslandFieldInput>()};
462+ params.set_output("Island Index", std::move(field));
463+ }
464+ if (params.output_is_required("Island Count")) {
465+ Field<int> field{std::make_shared<old_::IslandCountFieldInput>()};
466+ params.set_output("Island Count", std::move(field));
467+ }
468 }
469 }
470
471 } // namespace blender::nodes::node_geo_input_mesh_island_cc
472
473 void register_node_type_geo_input_mesh_island()
474 {
475 namespace file_ns = blender::nodes::node_geo_input_mesh_island_cc;
476
477 static bNodeType ntype;
478 geo_node_type_base(&ntype, GEO_NODE_INPUT_MESH_ISLAND, "Mesh Island", NODE_CLASS_INPUT);
479 ntype.declare = file_ns::node_declare;
480 ntype.geometry_node_execute = file_ns::node_geo_exec;
481 nodeRegisterType(&ntype);
482 }

  • Merge branch 'master' into atomic-disjoint-set
  • cleanup

How about BLI_disjoint_set_atomic.hh to improve file organization a bit? Or maybe it's better to group this with other "atomic" files. Or maybe that doesn't really matter!

This looks good in general. I left a few smaller comments and one idea inline, but I don't think this needs another review pass. Looking forward to seeing this combined with D16647 :)

source/blender/blenlib/BLI_atomic_disjoint_set.hh
138

I'd propose calc_reduced_ids since this actually does a fair amount of calculation

source/blender/blenlib/intern/atomic_disjoint_set.cc
61–64

It's a bit confusing that this uses the same struct name as the Item struct in the header

72–81

Not totally sure, but it seems like the sorting and root map creation could be combined if this sorted a separate array of indices based on the first occurrence values in the items vector.
That might also make the final parallel loop faster too.

source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc
90

num_islands -> islands_num for proper consistency (though I don't think this is written in the style guide yet)

This revision is now accepted and ready to land.Dec 1 2022, 7:50 PM
  • Merge branch 'master' into atomic-disjoint-set
  • cleanup