Page MenuHome

BLI: Add atomic disjoint set data structure.
ClosedPublic

Authored by Jacques Lucke (JacquesLucke) on Nov 30 2022, 1:10 PM.

Details

Summary

The existing DisjointSet data structure only supports single threaded access, which limits performance severely in some cases.

This patch implements AtomicDisjointSet based on "Wait-free Parallel Algorithms for the Union-Find Problem" by Richard J. Anderson and Heather Woll (paper). I was also inspired by an existing implementation: https://github.com/wjakob/dset/blob/master/dset.h.

I updated the Mesh Islands node to make use of the new data structure. The performance is improvement is quite good:

Island Index:
  few large islands: 138ms -> 38ms
  many small islands: 53ms -> 25ms 
Island Count:
  few large islands: 125ms -> 30ms
  many small islands: 47ms -> 10ms

Diff Detail

Repository
rB Blender
Branch
atomic-disjoint-set (branched from master)
Build Status
Buildable 24865
Build 24865: arc lint + arc unit

Event Timeline

Jacques Lucke (JacquesLucke) requested review of this revision.Nov 30 2022, 1:10 PM
Jacques Lucke (JacquesLucke) created this revision.

I can get 4x speedup for most tests. Sometimes it can be 2x, then it depends on the specifics of the topology and in most cases, it seems, then mainly 4-5x.
And also, I checked it with the old nodes, the new variant keeps the indexing of the roots, so the result is completely identical.

1diff --git a/source/blender/blenlib/BLI_atomic_disjoint_set.hh b/source/blender/blenlib/BLI_atomic_disjoint_set.hh
2new file mode 100644
3index 00000000000..df44225cecd
4--- /dev/null
5+++ b/source/blender/blenlib/BLI_atomic_disjoint_set.hh
6@@ -0,0 +1,142 @@
7+/* SPDX-License-Identifier: GPL-2.0-or-later */
8+
9+#pragma once
10+
11+#include <atomic>
12+
13+#include "BLI_array.hh"
14+#include "BLI_task.hh"
15+
16+namespace blender {
17+
18+/**
19+ * Same as `DisjointSet` but is thread safe (at slightly higher cost for the single threaded case).
20+ *
21+ * The implementation is based on the following paper:
22+ * "Wait-free Parallel Algorithms for the Union-Find Problem"
23+ * by Richard J. Anderson and Heather Woll.
24+ *
25+ * It's also inspired by this implementation: https://github.com/wjakob/dset.
26+ */
27+class AtomicDisjointSet {
28+ private:
29+ /* Can generally used relaxed memory order with this algorithm. */
30+ static constexpr auto relaxed = std::memory_order_relaxed;
31+
32+ struct Item {
33+ int parent;
34+ int rank;
35+ };
36+
37+ /**
38+ * An #Item per element. It's important that the entire item is in a single atomic, so that it
39+ * can be updated atomically. */
40+ Array<std::atomic<Item>> items_;
41+
42+ public:
43+ /**
44+ * Create a new disjoing set with the given set. Initially, every element is in a separate set.
45+ */
46+ AtomicDisjointSet(const int size) : items_(size)
47+ {
48+ threading::parallel_for(IndexRange(size), 4096, [&](const IndexRange range) {
49+ for (const int i : range) {
50+ items_[i].store(Item{i, 0}, relaxed);
51+ }
52+ });
53+ }
54+
55+ /**
56+ * Join the sets containing elements x and y. Nothing happens when they were in the same set
57+ * before.
58+ */
59+ void join(int x, int y)
60+ {
61+ while (true) {
62+ x = this->find_root(x);
63+ y = this->find_root(y);
64+
65+ if (x == y) {
66+ /* They are in the same set already. */
67+ return;
68+ }
69+
70+ Item x_item = items_[x].load(relaxed);
71+ Item y_item = items_[y].load(relaxed);
72+
73+ if (
74+ /* Implement union by rank heuristic. */
75+ x_item.rank > y_item.rank
76+ /* If the rank is the same, make a consistent decision. */
77+ || (x_item.rank == y_item.rank && x < y)) {
78+ std::swap(x_item, y_item);
79+ std::swap(x, y);
80+ }
81+
82+ /* Update parent of item x. */
83+ const Item x_item_new{y, x_item.rank};
84+ if (!items_[x].compare_exchange_strong(x_item, x_item_new, relaxed)) {
85+ /* Another thread has updated item x, start again. */
86+ continue;
87+ }
88+
89+ if (x_item.rank == y_item.rank) {
90+ /* Increase rank of item y. This may fail when another thread has updated item y in the
91+ * meantime. That may lead to worse behavior with the union by rank heurist, but seems to
92+ * be ok in practice. */
93+ const Item y_item_new{y, y_item.rank + 1};
94+ items_[y].compare_exchange_weak(y_item, y_item_new, relaxed);
95+ }
96+ }
97+ }
98+
99+ /**
100+ * Return true when x and y are in the same set.
101+ */
102+ bool in_same_set(int x, int y)
103+ {
104+ while (true) {
105+ x = this->find_root(x);
106+ y = this->find_root(y);
107+ if (x == y) {
108+ return true;
109+ }
110+ if (items_[x].load(relaxed).parent == x) {
111+ return false;
112+ }
113+ }
114+ }
115+
116+ /**
117+ * Find the element that represents the set containing x currently.
118+ */
119+ int find_root(int x)
120+ {
121+ while (true) {
122+ const Item item = items_[x].load(relaxed);
123+ if (x == item.parent) {
124+ return x;
125+ }
126+ const int new_parent = items_[item.parent].load(relaxed).parent;
127+ if (item.parent != new_parent) {
128+ /* This halves the path for faster future lookups. That fail but that does not change
129+ * correctness. */
130+ Item expected = item;
131+ const Item desired{new_parent, item.rank};
132+ items_[x].compare_exchange_weak(expected, desired, relaxed);
133+ }
134+ x = new_parent;
135+ }
136+ }
137+
138+ /**
139+ * True when x represents a set.
140+ */
141+ bool is_root(const int x)
142+ {
143+ const Item item = items_[x].load(relaxed);
144+ return item.parent == x;
145+ }
146+};
147+
148+} // namespace blender
149diff --git a/source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc b/source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc
150index d658b14092a..ec32b7c1951 100644
151--- a/source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc
152+++ b/source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc
153@@ -1,138 +1,322 @@
154 /* SPDX-License-Identifier: GPL-2.0-or-later */
155
156 #include "DNA_mesh_types.h"
157 #include "DNA_meshdata_types.h"
158
159 #include "BKE_mesh.h"
160
161 #include "BLI_disjoint_set.hh"
162
163+#include "BLI_atomic_disjoint_set.hh"
164+#include "BLI_enumerable_thread_specific.hh"
165+#include "BLI_sort.hh"
166+
167 #include "node_geometry_util.hh"
168
169 namespace blender::nodes::node_geo_input_mesh_island_cc {
170
171 static void node_declare(NodeDeclarationBuilder &b)
172 {
173+ b.add_input<decl::Bool>(N_("New"));
174+
175 b.add_output<decl::Int>(N_("Island Index"))
176 .field_source()
177 .description(N_("The index of the each vertex's island. Indices are based on the "
178 "lowest vertex index contained in each island"));
179 b.add_output<decl::Int>(N_("Island Count"))
180 .field_source()
181 .description(N_("The total number of mesh islands"));
182 }
183
184+namespace new_{
185+
186+class IslandFieldInput final : public bke::MeshFieldInput {
187+ public:
188+ IslandFieldInput() : bke::MeshFieldInput(CPPType::get<int>(), "Island Index")
189+ {
190+ category_ = Category::Generated;
191+ }
192+
193+ GVArray get_varray_for_context(const Mesh &mesh,
194+ const eAttrDomain domain,
195+ const IndexMask /*mask*/) const final
196+ {
197+
198+ const Span<MEdge> edges = mesh.edges();
199+
200+ /* Join vertices linked by edges. */
201+ AtomicDisjointSet islands(mesh.totvert);
202+ threading::parallel_for(edges.index_range(), 1024, [&](const IndexRange range) {
203+ for (const MEdge &edge : edges.slice(range)) {
204+ islands.join(edge.v1, edge.v2);
205+ }
206+ });
207+
208+ auto update_first_occurence = [](Map<int, int> &map, const int root, const int index) {
209+ map.add_or_modify(
210+ root,
211+ [&](int *first_occurence) { *first_occurence = index; },
212+ [&](int *first_occurence) {
213+ if (index < *first_occurence) {
214+ *first_occurence = index;
215+ }
216+ });
217+ };
218+
219+ /* Find the root for every vertex. With multi-threading, this root is not deterministic. So
220+ * some postprocessing has to be done to make it deterministic. */
221+ Array<int> output(mesh.totvert);
222+ threading::EnumerableThreadSpecific<Map<int, int>> first_occurence_by_root_per_thread;
223+ threading::parallel_for(IndexRange(mesh.totvert), 1024, [&](const IndexRange range) {
224+ Map<int, int> &first_occurence_by_root = first_occurence_by_root_per_thread.local();
225+ for (const int i : range) {
226+ const int root = islands.find_root(i);
227+ output[i] = root;
228+ update_first_occurence(first_occurence_by_root, root, i);
229+ }
230+ });
231+
232+ /* Build a map that contains the first element index that has a certain root. */
233+ Map<int, int> &combined_map = first_occurence_by_root_per_thread.local();
234+ for (const Map<int, int> &other_map : first_occurence_by_root_per_thread) {
235+ if (&combined_map == &other_map) {
236+ continue;
237+ }
238+ for (const auto item : other_map.items()) {
239+ update_first_occurence(combined_map, item.key, item.value);
240+ }
241+ }
242+
243+ struct Item {
244+ int root;
245+ int first_occurence;
246+ };
247+
248+ /* Sort roots by first occurence. This removes the non-determinism above. */
249+ Vector<Item, 16> items;
250+ items.reserve(combined_map.size());
251+ for (const auto item : combined_map.items()) {
252+ items.append({item.key, item.value});
253+ }
254+ parallel_sort(items.begin(), items.end(), [](const Item &a, const Item &b) {
255+ return a.first_occurence < b.first_occurence;
256+ });
257+
258+ /* Remap original root values with deterministic values. */
259+ Map<int, int> root_map;
260+ root_map.reserve(items.size());
261+ for (const int i : items.index_range()) {
262+ root_map.add_new(items[i].root, i);
263+ }
264+ threading::parallel_for(IndexRange(mesh.totvert), 1024, [&](const IndexRange range) {
265+ for (const int i : range) {
266+ output[i] = root_map.lookup(output[i]);
267+ }
268+ });
269+
270+ return mesh.attributes().adapt_domain<int>(
271+ VArray<int>::ForContainer(std::move(output)), ATTR_DOMAIN_POINT, domain);
272+ }
273+
274+ uint64_t hash() const override
275+ {
276+ /* Some random constant hash. */
277+ return 635467354;
278+ }
279+
280+ bool is_equal_to(const fn::FieldNode &other) const override
281+ {
282+ return dynamic_cast<const IslandFieldInput *>(&other) != nullptr;
283+ }
284+
285+ std::optional<eAttrDomain> preferred_domain(const Mesh & /*mesh*/) const override
286+ {
287+ return ATTR_DOMAIN_POINT;
288+ }
289+};
290+
291+class IslandCountFieldInput final : public bke::MeshFieldInput {
292+ public:
293+ IslandCountFieldInput() : bke::MeshFieldInput(CPPType::get<int>(), "Island Count")
294+ {
295+ category_ = Category::Generated;
296+ }
297+
298+ GVArray get_varray_for_context(const Mesh &mesh,
299+ const eAttrDomain domain,
300+ const IndexMask /*mask*/) const final
301+ {
302+ const Span<MEdge> edges = mesh.edges();
303+
304+ AtomicDisjointSet islands(mesh.totvert);
305+ threading::parallel_for(edges.index_range(), 1024, [&](const IndexRange range) {
306+ for (const MEdge &edge : edges.slice(range)) {
307+ islands.join(edge.v1, edge.v2);
308+ }
309+ });
310+
311+ const int num_islands = threading::parallel_reduce<int>(
312+ IndexRange(mesh.totvert),
313+ 1024,
314+ 0,
315+ [&](const IndexRange range, int count) {
316+ for (const int i : range) {
317+ if (islands.is_root(i)) {
318+ count++;
319+ }
320+ }
321+ return count;
322+ },
323+ [](const int a, const int b) { return a + b; });
324+
325+ return VArray<int>::ForSingle(num_islands, mesh.attributes().domain_size(domain));
326+ }
327+
328+ uint64_t hash() const override
329+ {
330+ /* Some random hash. */
331+ return 45634572457;
332+ }
333+
334+ bool is_equal_to(const fn::FieldNode &other) const override
335+ {
336+ return dynamic_cast<const IslandCountFieldInput *>(&other) != nullptr;
337+ }
338+
339+ std::optional<eAttrDomain> preferred_domain(const Mesh & /*mesh*/) const override
340+ {
341+ return ATTR_DOMAIN_POINT;
342+ }
343+};
344+
345+}
346+
347+namespace old_ {
348+
349 class IslandFieldInput final : public bke::MeshFieldInput {
350 public:
351 IslandFieldInput() : bke::MeshFieldInput(CPPType::get<int>(), "Island Index")
352 {
353 category_ = Category::Generated;
354 }
355
356 GVArray get_varray_for_context(const Mesh &mesh,
357 const eAttrDomain domain,
358 const IndexMask /*mask*/) const final
359 {
360 const Span<MEdge> edges = mesh.edges();
361
362 DisjointSet<int> islands(mesh.totvert);
363 for (const int i : edges.index_range()) {
364 islands.join(edges[i].v1, edges[i].v2);
365 }
366
367 Array<int> output(mesh.totvert);
368 VectorSet<int> ordered_roots;
369 for (const int i : IndexRange(mesh.totvert)) {
370 const int root = islands.find_root(i);
371 output[i] = ordered_roots.index_of_or_add(root);
372 }
373
374 return mesh.attributes().adapt_domain<int>(
375 VArray<int>::ForContainer(std::move(output)), ATTR_DOMAIN_POINT, domain);
376 }
377
378 uint64_t hash() const override
379 {
380 /* Some random constant hash. */
381 return 635467354;
382 }
383
384 bool is_equal_to(const fn::FieldNode &other) const override
385 {
386 return dynamic_cast<const IslandFieldInput *>(&other) != nullptr;
387 }
388
389 std::optional<eAttrDomain> preferred_domain(const Mesh & /*mesh*/) const override
390 {
391 return ATTR_DOMAIN_POINT;
392 }
393 };
394
395 class IslandCountFieldInput final : public bke::MeshFieldInput {
396 public:
397 IslandCountFieldInput() : bke::MeshFieldInput(CPPType::get<int>(), "Island Count")
398 {
399 category_ = Category::Generated;
400 }
401
402 GVArray get_varray_for_context(const Mesh &mesh,
403 const eAttrDomain domain,
404 const IndexMask /*mask*/) const final
405 {
406 const Span<MEdge> edges = mesh.edges();
407
408 DisjointSet<int> islands(mesh.totvert);
409 for (const int i : edges.index_range()) {
410 islands.join(edges[i].v1, edges[i].v2);
411 }
412
413 Set<int> island_list;
414 for (const int i_vert : IndexRange(mesh.totvert)) {
415 const int root = islands.find_root(i_vert);
416 island_list.add(root);
417 }
418
419 return VArray<int>::ForSingle(island_list.size(), mesh.attributes().domain_size(domain));
420 }
421
422 uint64_t hash() const override
423 {
424 /* Some random hash. */
425 return 45634572457;
426 }
427
428 bool is_equal_to(const fn::FieldNode &other) const override
429 {
430 return dynamic_cast<const IslandCountFieldInput *>(&other) != nullptr;
431 }
432
433 std::optional<eAttrDomain> preferred_domain(const Mesh & /*mesh*/) const override
434 {
435 return ATTR_DOMAIN_POINT;
436 }
437 };
438
439+}
440+
441 static void node_geo_exec(GeoNodeExecParams params)
442 {
443- if (params.output_is_required("Island Index")) {
444- Field<int> field{std::make_shared<IslandFieldInput>()};
445- params.set_output("Island Index", std::move(field));
446- }
447- if (params.output_is_required("Island Count")) {
448- Field<int> field{std::make_shared<IslandCountFieldInput>()};
449- params.set_output("Island Count", std::move(field));
450+ if (params.get_input<bool>("New")){
451+ if (params.output_is_required("Island Index")) {
452+ Field<int> field{std::make_shared<new_::IslandFieldInput>()};
453+ params.set_output("Island Index", std::move(field));
454+ }
455+ if (params.output_is_required("Island Count")) {
456+ Field<int> field{std::make_shared<new_::IslandCountFieldInput>()};
457+ params.set_output("Island Count", std::move(field));
458+ }
459+ }else{
460+ if (params.output_is_required("Island Index")) {
461+ Field<int> field{std::make_shared<old_::IslandFieldInput>()};
462+ params.set_output("Island Index", std::move(field));
463+ }
464+ if (params.output_is_required("Island Count")) {
465+ Field<int> field{std::make_shared<old_::IslandCountFieldInput>()};
466+ params.set_output("Island Count", std::move(field));
467+ }
468 }
469 }
470
471 } // namespace blender::nodes::node_geo_input_mesh_island_cc
472
473 void register_node_type_geo_input_mesh_island()
474 {
475 namespace file_ns = blender::nodes::node_geo_input_mesh_island_cc;
476
477 static bNodeType ntype;
478 geo_node_type_base(&ntype, GEO_NODE_INPUT_MESH_ISLAND, "Mesh Island", NODE_CLASS_INPUT);
479 ntype.declare = file_ns::node_declare;
480 ntype.geometry_node_execute = file_ns::node_geo_exec;
481 nodeRegisterType(&ntype);
482 }

  • Merge branch 'master' into atomic-disjoint-set
  • cleanup

How about BLI_disjoint_set_atomic.hh to improve file organization a bit? Or maybe it's better to group this with other "atomic" files. Or maybe that doesn't really matter!

This looks good in general. I left a few smaller comments and one idea inline, but I don't think this needs another review pass. Looking forward to seeing this combined with D16647 :)

source/blender/blenlib/BLI_atomic_disjoint_set.hh
139

I'd propose calc_reduced_ids since this actually does a fair amount of calculation

source/blender/blenlib/intern/atomic_disjoint_set.cc
61–64 ↗(On Diff #58179)

It's a bit confusing that this uses the same struct name as the Item struct in the header

72–81 ↗(On Diff #58179)

Not totally sure, but it seems like the sorting and root map creation could be combined if this sorted a separate array of indices based on the first occurrence values in the items vector.
That might also make the final parallel loop faster too.

source/blender/nodes/geometry/nodes/node_geo_input_mesh_island.cc
144–145

num_islands -> islands_num for proper consistency (though I don't think this is written in the style guide yet)

This revision is now accepted and ready to land.Dec 1 2022, 7:50 PM
  • Merge branch 'master' into atomic-disjoint-set
  • cleanup