Page MenuHome

Fix T76553: Blender Freezes When Playing Back Animation
ClosedPublic

Authored by Jeroen Bakker (jbakker) on May 11 2020, 10:23 AM.

Details

Summary

In some cases blender could freeze. The reason we want to address with this patch is where the task scheduler performs tasks in the same worker thread that aren't compatible.

For task pools every task is isolated. When using range task the developer can determine if the tasks all together is isolated, or that no isolation is needed. This can be configured with TaskParallelSettings.use_isolation. The implementation is limited as isolatino in TBB uses functors which are tricky to add to a C API. We decided to start with a simple and understandable implementation and adapt were we need to.

During testing we came to this setup as it was reliable (we weren't able to let it freeze or crash) and didn't had noticeable performance impact.

Diff Detail

Repository
rB Blender
Branch
T76553 (branched from master)
Build Status
Buildable 8018
Build 8018: arc lint + arc unit

Event Timeline

Jeroen Bakker (jbakker) requested review of this revision.May 11 2020, 10:23 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)
source/blender/blenlib/intern/task_pool.cc
216

Needs to be adapted as the non suspended pools tasks aren't isolated.

Jeroen Bakker (jbakker) updated this revision to Diff 24610.EditedMay 11 2020, 12:53 PM

When running TASK_ISOLATION_LEVEL_SINGLE_TASK in debug mode I still detect an incorrect insertion in the bvh tree in bvhcache_insert what suggests that someone is writing to the cache without using the cache lock.
Might be that data gets overwritten but this seems very strange.

BLI_assert failed: /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/bvhutils.c:1669, bvhcache_insert(), at 'bvhcache_find(*cache_p, type, &(BVHTree *){0}) == 0'

Thread 29 "b290" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffce1e7700 (LWP 23845)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff786e859 in __GI_abort () at abort.c:79
#2  0x00000000028f546f in bvhcache_insert (cache_p=0x7fff9b856f68, tree=0x7fffa43ae588, type=3)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/bvhutils.c:1669
#3  0x00000000028f43c3 in bvhtree_from_mesh_looptri_ex
    (data=0x7fffce1e4ec0, vert=0x7fff8eafbd48, vert_allocated=false, mloop=0x7fff9af8a548, loop_allocated=false, looptri=0x7fff9573e988, looptri_num=5632, looptri_allocated=false, looptri_mask=0x0, looptri_num_active=-1, epsilon=0, tree_type=4, axis=6, bvh_cache_type=3, bvh_cache=0x7fff9b856f68)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/bvhutils.c:1210
#4  0x00000000028f4cb9 in BKE_bvhtree_from_mesh_get
    (data=0x7fffce1e4ec0, mesh=0x7fff9b856908, bvh_cache_type=3, tree_type=4)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/bvhutils.c:1441
#5  0x000000000269ea4b in BKE_shrinkwrap_init_tree
    (data=0x7fffce1e4eb0, mesh=0x7fff9b856908, shrinkType=3, shrinkMode=4, force_normals=true)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/shrinkwrap.c:134
#6  0x000000000291bcba in shrinkwrap_get_tarmat
    (UNUSED_depsgraph=0x7ffff1b43308, con=0x7fff9b810908, cob=0x7fff8b22ce48, ct=0x7fff8b22cf08, UNUSED_ctime=15)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/constraint.c:4021
#7  0x000000000291fbbb in BKE_constraint_targets_for_solving_get
    (depsgraph=0x7ffff1b43308, con=0x7fff9b810908, cob=0x7fff8b22ce48, targets=0x7fffce1e50e0, ctime=15)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/constraint.c:5780
#8  0x000000000291fd92 in BKE_constraints_solve
    (depsgraph=0x7ffff1b43308, conlist=0x7fffcbb57420, cob=0x7fff8b22ce48, ctime=15)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/constraint.c:5847
#9  0x00000000028e1d70 in BKE_pose_where_is_bone
    (depsgraph=0x7ffff1b43308, scene=0x7ffff7338808, ob=0x7ffff22ff408, pchan=0x7fffcbb57408, ctime=15, do_extra=true)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/armature.c:2897
#10 0x00000000028e4ef6 in BKE_pose_constraints_evaluate
    (depsgraph=0x7ffff1b43308, scene=0x7ffff7338808, object=0x7ffff22ff408, pchan_index=69)
    at /home/jeroen/blender-git/blender/source/blender/blenkernel/intern/armature_update.c:726
#11 0x000000000390ffba in std::__invoke_impl<void, void (*&)(Depsgraph*, Scene*, Object*, int), Depsgraph*, Scene*&, Object*&, int&>(std::__invoke_other, void (*&)(Depsgraph*, Scene*, Object*, int), Depsgraph*&&, Scene*&, Object*&, int&)
    (__f=@0x7ffff73b0300: 0x28e4c46 <BKE_pose_constraints_evaluate>) at /usr/include/c++/9/bits/invoke.h:60
#12 0x000000000390fc04 in std::__invoke<void (*&)(Depsgraph*, Scene*, Object*, int), Depsgraph*, Scene*&, Object*&, int&>(void (*&)(Depsgraph*, Scene*, Object*, int), Depsgraph*&&, Scene*&, Object*&, int&) (__fn=
    @0x7ffff73b0300: 0x28e4c46 <BKE_pose_constraints_evaluate>) at /usr/include/c++/9/bits/invoke.h:95
#13 0x000000000390f729 in std::_Bind<void (*(std::_Placeholder<1>, Scene*, Object*, int))(Depsgraph*, Scene*, Object*, int)>::__call<void, Depsgraph*&&, 0ul, 1ul, 2ul, 3ul>(std::tuple<Depsgraph*&&>&&, std::_Index_tuple<0ul, 1ul, 2ul, 3ul>) (this=0x7ffff73b0300, __args=...) at /usr/include/c++/9/functional:400
#14 0x000000000390f0ec in std::_Bind<void (*(std::_Placeholder<1>, Scene*, Object*, int))(Depsgraph*, Scene*, Object*, int)>::operator()<Depsgraph*, void>(Depsgraph*&&) (this=0x7ffff73b0300) at /usr/include/c++/9/functional:484
#15 0x000000000390ead6 in std::_Function_handler<void (Depsgraph*), std::_Bind<void (*(std::_Placeholder<1>, Scene*, Object*, int))(Depsgraph*, Scene*, Object*, int)> >::_M_invoke(std::_Any_data const&, Depsgraph*&&)
    (__functor=..., __args#0=@0x7fffce1e53e0: 0x7ffff1b43308) at /usr/include/c++/9/bits/std_function.h:300
#16 0x000000000392ead5 in std::function<void (Depsgraph*)>::operator()(Depsgraph*) const
    (this=0x7fff9ec14bc8, __args#0=0x7ffff1b43308) at /usr/include/c++/9/bits/std_function.h:688
#17 0x000000000392db05 in DEG::(anonymous namespace)::evaluate_node(DEG::(anonymous namespace)::DepsgraphEvalState const*, DEG::OperationNode*) (state=0x7fffffffd7a0, operation_node=0x7fff9ec14b08)
    at /home/jeroen/blender-git/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:113
#18 0x000000000392db4f in DEG::(anonymous namespace)::deg_task_run_func(TaskPool*, void*)
    (pool=0x7fff9a379d88, taskdata=0x7fff9ec14b08)
    at /home/jeroen/blender-git/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:124
#19 0x00000000060c85d7 in Task::operator()() const (this=0x7fffcfdf4248)
    at /home/jeroen/blender-git/blender/source/blender/blenlib/intern/task_pool.cc:117
--Type <RET> for more, q to quit, c to continue without paging--c
#20 0x00000000060c8c3e in tbb::internal::function_task<Task>::execute() (this=0x7fffcfdf4240) at /home/jeroen/blender-git/lib/linux_centos7_x86_64/tbb/include/tbb/task.h:1048
#21 0x00000000029ada65 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) ()
#22 0x00000000029af135 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
#23 0x00000000029a0e98 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) ()
#24 0x00000000029a8ce3 in tbb::internal::market::process(rml::job&) ()
#25 0x00000000029aaaf6 in tbb::internal::rml::private_worker::run() ()
#26 0x00000000029aad39 in tbb::internal::rml::private_worker::thread_routine(void*) ()
#27 0x00007ffff7f8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#28 0x00007ffff796b103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

There seems like a failure in the read-write locking mechanism.

I used a more strict version of the bvhutils.c and it fails for me.

1diff --git a/source/blender/blenkernel/intern/bvhutils.c b/source/blender/blenkernel/intern/bvhutils.c
2index 8f6b8bd6980..00994166977 100644
3--- a/source/blender/blenkernel/intern/bvhutils.c
4+++ b/source/blender/blenkernel/intern/bvhutils.c
5@@ -42,6 +42,34 @@
6 #include "MEM_guardedalloc.h"
7
8 static ThreadRWMutex cache_rwlock = BLI_RWLOCK_INITIALIZER;
9+static int cache_lock_state = 0;
10+
11+static void bvhutils_cache_read_lock(const char *name)
12+{
13+ BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
14+ printf("read_lock %s\n", name);
15+ BLI_assert(cache_lock_state != THREAD_LOCK_WRITE);
16+}
17+static void bvhutils_cache_read_unlock(const char *name)
18+{
19+ BLI_assert(cache_lock_state != THREAD_LOCK_WRITE);
20+ printf("read_unlock %s\n", name);
21+ BLI_rw_mutex_unlock(&cache_rwlock);
22+}
23+static void bvhutils_cache_write_lock(const char *name)
24+{
25+ BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_WRITE);
26+ printf("write_lock %s\n", name);
27+ BLI_assert(cache_lock_state == 0);
28+ cache_lock_state = THREAD_LOCK_WRITE;
29+}
30+static void bvhutils_cache_write_unlock(const char *name)
31+{
32+ BLI_assert(cache_lock_state == THREAD_LOCK_WRITE);
33+ cache_lock_state = 0;
34+ printf("write_unlock %s\n", name);
35+ BLI_rw_mutex_unlock(&cache_rwlock);
36+}
37
38 /* -------------------------------------------------------------------- */
39 /** \name Local Callbacks
40@@ -523,12 +551,12 @@ BVHTree *bvhtree_from_editmesh_verts_ex(BVHTreeFromEditMesh *data,
41 BVHTree *tree = NULL;
42
43 if (bvh_cache) {
44- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
45+ bvhutils_cache_read_lock(__func__);
46 data->cached = bvhcache_find(*bvh_cache, bvh_cache_type, &data->tree);
47- BLI_rw_mutex_unlock(&cache_rwlock);
48+ bvhutils_cache_read_unlock(__func__);
49
50 if (data->cached == false) {
51- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_WRITE);
52+ bvhutils_cache_write_lock(__func__);
53 data->cached = bvhcache_find(*bvh_cache, bvh_cache_type, &data->tree);
54 if (data->cached == false) {
55 tree = bvhtree_from_editmesh_verts_create_tree(
56@@ -539,7 +567,7 @@ BVHTree *bvhtree_from_editmesh_verts_ex(BVHTreeFromEditMesh *data,
57 bvhcache_insert(bvh_cache, tree, bvh_cache_type);
58 data->cached = true;
59 }
60- BLI_rw_mutex_unlock(&cache_rwlock);
61+ bvhutils_cache_write_unlock(__func__);
62 }
63 }
64 else {
65@@ -587,14 +615,14 @@ BVHTree *bvhtree_from_mesh_verts_ex(BVHTreeFromMesh *data,
66 bool in_cache = false;
67 BVHTree *tree = NULL;
68 if (bvh_cache) {
69- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
70+ bvhutils_cache_read_lock(__func__);
71 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
72- BLI_rw_mutex_unlock(&cache_rwlock);
73+ bvhutils_cache_read_unlock(__func__);
74 if (in_cache == false) {
75- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_WRITE);
76+ bvhutils_cache_write_lock(__func__);
77 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
78 if (in_cache) {
79- BLI_rw_mutex_unlock(&cache_rwlock);
80+ bvhutils_cache_write_unlock(__func__);
81 }
82 }
83 }
84@@ -607,7 +635,7 @@ BVHTree *bvhtree_from_mesh_verts_ex(BVHTreeFromMesh *data,
85 /* Save on cache for later use */
86 /* printf("BVHTree built and saved on cache\n"); */
87 bvhcache_insert(bvh_cache, tree, bvh_cache_type);
88- BLI_rw_mutex_unlock(&cache_rwlock);
89+ bvhutils_cache_write_unlock(__func__);
90 in_cache = true;
91 }
92 }
93@@ -740,12 +768,12 @@ BVHTree *bvhtree_from_editmesh_edges_ex(BVHTreeFromEditMesh *data,
94 BVHTree *tree = NULL;
95
96 if (bvh_cache) {
97- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
98+ bvhutils_cache_read_lock(__func__);
99 data->cached = bvhcache_find(*bvh_cache, bvh_cache_type, &data->tree);
100- BLI_rw_mutex_unlock(&cache_rwlock);
101+ bvhutils_cache_read_unlock(__func__);
102
103 if (data->cached == false) {
104- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_WRITE);
105+ bvhutils_cache_write_lock(__func__);
106 data->cached = bvhcache_find(*bvh_cache, bvh_cache_type, &data->tree);
107 if (data->cached == false) {
108 tree = bvhtree_from_editmesh_edges_create_tree(
109@@ -756,7 +784,7 @@ BVHTree *bvhtree_from_editmesh_edges_ex(BVHTreeFromEditMesh *data,
110 bvhcache_insert(bvh_cache, tree, bvh_cache_type);
111 data->cached = true;
112 }
113- BLI_rw_mutex_unlock(&cache_rwlock);
114+ bvhutils_cache_write_unlock(__func__);
115 }
116 }
117 else {
118@@ -807,14 +835,14 @@ BVHTree *bvhtree_from_mesh_edges_ex(BVHTreeFromMesh *data,
119 bool in_cache = false;
120 BVHTree *tree = NULL;
121 if (bvh_cache) {
122- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
123+ bvhutils_cache_read_lock(__func__);
124 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
125- BLI_rw_mutex_unlock(&cache_rwlock);
126+ bvhutils_cache_read_unlock(__func__);
127 if (in_cache == false) {
128- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_WRITE);
129+ bvhutils_cache_write_lock(__func__);
130 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
131 if (in_cache) {
132- BLI_rw_mutex_unlock(&cache_rwlock);
133+ bvhutils_cache_write_unlock(__func__);
134 }
135 }
136 }
137@@ -827,7 +855,7 @@ BVHTree *bvhtree_from_mesh_edges_ex(BVHTreeFromMesh *data,
138 /* Save on cache for later use */
139 /* printf("BVHTree built and saved on cache\n"); */
140 bvhcache_insert(bvh_cache, tree, bvh_cache_type);
141- BLI_rw_mutex_unlock(&cache_rwlock);
142+ bvhutils_cache_write_unlock(__func__);
143 in_cache = true;
144 }
145 }
146@@ -942,14 +970,14 @@ BVHTree *bvhtree_from_mesh_faces_ex(BVHTreeFromMesh *data,
147 bool in_cache = false;
148 BVHTree *tree = NULL;
149 if (bvh_cache) {
150- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
151+ bvhutils_cache_read_lock(__func__);
152 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
153- BLI_rw_mutex_unlock(&cache_rwlock);
154+ bvhutils_cache_read_unlock(__func__);
155 if (in_cache == false) {
156- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_WRITE);
157+ bvhutils_cache_write_lock(__func__);
158 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
159 if (in_cache) {
160- BLI_rw_mutex_unlock(&cache_rwlock);
161+ bvhutils_cache_write_unlock(__func__);
162 }
163 }
164 }
165@@ -962,7 +990,7 @@ BVHTree *bvhtree_from_mesh_faces_ex(BVHTreeFromMesh *data,
166 /* Save on cache for later use */
167 /* printf("BVHTree built and saved on cache\n"); */
168 bvhcache_insert(bvh_cache, tree, bvh_cache_type);
169- BLI_rw_mutex_unlock(&cache_rwlock);
170+ bvhutils_cache_write_unlock(__func__);
171 in_cache = true;
172 }
173 }
174@@ -1120,11 +1148,11 @@ BVHTree *bvhtree_from_editmesh_looptri_ex(BVHTreeFromEditMesh *data,
175
176 BVHTree *tree = NULL;
177 if (bvh_cache) {
178- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
179+ bvhutils_cache_read_lock(__func__);
180 bool in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
181- BLI_rw_mutex_unlock(&cache_rwlock);
182+ bvhutils_cache_read_unlock(__func__);
183 if (in_cache == false) {
184- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_WRITE);
185+ bvhutils_cache_write_lock(__func__);
186 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
187 if (in_cache == false) {
188 tree = bvhtree_from_editmesh_looptri_create_tree(
189@@ -1134,7 +1162,7 @@ BVHTree *bvhtree_from_editmesh_looptri_ex(BVHTreeFromEditMesh *data,
190 /* printf("BVHTree built and saved on cache\n"); */
191 bvhcache_insert(bvh_cache, tree, bvh_cache_type);
192 }
193- BLI_rw_mutex_unlock(&cache_rwlock);
194+ bvhutils_cache_write_unlock(__func__);
195 }
196 }
197 else {
198@@ -1182,14 +1210,14 @@ BVHTree *bvhtree_from_mesh_looptri_ex(BVHTreeFromMesh *data,
199 bool in_cache = false;
200 BVHTree *tree = NULL;
201 if (bvh_cache) {
202- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
203+ bvhutils_cache_read_lock(__func__);
204 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
205- BLI_rw_mutex_unlock(&cache_rwlock);
206+ bvhutils_cache_read_unlock(__func__);
207 if (in_cache == false) {
208- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_WRITE);
209+ bvhutils_cache_write_lock(__func__);
210 in_cache = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
211 if (in_cache) {
212- BLI_rw_mutex_unlock(&cache_rwlock);
213+ bvhutils_cache_write_unlock(__func__);
214 }
215 }
216 }
217@@ -1208,7 +1236,7 @@ BVHTree *bvhtree_from_mesh_looptri_ex(BVHTreeFromMesh *data,
218
219 if (bvh_cache) {
220 bvhcache_insert(bvh_cache, tree, bvh_cache_type);
221- BLI_rw_mutex_unlock(&cache_rwlock);
222+ bvhutils_cache_write_unlock(__func__);
223 in_cache = true;
224 }
225 }
226@@ -1317,9 +1345,9 @@ BVHTree *BKE_bvhtree_from_mesh_get(struct BVHTreeFromMesh *data,
227 BVHTree *tree = NULL;
228 BVHCache **bvh_cache = &mesh->runtime.bvh_cache;
229
230- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
231+ bvhutils_cache_read_lock(__func__);
232 bool is_cached = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
233- BLI_rw_mutex_unlock(&cache_rwlock);
234+ bvhutils_cache_read_unlock(__func__);
235
236 if (is_cached && tree == NULL) {
237 memset(data, 0, sizeof(*data));
238@@ -1501,9 +1529,9 @@ BVHTree *BKE_bvhtree_from_editmesh_get(BVHTreeFromEditMesh *data,
239 memset(data, 0, sizeof(*data));
240
241 if (bvh_cache) {
242- BLI_rw_mutex_lock(&cache_rwlock, THREAD_LOCK_READ);
243+ bvhutils_cache_read_lock(__func__);
244 is_cached = bvhcache_find(*bvh_cache, bvh_cache_type, &tree);
245- BLI_rw_mutex_unlock(&cache_rwlock);
246+ bvhutils_cache_read_unlock(__func__);
247
248 if (is_cached && tree == NULL) {
249 return tree;

write_lock bvhtree_from_mesh_looptri_ex
write_unlock bvhtree_from_mesh_looptri_ex
read_lock BKE_bvhtree_from_mesh_get
read_unlock BKE_bvhtree_from_mesh_get
read_lock BKE_bvhtree_from_mesh_get
read_unlock BKE_bvhtree_from_mesh_get
read_lock BKE_bvhtree_from_mesh_get
read_unlock BKE_bvhtree_from_mesh_get
read_lock BKE_bvhtree_from_mesh_get
read_unlock BKE_bvhtree_from_mesh_get
read_lock bvhtree_from_mesh_looptri_ex
read_unlock bvhtree_from_mesh_looptri_ex
read_lock bvhtree_from_mesh_looptri_ex
read_unlock bvhtree_from_mesh_looptri_ex
read_lock bvhtree_from_mesh_looptri_ex
read_unlock bvhtree_from_mesh_looptri_ex
write_lock bvhtree_from_mesh_looptri_ex
read_lock BKE_bvhtree_from_mesh_get

b290(BLI_system_backtrace+0x39) [0x60c6120]
b290() [0x28f02ce]
b290(BKE_bvhtree_from_mesh_get+0x4b) [0x28f4a5b]
b290(BKE_shrinkwrap_init_tree+0xd5) [0x269ea4b]
b290() [0x291bebb]
b290(BKE_constraint_targets_for_solving_get+0xb5) [0x291fdbc]
b290(BKE_constraints_solve+0x18f) [0x291ff93]
b290(BKE_pose_where_is_bone+0x142) [0x28e1d70]
b290(BKE_pose_constraints_evaluate+0x2b0) [0x28e4ef6]
b290(_ZSt13__invoke_implIvRPFvP9DepsgraphP5SceneP6ObjectiEJS1_RS3_RS5_RiEET_St14__invoke_otherOT0_DpOT1_+0x81) [0x39101ba]
b290(_ZSt8__invokeIRPFvP9DepsgraphP5SceneP6ObjectiEJS1_RS3_RS5_RiEENSt15__invoke_resultIT_JDpT0_EE4typeEOSD_DpOSE_+0x92) [0x390fe04]
b290(_ZNSt5_BindIFPFvP9DepsgraphP5SceneP6ObjectiESt12_PlaceholderILi1EES3_S5_iEE6__callIvJOS1_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0xe1) [0x390f929]
b290(_ZNSt5_BindIFPFvP9DepsgraphP5SceneP6ObjectiESt12_PlaceholderILi1EES3_S5_iEEclIJS1_EvEET0_DpOT_+0x54) [0x390f2ec]
b290(_ZNSt17_Function_handlerIFvP9DepsgraphESt5_BindIFPFvS1_P5SceneP6ObjectiESt12_PlaceholderILi1EES5_S7_iEEE9_M_invokeERKSt9_Any_dataOS1_+0x3b) [0x390ecd6]
b290(_ZNKSt8functionIFvP9DepsgraphEEclES1_+0x4d) [0x392ecd5]
b290() [0x392dd05]
b290() [0x392dd4f]
b290(_ZNK4TaskclEv+0x2f) [0x60c87b7]

Added a strict rw_lock check

@Jeroen Bakker (jbakker) the mutex may be broken, but I would find that quite surprising. Note that you can't trust printf to actually print before it hangs, for that you need to use fflush(stdout). I'm also not sure what happens when such locks are called recursively, which is what the bug is here.

I think TASK_ISOLATION_LEVEL_SINGLE_TASK is in the wrong place for the task pool. I would expect it to be in Task::operator() similar to what you did for parallel range. Now there is "isolate -> task pool -> task parallel range -> isolate", so that task pool and parallel range are not isolated from each other. By placing it in Task::operator() it would be "task pool -> isolate -> task parallel range -> isolate".

TASK_ISOLATION_POOL_ENABLED would be basically "isolate -> task pool -> isolate -> task parallel range". But I'm not sure the implementation of that is correct either. It seems to create a new isolation region for every tbb_group.run, but I'd expect that isolation region should be created once per task pool? This might be tricky to implement with C code though.

TASK_ISOLATION_LEVEL_NEVER: deadlocked itself
TASK_ISOLATION_LEVEL_GROUPED: ~9 fps (edit: also managed to deadlock itself)
TASK_ISOLATION_LEVEL_SINGLE_TASK: ~9fps

The fps is just too jumpy to make a call between the last two.

Never ran into the situation where the locks were violated, not even in a release build (i did make sure the assert worked for bvhutils.c)

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)May 11 2020, 4:38 PM
Jeroen Bakker (jbakker) updated this revision to Diff 24619.EditedMay 11 2020, 4:49 PM

Moved single task isolation to call operator

After isolating in the operator() I wasn't able to trigger the assert. It find it also unlikely that the issue is in pthreads I most likely would blame myself. This might or might not get back to us.

@Brecht Van Lommel (brecht) As you mentioned getting pool isolation to work would be tricky. The implementation of TASK_ISOLATION_POOL_ENABLED only works for suspended pools. Other pools is implemented as a task isolation due to this.

@Ray Molenkamp (LazyDodo) I just used a different file (Spring 020_02A.anim.blend) for the performance tests so it wouldn't lock up.

Next steps would be:

  • double check if there is a way to support pool isolation via C without it becoming too much of a suspended pool.
  • get performance feedback from actual workstations. and see what performance drop we would accept.
  • chose an implementation and remove the rest.

It we want to have a quicker fix I would suggest to first commit the Single task isolation as that seems to be stable. In the parallel_for we could add a setting what kind of isolation is desired.

After that we could optimize.

source/blender/blenlib/intern/task_pool.cc
222

Add more documentation that due to API this is tricky to support

It we want to have a quicker fix I would suggest to first commit the Single task isolation as that seems to be stable. In the parallel_for we could add a setting what kind of isolation is desired.

After that we could optimize.

Agreed, let's do the simple single task isolation and leave the rest as an optimization. Using that I also could find no significant performance difference.

I think there are probably cases where we are leaving performance on the table (either due to constant task isolation overhead or suboptimal scheduling), but I'm not super worried about it at this point.

Removed compile directives.

  • task_parallel does isolation over the range. Isolation can be disabled in TaskParallelSettings.use_isolation
  • task pools use isolation per task.
Jeroen Bakker (jbakker) retitled this revision from [WIP] Fix T76553: Blender Freezes When Playing Back Animation to Fix T76553: Blender Freezes When Playing Back Animation.May 12 2020, 3:04 PM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)

reverted unrelated code style changes

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)May 12 2020, 3:09 PM
Brecht Van Lommel (brecht) requested changes to this revision.May 12 2020, 3:29 PM

This seems to be doing task pool -> isolate -> isolate -> task parallel range.

That is redundant, and will not work when you have a task pool inside a parallel range, and get isolate -> task parallel range -> task pool -> isolate.

It needs to happen in operator() for both cases. We could disable isolation for parallel range by default, I can't spot any current usage that needs it.

source/blender/blenlib/BLI_task.h
143–144

Perhaps rename to use_nested_tasks and change description to:

If nested parallelism is used where tasks uses a task pool or parallel range again, this must be enabled to avoid potential deadlocks. Threads are then isolated so that they do not run tasks from the same or higher levels, which can lead to trying to acquire the same mutex multiple time and deadlocking.

This revision now requires changes to proceed.May 12 2020, 3:29 PM

Code review

  • renamed use_isolation to use_nested_tasks
  • only isolate in the task execution functions.
Jeroen Bakker (jbakker) marked an inline comment as done.

Whitespace

source/blender/blenlib/intern/task_range.cc
96–102

Move the isolation outside the for loop to reduce overhead.

source/blender/blenlib/intern/task_range.cc
96–102

You’re probably right. I tested that but performance was better when doing it inside the loop somehow. But could be that it is so on my system

source/blender/blenlib/intern/task_range.cc
96–102

I think it would be either timing noise. Or a case where the scheduler happens to work better if some code runs a bit slower, but not something that would be true in general.

Moved isolation of parallel range out of its inner loop

Jeroen Bakker (jbakker) marked 2 inline comments as done.May 12 2020, 6:23 PM
source/blender/blenlib/BLI_task.h
143–144

I find this naming confusing and misleading: it sounds as if BLI_task_parallel_range will use nested tasks, while in fact what makes nested tasks safe for use.

Another crucial bit of information is omitted: what is the penalty for of this function? If there is no penalty, why not to always assume it is enabled? If there is penalty, should we instead re-write nested parallel algorithms in a way which avoids such nesting?

This revision is now accepted and ready to land.May 14 2020, 12:02 PM
source/blender/blenlib/BLI_task.h
143–144

Looking at the documentation and code of TBB isolation the overhead seems very small (https://github.com/oneapi-src/oneTBB/blob/tbb_2017/src/tbb/scheduler.cpp#L899). A worker that is blocked will handle tasks of the same level or none isolated tasks.

@Brecht Van Lommel (brecht) any objection to remove the use_nested_tasks option and always isolate?

Ok, we can leave out the option. The most important thing now is to get this hang fixed.