The idea of this work is to avoid taking a mutex lock on every task
acquisition and to use a spin lock instead. The claim here: under normal
conditions it will almost always take less than 1 iteration of the loop to get
a new task to work on. That means we'll only spin for a very short time,
whereas sending the thread to sleep would ruin performance.
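
To make the idea concrete, here is a minimal sketch of a task queue guarded by a spin lock, written in C++ for illustration. It is not the code from the patch: `SpinLock`, `TaskQueue`, `try_pop` and `run_workers` are hypothetical names, and the real scheduler's data structures are assumed to differ. The point is only that the critical section is a few instructions long, so a waiting thread busy-waits briefly instead of going to sleep.

```cpp
#include <atomic>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin lock: busy-waits instead of putting the thread to sleep.
class SpinLock {
 public:
  void lock() {
    // test_and_set() returns the previous value; loop while it was already set.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Spin. The critical section below is tiny, so this should be short.
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical task queue; tasks are plain integers to keep the sketch small.
class TaskQueue {
 public:
  void push(int task) {
    std::lock_guard<SpinLock> guard(lock_);
    tasks_.push_back(task);
  }

  // Returns false when the queue is empty, otherwise stores a task in `task`.
  bool try_pop(int &task) {
    std::lock_guard<SpinLock> guard(lock_);
    if (tasks_.empty()) {
      return false;
    }
    task = tasks_.front();
    tasks_.pop_front();
    return true;
  }

 private:
  SpinLock lock_;
  std::deque<int> tasks_;
};

// Fills the queue with tasks 0..num_tasks-1, lets the workers drain it,
// and returns the sum of all task values that were processed.
long long run_workers(int num_threads, int num_tasks) {
  TaskQueue queue;
  for (int i = 0; i < num_tasks; ++i) {
    queue.push(i);
  }
  std::atomic<long long> sum(0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&queue, &sum]() {
      int task;
      while (queue.try_pop(task)) {
        sum.fetch_add(task);  // Stand-in for actually running the task.
      }
    });
  }
  for (auto &worker : workers) {
    worker.join();
  }
  return sum.load();
}
```

Calling `run_workers(4, 1000)` drains the queue with four threads; every task is handed out exactly once, so the returned sum matches the sum of the task values. A spin lock like this only pays off while the lock is held very briefly, which is the assumption stated above.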
This patch attempts to implement this idea, and it gives a really nice
speedup in files like the one from T50027 (with all physics cached) for
both the old and the new depsgraph.
Unfortunately, there seems to be a mistake somewhere, so some production
scenes here crash from time to time.
There are also some TODOs in the code which I'd like to discuss and
brainstorm about.