This started out as a tweak to the compositor's WorkScheduler to try different scheduling strategies.
* Fix work packages that ran multiple times due to a race condition.
* Keep the parent/child relations between WorkPackages around.
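To illustrate the two points above, here is a minimal sketch of how a work package could be guarded against running twice and how the parent/child links could be stored. It is not the code from this patch: `PackageState`, the `WorkPackage` fields, `add_dependency`, `try_schedule` and `execute_once` are hypothetical names chosen for the example, and the compare-and-swap guard is only one possible way to close such a race.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical states; the names are illustrative, not taken from the patch.
enum class PackageState { NotScheduled, Scheduled, Executed };

// Simplified stand-in for a compositor work package (assumed layout).
struct WorkPackage {
  std::atomic<PackageState> state{PackageState::NotScheduled};
  std::vector<WorkPackage *> parents;   // packages this one depends on
  std::vector<WorkPackage *> children;  // packages that depend on this one
  std::atomic<int> run_count{0};        // only here to make the effect visible
};

// Store the relation in both directions so it can be walked either way.
inline void add_dependency(WorkPackage &parent, WorkPackage &child)
{
  assert(&parent != &child);
  parent.children.push_back(&child);
  child.parents.push_back(&parent);
}

// Returns true only for the single caller that moves the package from
// NotScheduled to Scheduled. Every other caller, on any thread, gets false.
inline bool try_schedule(WorkPackage &package)
{
  PackageState expected = PackageState::NotScheduled;
  return package.state.compare_exchange_strong(expected, PackageState::Scheduled);
}

// Runs the (placeholder) work at most once per package.
inline bool execute_once(WorkPackage &package)
{
  if (!try_schedule(package)) {
    return false;  // already claimed by another caller
  }
  package.run_count.fetch_add(1);  // the real work would happen here
  package.state.store(PackageState::Executed);
  return true;
}
```

With this guard, two threads that both try to schedule the same package cannot both succeed, so the work runs once however the threads interleave.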
There are two scheduling modes:
* Input to output: input nodes are calculated first; once they have finished, their child nodes are calculated.
* Output to input: same behavior as master where tiles are prioritized to show something to the user.
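The difference between the two modes can be sketched on a small dependency graph. This is a simplification under stated assumptions: the real scheduler works on tiles and WorkPackages, while the example below only orders whole nodes, and `Graph`, `order_input_to_output` and `order_output_to_input` are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// inputs[i] lists the indices of the nodes that node i reads from.
using Graph = std::vector<std::vector<std::size_t>>;

// Input to output: start with nodes that have no inputs and release a child
// as soon as all of its inputs have finished (Kahn's topological ordering).
std::vector<std::size_t> order_input_to_output(const Graph &inputs)
{
  const std::size_t node_count = inputs.size();
  std::vector<std::size_t> pending(node_count);
  std::vector<std::vector<std::size_t>> children(node_count);
  std::queue<std::size_t> ready;
  for (std::size_t node = 0; node < node_count; node++) {
    pending[node] = inputs[node].size();
    for (std::size_t input : inputs[node]) {
      children[input].push_back(node);
    }
    if (pending[node] == 0) {
      ready.push(node);
    }
  }
  std::vector<std::size_t> order;
  while (!ready.empty()) {
    const std::size_t node = ready.front();
    ready.pop();
    order.push_back(node);
    for (std::size_t child : children[node]) {
      if (--pending[child] == 0) {
        ready.push(child);
      }
    }
  }
  return order;
}

// Depth-first helper: emit the inputs of a node before the node itself.
static void visit(const Graph &inputs,
                  std::size_t node,
                  std::vector<bool> &done,
                  std::vector<std::size_t> &order)
{
  if (done[node]) {
    return;
  }
  done[node] = true;
  for (std::size_t input : inputs[node]) {
    visit(inputs, input, done, order);
  }
  order.push_back(node);
}

// Output to input: start at the output and pull in only what it needs.
std::vector<std::size_t> order_output_to_input(const Graph &inputs, std::size_t output)
{
  assert(output < inputs.size());
  std::vector<bool> done(inputs.size(), false);
  std::vector<std::size_t> order;
  visit(inputs, output, done, order);
  return order;
}
```

For a graph where node 3 is the output, node 2 feeds it, node 0 feeds node 2, and nodes 1 and 4 form an unrelated branch, the input-to-output order visits every node, while the output-to-input order only touches nodes 0, 2 and 3. That is the sense in which the second mode prioritizes what the user will see.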
There are two different back-ends:
* BLI_task
* pthread queue
We should collect metrics for different platforms to find out the defaults we want.