The idea is to cluster the shader graph and compile nodes close to their DFS
traversal order, in order to minimize SVM stack size.
Currently clustering is done outside of the render routines (meaning it is up
to the Blender synchronization code to perform clustering), and the SVM code
only ensures that an execution group starts compiling once all of the group's
inputs are done. Additionally, the SVM compiler will not switch to another
group until the current one is fully compiled.
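
The scheduling rule described above can be sketched roughly as follows. This
is only an illustrative Python model, not the actual Cycles code: the
compile_order function, the node names and the dict layout are all
hypothetical, and the per-group ordering is a simple dependency sweep rather
than whatever the SVM compiler really does.

```python
from collections import defaultdict


def compile_order(nodes):
    """Return the order in which nodes would be compiled.

    nodes maps a node name to (group name, list of input node names).
    This layout is an assumption made for the sketch only.
    """
    # Collect the members of every group, keeping declaration order.
    groups = defaultdict(list)
    for name, (group, _inputs) in nodes.items():
        groups[group].append(name)

    done = set()
    order = []
    pending = list(groups)

    def group_ready(group):
        # A group may start once every input coming from outside the
        # group has already been compiled.
        members = set(groups[group])
        return all(dep in done or dep in members
                   for name in groups[group]
                   for dep in nodes[name][1])

    def compile_group(group):
        # Compile the whole group before returning; no other group is
        # touched in the meantime.
        remaining = list(groups[group])
        while remaining:
            progressed = False
            for name in list(remaining):
                if all(dep in done for dep in nodes[name][1]):
                    done.add(name)
                    order.append(name)
                    remaining.remove(name)
                    progressed = True
            if not progressed:
                raise ValueError("cycle inside group %r" % group)

    while pending:
        ready = next((g for g in pending if group_ready(g)), None)
        if ready is None:
            raise ValueError("cyclic dependency between groups")
        compile_group(ready)
        pending.remove(ready)
    return order


# Hypothetical graph: the output node is declared first, but its group
# has to wait until groups A and B are fully compiled.
example = {
    "output": ("root", ["mix_a", "mix_b"]),
    "tex_a": ("A", []),
    "mix_a": ("A", ["tex_a"]),
    "tex_b": ("B", []),
    "mix_b": ("B", ["tex_b"]),
}
print(compile_order(example))
```

In this model the nodes of one group are always emitted back to back, which
is the property that should keep intermediate results short-lived on the
stack.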
Clustering is currently based on group nodes in Blender.
This could be improved in the future, but it's not a bad start.
This now makes it possible to render the insane files from T46872, but it
still needs to be checked against actual production files as well.
Some optimizations are still possible in the code.