This implements branched path tracking for the split kernel.
Its kind of hard to keep all the different integration loops in sync, so this needs lots of testing to make sure everything is working correctly. We should probably start trying to deduplicate the integration loops more now.
Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes need testing still.