Page MenuHome

Cycles: decouple shadow paths from main path on GPU
ClosedPublic

Authored by Brecht Van Lommel (brecht) on Oct 16 2021, 12:10 AM.

Details

Summary

The motivation for this is twofold. We want to see if this improves performance,
and we want to use it to bring back transparency handling for the ambient
occlusion pass, which should be relatively simple with this.

  • Duplicate some members from the main path state in the shadow path state.
  • Add shadow paths incrementally to the array similar to what we do for the shadow catchers.
  • For the scheduling, allow running shade surface and shade volume kernels as long as there is enough space in the shadow paths array. If not, execute shadow kernels until it is empty.
  • Add IntegratorShadowState and ConstIntegratorShadowState typedefs that can be different between CPU and GPU. For GPU both main and shadow paths juse have an integer for SoA access. Bt with CPU it's a different pointer type so we get type safety checks in code shared between CPU and GPU.
  • For CPU, add a separate IntegratorShadowStateCPU struct embedded in IntegratorShadowState.
  • Update various functions to take the shadow state, and make SVM take either type of state using templates.

Shadow path compaction may help here also, but that is for another revision.

Depends on D12888

Diff Detail

Repository
rB Blender
Branch
globals (branched from master)
Build Status
Buildable 17952
Build 17952: arc lint + arc unit

Event Timeline

Brecht Van Lommel (brecht) requested review of this revision.Oct 16 2021, 12:10 AM
Brecht Van Lommel (brecht) created this revision.


It provides a nice little speedup in most benchmark scenes, though of course there is some extra memory usage.

What's remarkable is the big speedup in the spring scene. Note that neither revision here uses transparent shadow baking. It's purely the better occupancy for tracing shadow rays that is helping here.

William Leeson (leesonw) requested changes to this revision.EditedOct 18 2021, 12:58 PM

I tried to get this patch to compile but it would not work using 'arc patch D12889' from D12888 (it failed on the cherry pick :-() then when I tried applying the diff the diff applied but would not compile see P2508 for details. Looking at the code I did not see anything obviously wrong it just applying the patch that caused issues. I did have one question though why do we use LCG that you moved to integrator_shade_surface.h should it not use the specified sampler?

intern/cycles/kernel/integrator/integrator_shade_surface.h
404–410

Why not use the specified sampler?

This revision now requires changes to proceed.Oct 18 2021, 12:58 PM

Building with OSL is indeed broken, I need to do some more refactoring there now that the idea had been validated.

intern/cycles/kernel/integrator/integrator_shade_surface.h
404–410

Not sure what you mean by specified sampler?

If you mean it should use PMJ sampling instead then I agree, this was from a time we only had Sobol and there limited dimensions. But that's a bigger change.

For this change I only moved it because it didn't work with the templated integrator state type, and it's only really needed from this call site.

Looks very interesting and promising! Let me know when there is time for something deeper than general idea and quick code review.

intern/cycles/kernel/integrator/integrator_state_util.h
310–313

Redundant return.

Fix remaining TODOs, except compaction. I'll leave that for another revision.

This should be ready to fully review now.

Had to download the raw patch and apply manually, seemed that arc patch tried to pull some additional changes, or did not apply it on the right commit from master.

Te timing seems a bit different on RTX6000: the Spring is "only" 2.4x faster, while your graph seems to show bigger speedup. Still impressive! :)
Couldn't see anything wrong in the code or in the tests.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 19 2021, 3:30 PM
This revision was automatically updated to reflect the committed changes.