This commit makes subsurface ray casting ignore all BVH nodes and
primitives that do not belong to the current object, which makes the
traversal code much simpler and reduces the number of intersection
tests.
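As a rough illustration of why a dedicated per-object structure cuts down intersection tests, here is a toy sketch. It is not Cycles kernel code: flat lists stand in for the BVHs, primitives are reduced to 1D extents along the ray, and all names (`Prim`, `hits_in_scene_structure`, `hits_in_object_structure`) are hypothetical.

```python
# Toy model only: flat primitive lists stand in for BVHs (assumption),
# so the counts are illustrative and not representative of real traversal.
from dataclasses import dataclass


@dataclass(frozen=True)
class Prim:
    object_id: int
    lo: float  # start of the primitive's extent along the ray
    hi: float  # end of the primitive's extent along the ray


def hits_in_scene_structure(prims, object_id, t_min, t_max):
    """Scene-level lookup: every primitive is tested, foreign hits are discarded."""
    tests, hits = 0, []
    for p in prims:
        tests += 1
        if p.lo <= t_max and p.hi >= t_min and p.object_id == object_id:
            hits.append(p)
    return hits, tests


def hits_in_object_structure(per_object, object_id, t_min, t_max):
    """Dedicated lookup: only the current object's primitives are ever visited."""
    tests, hits = 0, []
    for p in per_object[object_id]:
        tests += 1
        if p.lo <= t_max and p.hi >= t_min:
            hits.append(p)
    return hits, tests


scene = [
    Prim(0, 0.0, 1.0), Prim(0, 2.0, 3.0), Prim(0, 6.0, 7.0),
    Prim(1, 0.5, 1.5), Prim(1, 2.5, 3.5), Prim(1, 4.0, 5.0),
    Prim(1, 5.5, 6.5), Prim(1, 8.0, 9.0),
]
per_object = {}
for prim in scene:
    per_object.setdefault(prim.object_id, []).append(prim)

scene_hits, scene_tests = hits_in_scene_structure(scene, 0, 0.0, 4.0)
object_hits, object_tests = hits_in_object_structure(per_object, 0, 0.0, 4.0)
print(scene_tests, object_tests)
```

Both lookups return the same hits for object 0, but the dedicated one never touches primitives of object 1. A real BVH would already cull many foreign primitives by bounding box, so the actual saving depends on how much the objects overlap.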
Details
- Reviewers: Brecht Van Lommel (brecht), Thomas Dinges (dingto), Lukas Stockner (lukasstockner97), Martijn Berger (juicyfruit)
- Commits:
  - rBS0e47e0cc9e9b: Cycles: Use dedicated BVH for subsurface ray casting
  - rCf1f85f6e0a44: Use dedicated BVH for subsurface ray casting
  - rB0e47e0cc9e9b: Cycles: Use dedicated BVH for subsurface ray casting
Diff Detail
- Repository: rB Blender
Event Timeline
Great, I always wanted to try this. I guess there is a bit of a performance trade-off here: subsurface rays will get faster, while other rays might get a little slower due to a less optimal BVH?
Would be good to confirm there is a speedup on a production character with hair.
| intern/cycles/kernel/geom/geom_bvh_subsurface.h | ||
|---|---|---|
| 67–76 | These push calls don't seem to have a corresponding pop? | |
It is a memory usage and BVH build time trade-off, since a non-instanced subsurface object will have two copies of its BVH: one at the scene level and another "standalone" BVH. Other than that there should be no performance regressions: non-subsurface rays should behave exactly the way they did before, while subsurface rays should be faster.
| intern/cycles/kernel/geom/geom_bvh_subsurface.h | ||
|---|---|---|
| 67–76 | Well, that's the subsurface intersection discussion we had before, striking back. Intersections are in object space, so you don't need to adjust the intersection distance there. And since we're always within a single instance, we don't need to pop anything during traversal, and we don't care about the pre-calculation state once we're done with traversal. | |
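A minimal sketch of the object-space point made in the inline comment above, under loud assumptions: the object transform is only a uniform scale plus a translation, the object is a unit sphere, and the helper names (`to_object_space`, `intersect_unit_sphere`) are invented for illustration and are not Cycles functions.

```python
import math


def to_object_space(P, D, scale, translation):
    """Move a world-space ray into object space (uniform scale + translation only)."""
    P_obj = tuple((p - t) / scale for p, t in zip(P, translation))
    D_obj = tuple(d / scale for d in D)
    length = math.sqrt(sum(d * d for d in D_obj))
    D_obj = tuple(d / length for d in D_obj)  # re-normalize the direction
    # `length` converts world-space distances into object-space distances.
    return P_obj, D_obj, length


def intersect_unit_sphere(P, D):
    """Nearest hit distance of a normalized ray with the unit sphere at the origin."""
    b = sum(p * d for p, d in zip(P, D))
    c = sum(p * p for p in P) - 1.0
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t >= 0.0 else None


# Hypothetical object: unit sphere scaled by 2 and moved to x = 10.
P_obj, D_obj, length = to_object_space(
    (10.0, 0.0, -10.0), (0.0, 0.0, 1.0), scale=2.0, translation=(10.0, 0.0, 0.0)
)
t_object = intersect_unit_sphere(P_obj, D_obj)

# A traversal that leaves the instance again would have to convert back:
t_world = t_object / length
print(t_object, t_world)
```

In this sketch a query that enters the object once and records its hits in object space can keep `t_object` as it is; the conversion back to `t_world` is only needed when traversal continues in the scene-level structure afterwards.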
Ah ok, personally I would have chosen to save the memory, by treating any mesh with SSS as if it was instanced.
It would be worth at least doing a performance comparison between the two methods; if the difference is small in practice, then we don't need to unnecessarily increase memory usage. Two BVHs can also increase cache misses and slow things down that way, though probably it's still faster.
Other than that the patch looks good to me.
| intern/cycles/kernel/geom/geom_bvh_subsurface.h | ||
|---|---|---|
| 67–76 | Oh right, it's coming back to me now. | |
@Brecht Van Lommel (brecht), it's a bit arguable, actually. For production scenes, using a bit more memory to get faster rendering is more desirable IMO. There are a couple of tricks we can consider here:
- Support half-float storage for UVs and normals; then the extra BVH for a handful of objects wouldn't be measurable. I find this somewhat preferable, since there would be no points of slowdown (unless I'm missing something).
- Enable scene-level spatial splits, so camera ray traversal is not doomed to do a full SSS BVH traversal when an SSS object intersects something else.
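To get a feel for the half-float idea in the first bullet, here is a small back-of-the-envelope sketch. The layout (one 3-component normal plus one 2-component UV per vertex) is an assumption for illustration and is not Cycles' actual attribute storage; `roundtrip_half` and `storage_bytes` are hypothetical helpers.

```python
import struct

# Assumed layout (not Cycles' real one): one normal (3 values) + one UV (2 values).
FLOAT_VERTEX = struct.calcsize("<3f") + struct.calcsize("<2f")  # 32-bit floats
HALF_VERTEX = struct.calcsize("<3e") + struct.calcsize("<2e")   # 16-bit halves


def roundtrip_half(x):
    """Store a value as a 16-bit half float and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def storage_bytes(num_vertices, bytes_per_vertex):
    """Total attribute storage for a mesh with the assumed layout."""
    return num_vertices * bytes_per_vertex


num_vertices = 1_000_000  # hypothetical mesh size
saved = storage_bytes(num_vertices, FLOAT_VERTEX) - storage_bytes(num_vertices, HALF_VERTEX)
error = abs(roundtrip_half(0.1) - 0.1)
print(FLOAT_VERTEX, HALF_VERTEX, saved, error)
```

Under these assumptions the attribute data halves in size, at the cost of a small rounding error per component. Whether that precision is acceptable for normals and UVs is exactly the kind of thing that would need testing.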
But surely it all still needs to be benchmarked, and maybe we can commit this patch (or a similar one without the duplicated BVH) soon. I will try to do remote tests on the Intel beast in the upcoming days.
Hi, made some tests with patch on f4e1c1d_
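Render times, master vs. patched (100 samples unless noted, Intel i5 3570K):

| Scene | Setup | Master | Patched | Notes |
|---|---|---|---|---|
| Koro | CPU | 01:59.00 | 01:57.00 | |
| Koro | GPU | 01:46.00 | 01:46.00 | Hairy problem on two GPUs (GTX 760/670) |
| Splash | | 02:04.00 | 01:56.00 | |
| Fishy | | 02:47.00 | 02:47.00 | |
| Fishy | 500 samples | 12:27.00 | 12:27.00 | |
| Classroom | GPU, 80% Dim., 300 samples | 07:37.37 | 07:38.00 | |
| Classroom | CPU | 11:30.00 | 11:12.00 | |
| Laundromat | CPU, border on Frank, 200 samples | 01:58.00 | 02:02.00 | |
| Laundromat | CPU, full scene 50%, 200 samples | 12:25 | 12:33.00 | |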
Hrm, line breaks do not work here, so I am adding a .txt file.
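To put the timings above in relative terms, the following sketch converts the `mm:ss` stamps to seconds and reports how much faster (positive) or slower (negative) the patched build was. Only rows with unambiguous stamps are included; the function names are made up for this example.

```python
def to_seconds(stamp):
    """Convert a 'mm:ss' or 'mm:ss.cc' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def percent_faster(master, patched):
    """Positive: patched is faster than master. Negative: patched is slower."""
    m, p = to_seconds(master), to_seconds(patched)
    return (m - p) / m * 100.0


# (master, patched) pairs copied from the timings posted in this thread.
results = {
    "Koro CPU": ("01:59.00", "01:57.00"),
    "Splash": ("02:04.00", "01:56.00"),
    "Classroom CPU": ("11:30.00", "11:12.00"),
    "Laundromat border": ("01:58.00", "02:02.00"),
    "Laundromat full": ("12:25", "12:33.00"),
}

for name, (master, patched) in results.items():
    print(f"{name}: {percent_faster(master, patched):+.2f}%")
```

With single runs per configuration, differences of a few percent in either direction are hard to separate from run-to-run noise.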
Treating BSSRDF objects as instances now.
It is all localized in a single place now, easy to tweak and experiment.
Let's go with the conservative memory approach for the initial merge to master.
Can always tweak stuff further once needed.
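The memory difference between the two approaches discussed in this thread can be sketched with a simple count of stored primitive references. This is a simplified model with hypothetical names (`Mesh`, `treated_as_instance`, `stored_primitives`) and made-up mesh sizes, not the actual Cycles scene code.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    name: str
    num_triangles: int
    num_users: int    # how many objects share this mesh
    has_bssrdf: bool  # some shader on the mesh uses subsurface scattering


def treated_as_instance(mesh, sss_as_instance=True):
    """Assumed rule: shared meshes, and optionally SSS meshes, get their own BVH."""
    return mesh.num_users > 1 or (sss_as_instance and mesh.has_bssrdf)


def stored_primitives(meshes, sss_as_instance):
    """Count primitive references over the scene BVH plus all per-mesh BVHs."""
    total = 0
    for m in meshes:
        if treated_as_instance(m, sss_as_instance):
            # Scene BVH holds one reference per object; triangles live in the mesh BVH.
            total += m.num_users + m.num_triangles
        else:
            total += m.num_triangles  # triangles live directly in the scene BVH
            if m.has_bssrdf:
                total += m.num_triangles  # extra standalone copy for SSS rays
    return total


meshes = [  # hypothetical scene
    Mesh("head", 100_000, 1, True),
    Mesh("floor", 2, 1, False),
    Mesh("tree", 5_000, 10, False),
]
duplicated = stored_primitives(meshes, sss_as_instance=False)
instanced = stored_primitives(meshes, sss_as_instance=True)
print(duplicated, instanced)
```

In this model the duplicated variant stores the SSS mesh twice, while the instance variant stores it once plus a single reference in the scene-level structure. The price is that rays hitting that object go through an extra level of traversal.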
P.S. Was thinking, once spatial splits are enabled for the scene-level BVH
it shouldn't matter that much that BSSRDF objects are treated as instances:
their AABBs will be split in complex instance-intersection scenarios,
which again gives the advantage of an early ray exit.
Looks good to me. It's always great when the code can be simplified and improved at the same time.