
Cycles: Reduce amount of malloc() calls from the kernel
ClosedPublic

Authored by Sergey Sharybin (sergey) on May 16 2016, 4:24 PM.

Details

Summary

This commit makes malloc() happen only once per volume and once per
transparent shadow query (per thread), which improves how the code scales
across multiple CPU cores.

Hard to measure this on the low-end i7 I have here currently, but quick
tests suggest volume sampling got about a 3% speedup.

The idea is to store allocated memory in kernel globals, which are per
thread on CPU already.
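
A minimal sketch of that idea, not the code from this patch: ThreadGlobals, Hit, scratch_hits_get and thread_globals_free are hypothetical names standing in for the real kernel globals and intersection types, and the size is assumed to be the same fixed upper bound on every call. The buffer is allocated on first use and then reused by the same thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an intersection record; not the Cycles type.
struct Hit {
  float t;
  int prim;
};

// Hypothetical per-thread state, loosely modelled on "kernel globals".
struct ThreadGlobals {
  Hit *scratch_hits = nullptr;  // cached buffer, owned by this thread only
  size_t scratch_capacity = 0;
  int num_mallocs = 0;  // instrumentation for this example only
};

// Return the cached buffer, calling malloc() on first use only.
// Assumption: max_hits is the same fixed upper bound on every call.
inline Hit *scratch_hits_get(ThreadGlobals *kg, size_t max_hits)
{
  if (kg->scratch_hits == nullptr) {
    kg->scratch_hits = static_cast<Hit *>(std::malloc(sizeof(Hit) * max_hits));
    assert(kg->scratch_hits != nullptr);
    kg->scratch_capacity = max_hits;
    kg->num_mallocs++;
  }
  assert(max_hits <= kg->scratch_capacity);
  return kg->scratch_hits;
}

// Free the cached buffer once, when the thread's globals are torn down.
inline void thread_globals_free(ThreadGlobals *kg)
{
  std::free(kg->scratch_hits);
  kg->scratch_hits = nullptr;
  kg->scratch_capacity = 0;
}
```

Since each thread has its own globals, no locking is needed around the cached pointer; the only cross-thread contention left is the single malloc() per thread.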

Diff Detail

Repository
rB Blender

Event Timeline

quick test with a scene that has a huge volume around it

master

  1. 05:49.73
  2. 05:48.53

with patch

  1. 05:29.26
  2. 05:30.26

Great speedup!

intern/cycles/kernel/kernel_shadow.h
70–73

Don't we need to reallocate if max_hits is larger than the previously allocated max_hits? Or maybe just allocate with size kernel_data.integrator.transparent_max_bounce immediately.

intern/cycles/kernel/kernel_volume.h
647–653

In this case global_max_steps is fixed, so that should be ok.

For branched path tracing, this relies on the fact that kernel_path_indirect does not use decoupled shading, so we never need two such arrays in memory at once. That's fine, but might be good to add a comment about that.

Sergey Sharybin (sergey) planned changes to this revision. May 16 2016, 6:35 PM
Sergey Sharybin (sergey) added inline comments.
intern/cycles/kernel/kernel_shadow.h
70–73

Uh, the intention was to allocate transparent_max_bounce intersections straight away. Doing re-allocations is more costly than 1024 preallocated hits.

That's a good catch :)
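
To illustrate the two options discussed here, a rough sketch under stated assumptions: HitCache, hits_get_grow and hits_get_prealloc are hypothetical helpers, not functions from kernel_shadow.h, and the upper bound passed to the second one plays the role of transparent_max_bounce. The first variant reallocates whenever a query asks for more hits than were cached; the second allocates the upper bound once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an intersection record; not the Cycles type.
struct Hit {
  float t;
  int prim;
};

// Hypothetical per-thread cache of shadow hits.
struct HitCache {
  Hit *hits = nullptr;
  size_t capacity = 0;
  int num_allocs = 0;  // instrumentation for this example only
};

// Option A: grow on demand. Correct, but allocates again every time a
// query needs more hits than the buffer currently holds.
inline Hit *hits_get_grow(HitCache *cache, size_t max_hits)
{
  if (max_hits > cache->capacity) {
    std::free(cache->hits);
    cache->hits = static_cast<Hit *>(std::malloc(sizeof(Hit) * max_hits));
    assert(cache->hits != nullptr);
    cache->capacity = max_hits;
    cache->num_allocs++;
  }
  return cache->hits;
}

// Option B: allocate the fixed upper bound on first use and never again.
// Assumption: upper_bound is constant for the lifetime of the cache.
inline Hit *hits_get_prealloc(HitCache *cache, size_t upper_bound)
{
  if (cache->hits == nullptr) {
    cache->hits = static_cast<Hit *>(std::malloc(sizeof(Hit) * upper_bound));
    assert(cache->hits != nullptr);
    cache->capacity = upper_bound;
    cache->num_allocs++;
  }
  return cache->hits;
}

inline void hit_cache_free(HitCache *cache)
{
  std::free(cache->hits);
  cache->hits = nullptr;
  cache->capacity = 0;
}
```

Option B trades some memory per thread for a single allocation, which matches the stated intention of preallocating the maximum number of hits.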

intern/cycles/kernel/kernel_volume.h
647–653

Will do.

intern/cycles/kernel/kernel_volume.h
647–653

Actually I think it will be a problem with branched path tracing. In kernel_path_indirect it deallocates the array before doing the volume bounce, but you still use the array from the first and second bounces at the same time. It's just limited to 2 at the same time max.

intern/cycles/kernel/kernel_volume.h
647–653

Will need to have a closer look (hopefully tomorrow), not entirely sure what exact first and second bounce you're referring to. But if it's just two bounces to keep at a time, we can store two pointers and a bitmask of some sort to see which arrays are free. Should still be cheaper than doing alloc/free for each integration.

Gimme some time to go over the corners in details tho..
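
The two-pointers-plus-bitmask idea could look roughly like the sketch below. This is only an illustration of the suggestion, not what was committed: StepPool, Segment and the step_pool_* functions are hypothetical names, and max_steps is assumed to be the same fixed value on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for one volume step record; not the Cycles type.
struct Segment {
  float t;
};

constexpr int kNumSlots = 2;  // at most two arrays alive at the same time

// Hypothetical per-thread pool: two cached arrays plus a bitmask of used slots.
struct StepPool {
  Segment *slots[kNumSlots] = {nullptr, nullptr};
  uint32_t used_mask = 0;  // bit i set means slots[i] is currently in use
  int num_mallocs = 0;     // instrumentation for this example only
};

// Hand out a free slot, allocating its array on first use only.
// Returns the slot index, or -1 if both slots are busy or malloc() failed.
inline int step_pool_acquire(StepPool *pool, size_t max_steps)
{
  for (int i = 0; i < kNumSlots; i++) {
    const uint32_t bit = 1u << i;
    if ((pool->used_mask & bit) == 0) {
      if (pool->slots[i] == nullptr) {
        pool->slots[i] = static_cast<Segment *>(std::malloc(sizeof(Segment) * max_steps));
        if (pool->slots[i] == nullptr) {
          return -1;
        }
        pool->num_mallocs++;
      }
      pool->used_mask |= bit;
      return i;
    }
  }
  return -1;
}

// Mark a slot as free again; the memory stays cached for the next user.
inline void step_pool_release(StepPool *pool, int slot)
{
  assert(slot >= 0 && slot < kNumSlots);
  assert((pool->used_mask & (1u << slot)) != 0);
  pool->used_mask &= ~(1u << slot);
}

// Free both arrays once, when the thread's state is torn down.
inline void step_pool_free(StepPool *pool)
{
  for (int i = 0; i < kNumSlots; i++) {
    std::free(pool->slots[i]);
    pool->slots[i] = nullptr;
  }
  pool->used_mask = 0;
}
```

With this layout, acquire and release are a couple of bit operations per integration, and malloc() is called at most twice per thread.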

Sergey Sharybin (sergey) edited edge metadata.

Solve issues reported by Brecht.

Minor code style fix

This revision is now accepted and ready to land. May 17 2016, 9:28 PM
This revision was automatically updated to reflect the committed changes.