Page MenuHome

Cycles: Fix bvh2 gen on Apple Silicon and use it to speed up renders
ClosedPublic

Authored by Michael Jones (michael_jones) on Jan 19 2022, 10:32 PM.
Tags
None
Tokens
"Love" token, awarded by valwal."100" token, awarded by mycoconut."Party Time" token, awarded by Tiwaz."Love" token, awarded by Raimund58.

Details

Summary

This patch fixes a correctness issue discovered in the int4 select(...) function on Apple Silicon machines, which causes bad bvh2 builds. Although the generated bvh2s give correct renders, the resulting runtime performance is terrible. This fix allows us to switch over to bvh2 on Apple Silicon giving a significant performance uplift for many of the standard benchmarking assets. It also fixes some unit test failures stemming from the use of MetalRT, and trivially enables the new pointcloud primitive.

Ref T92212

Diff Detail

Repository
rB Blender

Event Timeline

Michael Jones (michael_jones) requested review of this revision.Jan 19 2022, 10:32 PM
Michael Jones (michael_jones) created this revision.
Ray Molenkamp (LazyDodo) added inline comments.
intern/cycles/util/math_int4.h
134

This will cause build issues for both gcc and msvc, seemingly clang is the only ok with this.

I have to admit, no test coverage at all for int4 is a bit concerning, but from some local testing

return int4(_mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b)));

seems to do the right thing? and builds on all platforms (according to godbolt)

Christopher Snowhill (kode54) added inline comments.
intern/cycles/device/metal/device_impl.mm
100

This appears to leave MetalRT disabled for all platforms unless the environment variable is set?

Michael Jones (michael_jones) marked an inline comment as done.Jan 20 2022, 10:31 AM
Michael Jones (michael_jones) added inline comments.
intern/cycles/device/metal/device_impl.mm
100

Yes, that's the intention as bvh2 is universally faster.

Michael Jones (michael_jones) marked an inline comment as done.
  • Merge branch 'master' of git.blender.org:blender
  • Use _si128 variants to fix the build, as per Ray Molenkamp's suggestion
Michael Jones (michael_jones) marked an inline comment as done.Jan 20 2022, 11:18 AM
Michael Jones (michael_jones) added inline comments.
intern/cycles/util/math_int4.h
134

I've taken your suggestion - many thanks :)

This revision is now accepted and ready to land.Jan 20 2022, 1:31 PM

@Michael Jones (michael_jones) One question out of curiosity.

Why "max_threads_per_threadgroup = 512;" as Metal GPUFamily Apple 7 support up to 1024?