Page MenuHome

Cycles: Enable inlining on Apple Silicon for 1.1x speedup
AbandonedPublic

Authored by Michael Jones (michael_jones) on Apr 26 2022, 1:22 PM.

Details

Summary

This is a stripped down version of D14645 without the scene specialisation optimisations.

The two major changes in this patch are:

  • Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times
  • Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice)

Diff Detail

Repository
rB Blender
Branch
master
Build Status
Buildable 21822
Build 21822: arc lint + arc unit

Event Timeline

Michael Jones (michael_jones) requested review of this revision.Apr 26 2022, 1:22 PM
Brecht Van Lommel (brecht) requested changes to this revision.Apr 26 2022, 6:15 PM

Looks generally fine.

It would also be good if you could run make format.

intern/cycles/device/metal/kernel.mm
45

Code style: capitalize sentences and end with dot.

82

I think this function needs a mutex lock? In case multiple Cycles sessions execute it at the same time.

112

Would slightly prefer make_unique and then std::move.

Also, add #include "util/unique_ptr.h" and remove the std:: namespace prefix.

272

Is there a reason this one is using std::lock_guard and other places use std::unique_lock.

In most Cycles code we use the types defined in util/thread.h. Unless there is a specific to reason something else, it seems best to use those.

446

This name is missing the GPU architecture, which matters when there are multiple GPUs with different architectures on the device (like an external GPU).

At least I imagine the cache can't be shared between those.

intern/cycles/kernel/device/metal/compat.h
34–37

Code style:

/* Inline everything for Apple GPUs.
 * This gives ~1.1x speedup and 10% spill reduction for integator_shade_surface
 * at the cost of longer compile times (~4.5 minutes on M1 Max). */
This revision now requires changes to proceed.Apr 26 2022, 6:15 PM
  • Running make format
  • Consistent mutex locking
  • Factor device name into cache key
Michael Jones (michael_jones) marked 6 inline comments as done.Apr 26 2022, 6:46 PM

Thanks for the feedback @Brecht Van Lommel (brecht).

intern/cycles/device/metal/kernel.mm
446

Good spot, thanks! I incorporated this into local_md5, although I thinking about it, perhaps it makes more sense to maintain separate caches per device, split by folder. Perhaps something to iterate on in master as we look at caching more?

Michael Jones (michael_jones) marked an inline comment as done.Apr 26 2022, 6:46 PM
Brecht Van Lommel (brecht) requested changes to this revision.Apr 26 2022, 7:32 PM
Brecht Van Lommel (brecht) added inline comments.
intern/cycles/device/metal/kernel.mm
446

I think separate caches per devices indeed required. The number of cache entries for different versions should not be reduced when there are more devices.

For HIP and CUDA we simply add the architecture in the name (for example cycles_kernel_sm_86_1FB56B1136836BDB5920C12047C066D4.cubin). Could we do the same here? The device name may need to be hashed or normalized in some way if it contains characters that don't work in a filename.

We could cache the kernels in separate folders per device, but I don't think it makes a practical difference right now.

This revision now requires changes to proceed.Apr 26 2022, 7:32 PM
  • Separate caches per device
Michael Jones (michael_jones) marked an inline comment as done.Apr 26 2022, 7:59 PM
This revision is now accepted and ready to land.Apr 26 2022, 8:20 PM

I had to revert this commit because it is causing crashes in various regression tests, example logs here:

	124 - cycles_motion_blur_metal (Failed)
	127 - cycles_render_layer_metal (Failed)
	128 - cycles_reports_metal (Failed)
	131 - cycles_sss_metal (Failed)

https://builder.blender.org/admin/#/builders/9/builds/4773

The cause is not immediately obvious to me. I tried reverting just the inlining changes in kernel/device/metal/compat.h but that did not solve the crash.

This revision is now accepted and ready to land.Apr 28 2022, 12:50 AM
Brecht Van Lommel (brecht) requested changes to this revision.Apr 28 2022, 12:50 AM
This revision now requires changes to proceed.Apr 28 2022, 12:50 AM