
Metal: Optimize GLSL to MSL translation. Improve cached compilation
Closed, Public

Authored by Jason Fielder (jason_apple) on Wed, Jan 18, 7:20 PM.

Details

Summary

Reduces the GLSL to MSL translation stage of shader compilation from 120 ms to 5 ms for complex EEVEE materials. This manifests in faster overall compilations, and faster cache hits for secondary compilations, as the MSL variant is needed as a key.
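
To illustrate why translation time also shows up on cache hits: if the cache is keyed on the translated MSL source, the translation has to run before every lookup. The following is a minimal C++ sketch under that assumption. `translate_glsl_to_msl`, `ShaderCache`, and `CompiledShader` are hypothetical names used for illustration and are not Blender's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled pipeline object.
struct CompiledShader {
  std::string msl_source;
};

// Hypothetical translator. The real GLSL to MSL stage is far more involved;
// here it only prepends a marker so the sketch stays self-contained.
inline std::string translate_glsl_to_msl(const std::string &glsl)
{
  return "// msl\n" + glsl;
}

// Hypothetical in-memory cache keyed on the translated MSL text.
class ShaderCache {
 public:
  const CompiledShader &get_or_compile(const std::string &glsl)
  {
    // Translation runs on every request, including cache hits,
    // because the MSL text is the lookup key.
    ++translations_;
    const std::string msl = translate_glsl_to_msl(glsl);

    auto it = cache_.find(msl);
    if (it == cache_.end()) {
      // Cache miss: this is where the expensive compile would happen.
      ++compilations_;
      it = cache_.emplace(msl, CompiledShader{msl}).first;
    }
    assert(it != cache_.end());
    return it->second;
  }

  std::size_t translations() const { return translations_; }
  std::size_t compilations() const { return compilations_; }

 private:
  std::unordered_map<std::string, CompiledShader> cache_;
  std::size_t translations_ = 0;
  std::size_t compilations_ = 0;
};
```

Requesting the same GLSL source twice from one `ShaderCache` compiles once but translates twice, which is why a cheaper translation stage also speeds up secondary compilations.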

Startup time is also improved for both first-run and second-run. Note that this change does not affect shader compilation times within the Metal API.

Also disables shader output to disk.

Authored by Apple: Michael Parkin-White

Ref T96261
Depends on D16990

Diff Detail

Repository
rB Blender

Event Timeline

Jason Fielder (jason_apple) requested review of this revision. (Wed, Jan 18, 7:20 PM)
Jason Fielder (jason_apple) created this revision.
Clément Foucault (fclem) added inline comments.
source/blender/gpu/metal/mtl_shader_generator.mm
221
387

left over comment

473–474

Unrelated to this diff, but couldn't we avoid paying this cost by just defining them as constexpr?

This revision is now accepted and ready to land. (Mon, Jan 23, 10:58 AM)
source/blender/gpu/metal/mtl_shader_generator.mm
473–474

I do have a patch coming in soon which addresses a few more of these cases. To provide a bit of context, this is mostly a compiler issue specific to Metal, rather than a fundamental problem with how the code "should" work. Metal Shading Language has no concept of global scope within shaders, so certain global variables that GL permits cannot be declared. Constant globals are allowed. To emulate GLSL globals, however, the generated Metal shader wraps a class around the entire GLSL implementation, creating an effective global scope through class members.
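
As a rough illustration of the wrapper approach described above, here is a plain C++ analogy (not actual generated MSL). The struct name, members, and the GLSL snippet in the comment are all hypothetical; the point is only that what the GLSL code treats as a mutable global becomes a member of a class wrapping the whole shader.

```cpp
#include <cassert>

// Conceptual GLSL input (hypothetical):
//   float accum;                              // mutable global
//   void add_sample(float v) { accum += v; }
//   void main() { accum = 0.0; add_sample(1.5); add_sample(2.5); }

// Hypothetical wrapper class: the former global is now a member, so every
// function of the wrapped shader can still read and write it.
struct GlobalsWrapper {
  float accum = 0.0f;  // was a GLSL global

  void add_sample(float v) { accum += v; }

  // Stand-in for the shader entry point; returns the value for inspection.
  float shader_main()
  {
    accum = 0.0f;
    add_sample(1.5f);
    add_sample(2.5f);
    return accum;
  }
};
```

Each invocation would get its own wrapper instance, so the "globals" behave like per-invocation state, as they do in GLSL.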

There is a compiler issue here: although the expressions remain constant, array and matrix types can be accessed with indices that cannot be resolved at compile time, so the compiler allocates them in thread-local memory using a chunk of the temporary register file. The performance impact comes when the shader's local memory requirements exceed the reasonable limit available to each shader core, which decreases occupancy, as memory has to be split across several cores per execution instance. (In the worst case, this can reduce performance to 1/16th of the total.)

A secondary caveat is that the compiler cannot handle constexpr within a class and must use static constexpr instead. Due to another limitation, however, static constexpr is often invalid in Metal because of the requirement to declare a scope for variables. This should not be the case, but moving constant declarations into function scope is the least impactful way to work around it.
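
The work-around in the paragraph above can be sketched in plain C++ (an analogy only, not real MSL; the struct, function, and table values are hypothetical). The constant table is declared inside the function that uses it rather than as a class member:

```cpp
#include <cassert>

// Hypothetical wrapper class standing in for the generated shader class.
struct ConstantsWrapper {
  // Deliberately absent: a class-scope `static constexpr float weights[4]`,
  // which the comment above describes as often invalid in Metal.

  float weight_for(int i) const
  {
    // Function-scope constant table, declared where it is used.
    constexpr float weights[4] = {0.1f, 0.2f, 0.3f, 0.4f};
    // Mask keeps the runtime index within 0..3.
    return weights[i & 3];
  }
};
```

This compiles as ordinary C++ either way; the sketch only shows where the declaration moves, not the Metal compiler behaviour that motivates it.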

One other option would be to hoist constant expressions outside of the class wrapper, but this would require more complex parsing and increase shader translation time.

Certainly not ideal, and hopefully something which could be done more cleanly in the future.


So, TL;DR: unfortunately, constexpr does not work as expected, because the memory scope (thread, threadgroup, device, constant) is incompatible with static constexpr within the shader's class body, a consequence of how the generated Metal shaders wrap the GLSL implementation.

NOTE: Likely an indirect dependency on Viewport compositor/Compute shader support in Metal (https://developer.blender.org/D16990). I will rebase this after that patch is merged.

Rebased against latest master.