Problem:
Currently, ARM64 assembly support is checked before CUDA/Metal in the
u32 inversion routine. When compiling with nvcc on an ARM64 host, the
C++ preprocessor therefore selects the inlined ARM64 assembly instead of
the call to the CUDA built-in function. This causes nvcc to error out on
ARM64:
.../intern/cycles/kernel/../util/math.h(930): error: A sm operand modifier not supported at "%w", try removing modifier or escaping with %
In the case of Metal, the current ifdef ordering prevents the Metal code
path from being generated at all.
Solution:
Check the CUDA / Metal ifdefs first, before the ifdef for ARM64 native
instructions.
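
Illustration (a minimal sketch, not the actual patch):
The ordering described above could look roughly like the following. The
names sketch_reverse_bits_u32 and SKETCH_INLINE are hypothetical, and the
compiler-provided macros __CUDACC__ and __METAL_VERSION__ stand in for
whatever kernel macros the real code checks. The sketch also assumes that
"u32 inversion" means reversing the bit order of a 32-bit integer, which
is what the ARM64 rbit instruction, CUDA's __brev() and Metal's
reverse_bits() do.

```cpp
// Sketch only: GPU compilers are tested first, so an ARM64 host never
// leaks inline assembly into device code.
#if defined(__METAL_VERSION__)
#  include <metal_stdlib>
using namespace metal;
#else
#  include <cstdint>
#endif

// Hypothetical helper macro; the real code base has its own qualifiers.
#if defined(__CUDACC__)
#  define SKETCH_INLINE __device__ inline
#else
#  define SKETCH_INLINE inline
#endif

SKETCH_INLINE uint32_t sketch_reverse_bits_u32(uint32_t x)
{
#if defined(__CUDACC__)
  // 1. CUDA first: use the built-in intrinsic.
  return __brev(x);
#elif defined(__METAL_VERSION__)
  // 2. Metal next: use the standard library function.
  return reverse_bits(x);
#elif defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  // 3. Only now the native ARM64 instruction, for CPU builds.
  __asm__("rbit %w0, %w1" : "=r"(x) : "r"(x));
  return x;
#else
  // 4. Portable fallback: swap progressively larger groups of bits.
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
  x = (x >> 16) | (x << 16);
  return x;
#endif
}
```

With the ARM64 branch placed above the CUDA and Metal branches, an ARM64
host would match it first and the GPU compilers would be handed the
inline assembly, which is the failure described under "Problem".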