The benchmarks use the files added to our benchmark suite in rBL63163.
Each one evaluates a field made of a couple hundred math nodes on 10 million points.
| Test | blender3.3 | master-clang | master-gcc | patch-gcc |
| --- | --- | --- | --- | --- |
| field_add | 0.3717s | 0.0934s | 0.2723s | 0.0924s |
| field_divide | 0.4980s | 0.1551s | 0.4026s | 0.3409s |
| field_multiply | 0.3850s | 0.0928s | 0.2720s | 0.0918s |
| field_sin_cos | 0.3138s | 0.3055s | 0.2869s | 0.2868s |
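This patch mainly helps GCC catch up with Clang: for field_add and field_multiply, patch-gcc is nearly 3x faster than master-gcc and on par with master-clang. Clang is still considerably faster for the division node (field_divide: 0.1551s vs. 0.3409s, about 2.2x) because it vectorizes safe_divide (GCC either can't do that, or I haven't found the right flags).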
Vectorized safe_divide:
```
movups   xmm0, xmmword ptr [rsi + 4*rax]
movups   xmm1, xmmword ptr [rdx + 4*rax]
xorps    xmm2, xmm2
cmpneqps xmm2, xmm1
divps    xmm0, xmm1
andps    xmm0, xmm2
movups   xmmword ptr [rcx + 4*rax], xmm0
```
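To make the vector sequence easier to follow, here is a C++ sketch of what it computes, one lane at a time. The scalar `safe_divide` below assumes the semantics "return 0 when the divisor is 0" and is not necessarily Blender's exact definition; `safe_divide_masked` and `safe_divide_array` are hypothetical names used only for illustration. The masked variant mirrors the assembly: divide unconditionally (`divps`), build an all-ones mask where the divisor is non-zero (`xorps` + `cmpneqps`), then AND the quotient with it (`andps`). It relies on IEEE 754 behavior, where dividing by zero yields inf or NaN instead of trapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// The masked variant divides by zero on purpose, which is only well-behaved
// with IEEE 754 floats (x / 0 gives inf or NaN rather than trapping).
static_assert(std::numeric_limits<float>::is_iec559, "IEEE 754 floats required");
static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float required");

// Scalar reference. Assumed semantics for this sketch: return 0 when b == 0.
inline float safe_divide(const float a, const float b)
{
  return (b != 0.0f) ? a / b : 0.0f;
}

// Hypothetical branchless variant mirroring one SSE lane:
// divps (always divide), cmpneqps (mask = b != 0), andps (apply mask).
inline float safe_divide_masked(const float a, const float b)
{
  const float quotient = a / b;  // inf or NaN when b == 0, masked away below
  // All ones when b != 0, all zeros otherwise (unsigned wraparound of 0 - 1).
  const std::uint32_t mask = 0u - static_cast<std::uint32_t>(b != 0.0f);
  std::uint32_t bits;
  std::memcpy(&bits, &quotient, sizeof(bits));
  bits &= mask;  // clearing every bit yields +0.0f
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}

// Hypothetical element-wise kernel: the loop shape a compiler has to
// vectorize to emit the four-wide sequence shown above.
void safe_divide_array(const float *a, const float *b, float *r, const std::size_t n)
{
  assert(n == 0 || (a != nullptr && b != nullptr && r != nullptr));
  for (std::size_t i = 0; i < n; i++) {
    r[i] = safe_divide(a[i], b[i]);
  }
}
```

Because the quotient is computed for every lane and the lanes with a zero divisor are masked out afterwards, there is no per-element branch, which is what lets the loop process four floats per iteration.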
