Functions: Enable more gcc optimizations for multi-functions.
ClosedPublic

Authored by Jacques Lucke (JacquesLucke) on Jan 7 2023, 7:51 PM.

Details

Summary

The benchmark is done using the files added to our benchmark suite in rBL63163.
They test executing a field consisting of a couple hundred math nodes on 10 million points.

                                         blender3.3           master-clang         master-gcc           patch-gcc            
field_add                                0.3717s              0.0934s              0.2723s              0.0924s              
field_divide                             0.4980s              0.1551s              0.4026s              0.3409s              
field_multiply                           0.3850s              0.0928s              0.2720s              0.0918s              
field_sin_cos                            0.3138s              0.3055s              0.2869s              0.2868s

This patch mainly helps GCC catch up with Clang: with it, field_add and field_multiply run as fast under GCC as under Clang. Clang is still about 2.2x faster for the division node (0.1551s vs. 0.3409s) because it vectorizes safe_divide (GCC either can't do that, or I haven't found the right flags).

Vectorized safe_divide:

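movups  xmm0, xmmword ptr [rsi + 4*rax]    # load 4 floats of the dividend
movups  xmm1, xmmword ptr [rdx + 4*rax]    # load 4 floats of the divisor
xorps   xmm2, xmm2                         # xmm2 = 0.0 in all lanes
cmpneqps        xmm2, xmm1                 # mask: all bits set where divisor != 0
divps   xmm0, xmm1                         # divide all 4 lanes unconditionally
andps   xmm0, xmm2                         # zero the lanes where the divisor was 0
movups  xmmword ptr [rcx + 4*rax], xmm0    # store 4 results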
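
For reference, here is a minimal sketch of the kind of scalar loop that the vectorized assembly above corresponds to. It assumes that safe_divide returns 0 when the divisor is 0, which is what the compare-and-mask sequence suggests. The safe_divide_array helper is hypothetical and only there for illustration, it is not part of the patch.

```cpp
#include <cstddef>

/* Assumed semantics: a regular division, except that a zero divisor yields 0. */
inline float safe_divide(const float a, const float b)
{
  return (b != 0.0f) ? a / b : 0.0f;
}

/* Hypothetical helper: applies safe_divide element-wise. A vectorizing compiler can
 * turn the branch into a compare + mask, processing several floats per iteration. */
void safe_divide_array(const float *a, const float *b, float *dst, const std::size_t size)
{
  for (std::size_t i = 0; i < size; i++) {
    dst[i] = safe_divide(a[i], b[i]);
  }
}
```

The division itself can be done unconditionally in the vectorized version because the lanes with a zero divisor are masked out afterwards.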

Diff Detail

Repository
rB Blender
Branch
gcc-optimize-field (branched from master)
Build Status
Buildable 25275
Build 25275: arc lint + arc unit