
LatticeDeform: Performance
Closed, Public

Authored by Jeroen Bakker (jbakker) on Oct 2 2020, 9:17 AM.

Details

Summary

This patch improves the single-core performance of the lattice deform.

  1. Prefetch the deform verts during initialization. This data is constant for each inner loop, so fetching it up front simplifies the inner loop and frees CPU resources for other optimizations.
  2. Prefetch the Lattice instance, which is also constant. The performance gain isn't noticeable, but it is always good to free some space in the branch prediction tables.
  3. Remove branching in all loops by not exiting early when an iteration has no effect. The checks in the inner loops detected whether an iteration had no effect on the final result and, if so, continued to the next iteration. This made the branches unpredictable and caused many mispredictions. For small inner loops it is generally better to replace unpredictable if statements with branchless code patterns.
  4. Use SSE2 instructions when available.
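
To illustrate point 3, here is a minimal sketch of the general pattern, not the code from this patch. The function names and the weighted-sum loop are hypothetical stand-ins for the lattice inner loops. It assumes all inputs are finite, since skipping a zero weight and multiplying by it only agree for finite values.

```c
#include <assert.h>
#include <stddef.h>

/* Branching variant: skips iterations whose weight is zero.
 * The branch depends on the data, so it is hard to predict. */
float accumulate_branching(const float *values, const float *weights, size_t count)
{
  assert(values != NULL && weights != NULL);
  float sum = 0.0f;
  for (size_t i = 0; i < count; i++) {
    if (weights[i] == 0.0f) {
      continue;
    }
    sum += values[i] * weights[i];
  }
  return sum;
}

/* Branchless variant: a zero weight simply contributes nothing,
 * so every iteration executes the same instructions. */
float accumulate_branchless(const float *values, const float *weights, size_t count)
{
  assert(values != NULL && weights != NULL);
  float sum = 0.0f;
  for (size_t i = 0; i < count; i++) {
    sum += values[i] * weights[i];
  }
  return sum;
}
```

Both variants return the same sum for finite inputs. The branchless one does a few redundant multiplications in exchange for a loop body without data-dependent jumps.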

This gives a 50% performance increase (about 1.5x throughput at the largest test size), measured on an
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with GCC 9.3.
Other compilers should also be checked.

Before:

performance_no_dvert_10000 (4 ms)
performance_no_dvert_100000 (30 ms)
performance_no_dvert_1000000 (268 ms)
performance_no_dvert_10000000 (2637 ms)

After:

performance_no_dvert_10000 (3 ms)
performance_no_dvert_100000 (21 ms)
performance_no_dvert_1000000 (180 ms)
performance_no_dvert_10000000 (1756 ms)
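
For reference, the timings above can be turned into ratios with a small helper. This is only a sketch of the arithmetic and the function names are hypothetical. For the largest case, 2637 / 1756 gives a throughput ratio of about 1.50, which corresponds to roughly a third less run time. The smaller cases show a smaller ratio, for example 30 / 21, which is about 1.43.

```c
#include <assert.h>

/* Throughput ratio: how many times more work is done per unit of time. */
double speedup(double before_ms, double after_ms)
{
  assert(after_ms > 0.0);
  return before_ms / after_ms;
}

/* Fraction of the original run time that was saved. */
double time_saved(double before_ms, double after_ms)
{
  assert(before_ms > 0.0);
  return 1.0 - after_ms / before_ms;
}
```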

Diff Detail

Repository
rB Blender

Event Timeline

Jeroen Bakker (jbakker) requested review of this revision. Oct 2 2020, 9:17 AM

Generally looks fine, only minor things noted.

source/blender/blenkernel/intern/lattice_deform.c
97

+ 1 should be + sizeof(float).

124

Could this just use int? As I understand it, int32_t is meant for cases where the 4-byte size is significant (if it's converted to char[4], for example).
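
As an illustration of a case where the exact width matters, here is a minimal sketch with hypothetical helper names, unrelated to the patch itself. It copies an int32_t to and from a 4-byte buffer. The byte order inside the buffer depends on the platform, so only the round trip is guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of an int32_t into a byte buffer. */
void i32_to_bytes(int32_t value, unsigned char bytes[4])
{
  assert(bytes != NULL);
  memcpy(bytes, &value, sizeof(value));
}

/* Rebuild the int32_t from a buffer written by i32_to_bytes. */
int32_t i32_from_bytes(const unsigned char bytes[4])
{
  assert(bytes != NULL);
  int32_t value;
  memcpy(&value, bytes, sizeof(value));
  return value;
}
```

With a plain int the buffer size would have to follow sizeof(int), which the C standard does not fix at 4.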

227

Is CLAMPIS necessary here? It looks like the value will never be zero; min_ii(ww * w_stride, idx_w_max); seems to work in a quick test. The same goes for the two uses below.

255

*picky* The cast can be avoided here by using &co_vec[0], which in general avoids having to make sure casts are OK.

source/blender/blenkernel/intern/lattice_deform_test.cc
39

*picky* - prefer float (*coords)[3], makes indexing the values read nicer.
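
A minimal sketch of the difference in indexing. The function names and fill values are hypothetical and not taken from the test file.

```c
#include <assert.h>
#include <stddef.h>

/* Flat layout: the component offset is computed by hand. */
void fill_flat(float *coords, int count)
{
  assert(coords != NULL);
  for (int i = 0; i < count; i++) {
    coords[i * 3 + 0] = (float)i;
    coords[i * 3 + 1] = (float)i + 0.25f;
    coords[i * 3 + 2] = (float)i + 0.5f;
  }
}

/* Pointer to float[3]: coords[i] is one coordinate, coords[i][k] one component. */
void fill_vec3(float (*coords)[3], int count)
{
  assert(coords != NULL);
  for (int i = 0; i < count; i++) {
    coords[i][0] = (float)i;
    coords[i][1] = (float)i + 0.25f;
    coords[i][2] = (float)i + 0.5f;
  }
}
```

Both functions write the same memory layout. The second one only lets the compiler do the multiplication by 3.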

source/blender/blenkernel/intern/lattice_deform.c
227

Correction "It looks like the value will never be below zero"

source/blender/blenkernel/intern/lattice_deform.c
174

Should be ifdef, same below.
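
For illustration, a minimal sketch of an #ifdef-guarded SSE2 path with a scalar fallback. It assumes the __SSE2__ macro that GCC and Clang predefine when SSE2 is enabled. The helper madd_v4 is hypothetical and not taken from the patch. #ifdef only tests whether the macro is defined, so it does not rely on the macro's value.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
#  include <emmintrin.h>
#endif

/* Hypothetical helper: r[i] += a[i] * f for four floats. */
void madd_v4(float r[4], const float a[4], float f)
{
  assert(r != NULL && a != NULL);
#ifdef __SSE2__
  /* Unaligned loads, so the caller does not need 16-byte alignment. */
  __m128 r_vec = _mm_loadu_ps(r);
  __m128 a_vec = _mm_loadu_ps(a);
  r_vec = _mm_add_ps(r_vec, _mm_mul_ps(a_vec, _mm_set1_ps(f)));
  _mm_storeu_ps(r, r_vec);
#else
  /* Scalar fallback for targets without SSE2. */
  for (int i = 0; i < 4; i++) {
    r[i] += a[i] * f;
  }
#endif
}
```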

source/blender/blenkernel/intern/lattice_deform.c
255

Correcting my own statement: the cast can't be removed, as it's __m128.

Jeroen Bakker (jbakker) marked 7 inline comments as done.
  • Fixed Code review comments
  • Use float[3] array in test cases
Jeroen Bakker (jbakker) marked an inline comment as done. Oct 20 2020, 8:21 AM
Jeroen Bakker (jbakker) added inline comments.
source/blender/blenkernel/intern/lattice_deform.c
227

wi can be 0, which selects the first element of a lattice dimension. In this case ww is -1.
The previous implementation handled this case in:

204: if (ww > 0) {
        ...
212: } else {
213:     idx_w = 0;
214: }

The result of min_ii(ww * w_stride, idx_w_max) would therefore still need to be clamped so it cannot go below zero.
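
A minimal sketch of the point being made. The helpers min_int and clamp_int are local stand-ins for Blender's min_ii and CLAMPIS, and the stride and maximum values used below are made-up examples.

```c
#include <assert.h>

/* Local stand-ins for Blender's min_ii and CLAMPIS. */
static int min_int(int a, int b)
{
  return a < b ? a : b;
}

static int clamp_int(int value, int low, int high)
{
  return value < low ? low : (value > high ? high : value);
}

/* Upper bound only: a negative ww yields a negative index. */
int index_min_only(int ww, int w_stride, int idx_w_max)
{
  return min_int(ww * w_stride, idx_w_max);
}

/* Both bounds: ww == -1 maps to index 0, like the old else branch. */
int index_clamped(int ww, int w_stride, int idx_w_max)
{
  assert(idx_w_max >= 0);
  return clamp_int(ww * w_stride, 0, idx_w_max);
}
```

With ww = -1, a stride of 4 and a maximum of 12, the min-only version returns -4 while the clamped version returns 0. Because the clamp is a plain comparison on both sides, it can still compile to code without an unpredictable branch.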

This revision is now accepted and ready to land. Oct 26 2020, 5:12 AM
This revision was automatically updated to reflect the committed changes.