Eevee: Optimize shadows drawing
ClosedPublic

Authored by Germano Cavalcante (mano-wii) on Mar 5 2018, 2:59 PM.

Details

Summary

On some older graphics cards, accessing a const array with variable indices can be very slow, especially if the array is large, like the concentric array in concentric_samples_lib.glsl.

For some reason, accessing a large const array inside a loop (while or for) also hurts performance, even when the indices are constant.

Because of these issues, the shadow_store_frag.glsl shader ran very slowly on the AMD Radeon HD 7570M and Intel(R) HD Graphics 4000 GPUs, so moving any lamp, or navigating with a sun-type lamp, was extremely slow.

The solution found was to unroll the loop that accessed the concentric array, so that every access uses a constant index.

The improvement was from the order of seconds to the order of milliseconds. (demonstration)
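The unrolling described above can be sketched roughly as follows. This is a CPU-side C++ illustration of the pattern only, not the shader code from this revision: the table `kSamples`, its values, and both function names are hypothetical, and the sketch assumes the sample count is a multiple of four. It shows the same accumulation written once with a variable index inside a loop and once as fixed blocks that use only constant indices.

```cpp
#include <array>

// Hypothetical stand-in for the large const sample table (values are made up).
constexpr std::array<float, 8> kSamples = {0.5f, 1.0f, 1.5f, 2.0f,
                                           2.5f, 3.0f, 3.5f, 4.0f};

// Variable-index access inside a loop: the pattern reported as slow
// on the affected GPUs when written in GLSL.
float accumulate_loop(int sample_count) {
    float accum = 0.0f;
    for (int i = 0; i < sample_count; ++i) {
        accum += kSamples[i];
    }
    return accum;
}

// Unrolled variant: each block handles four samples and indexes the
// table with literal constants only. Assumes sample_count is 0, 4 or 8.
float accumulate_unrolled(int sample_count) {
    float accum = 0.0f;
    if (sample_count >= 4) {
        accum += kSamples[0] + kSamples[1] + kSamples[2] + kSamples[3];
    }
    if (sample_count >= 8) {
        accum += kSamples[4] + kSamples[5] + kSamples[6] + kSamples[7];
    }
    return accum;
}
```

Both functions return the same value for the supported counts. On a CPU the difference is irrelevant; the point is only the shape of the code that avoids dynamic indexing.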

Thanks to @Clément Foucault (fclem) for pointing out where the problem is and suggesting possible solutions.

Diff Detail

Repository
rB Blender

Event Timeline

Germano Cavalcante (mano-wii) retitled this revision from Eevee: Optimize the drawing of shadows from sun-type lamps to Eevee: Optimize shadows drawing. Mar 5 2018, 3:25 PM

If this is necessary to get good performance, then it seems reasonable. I imagine it was doing slow uncached global memory access on the GPU.

Probably the code could be simplified by using a for loop inside each if, since the compiler can unroll such fixed-range loops in principle. It could also be a tiny bit faster to nest those if blocks.
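As a rough illustration of this suggestion, the sketch below puts a fixed-range for loop inside each if and nests the second if inside the first. It is written in C++ so it can be compiled and checked on the CPU; the table `kTable`, its values, and the function name `sum_nested` are hypothetical and do not come from the patch. Whether a particular GLSL compiler actually unrolls such loops depends on the driver.

```cpp
#include <array>

// Hypothetical stand-in for the const sample table (values are made up).
constexpr std::array<float, 8> kTable = {0.5f, 1.0f, 1.5f, 2.0f,
                                         2.5f, 3.0f, 3.5f, 4.0f};

// Each if guards a loop with compile-time bounds, which a compiler may
// unroll. The second if is nested, so it is only tested when the first
// block has already run.
float sum_nested(int sample_count) {
    float accum = 0.0f;
    if (sample_count >= 4) {
        for (int i = 0; i < 4; ++i) {
            accum += kTable[i];
        }
        if (sample_count >= 8) {
            for (int i = 4; i < 8; ++i) {
                accum += kTable[i];
            }
        }
    }
    return accum;
}
```

With these made-up values, `sum_nested(4)` yields 5 and `sum_nested(8)` yields 18.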

I tried nesting the ifs, but then the shadow shader for point lamps stopped working (no error message or anything; it was as if the lamp was not there).

I tried putting the code in a separate function and adding returns, but the same problem occurred.

It looks like AMD has a limit on the number of returns it accepts when compiling a shader.

Wrapping the code in while (true) and adding a break to each if worked, but I got lag at high resolutions.

I'll try your suggestion of putting a loop inside each if.

(...)
Probably the code could be simplified by using a for loop inside each if, since the compiler can unroll such fixed range loops in principle. (...)

With the first loops I saw no impact on performance. But as soon as I added all the loops, the lag from before came back :\

diff --git a/source/blender/depsgraph/intern/eval/deg_eval_copy_on_write.cc b/source/blender/depsgraph/intern/eval/deg_eval_copy_on_write.cc
index c8b9702621e..17ca1733d42 100644
--- a/source/blender/depsgraph/intern/eval/deg_eval_copy_on_write.cc
+++ b/source/blender/depsgraph/intern/eval/deg_eval_copy_on_write.cc
@@ -49,6 +49,7 @@
 #include "BLI_threads.h"
 #include "BLI_string.h"
 
+#include "BKE_curve.h"
 #include "BKE_global.h"
 #include "BKE_idprop.h"
 #include "BKE_layer.h"
@@ -680,6 +681,7 @@ ID *deg_update_copy_on_write_datablock(const Depsgraph *depsgraph,
 ListBase gpumaterial_backup;
 ListBase *gpumaterial_ptr = NULL;
 Mesh *mesh_evaluated = NULL;
+ CurveCache *curve_cache = NULL;
 short base_flag = 0;
 if (check_datablock_expanded(id_cow)) {
 switch (id_type) {
@@ -729,6 +731,10 @@ ID *deg_update_copy_on_write_datablock(const Depsgraph *depsgraph,
 object->data = mesh_evaluated->id.orig_id;
 }
 }
+ /* Store curve cache and make sure we don't free it. */
+ curve_cache = object->curve_cache;
+ object->curve_cache = NULL;
+
 /* Make a backup of base flags. */
 base_flag = object->base_flag;
 break;
@@ -764,6 +770,9 @@ ID *deg_update_copy_on_write_datablock(const Depsgraph *depsgraph,
 ((Mesh *)mesh_evaluated->id.orig_id)->edit_btmesh;
 }
 }
+ if (curve_cache != NULL) {
+ object->curve_cache = curve_cache;
+ }
 object->base_flag = base_flag;
 }
 return id_cow;

Ok, I guess we just need the big chunk of code then, not really a problem.

This revision is now accepted and ready to land. Mar 5 2018, 9:54 PM
This revision was automatically updated to reflect the committed changes.