
DrawManager: Use threading for ibo.fdots_nor/hq
Needs Review, Public

Authored by Germano Cavalcante (mano-wii) on May 18 2021, 9:59 PM.

Details

Summary

Here are the overall results of the investigation proposed in T88353: DrawManager: Threading for ibo.fdots_nor/hq.

The goal is to analyze whether creating the facedot normal cache
is still more efficient single-threaded than multi-threaded (as stated
in the code comment).

To test this, two files were created. In them, the creation of the
facedot normal cache is distributed across 8 threads.
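The patch's exact scheduling isn't shown on this page. As a rough illustration of what "distributed across 8 threads" means, the sketch below splits the face range into contiguous chunks and gives each chunk its own slice of the output buffer. `chunk_range`, `fdots_nor_chunk` and `fdots_nor_all` are hypothetical names, not Blender functions, and the driver runs the chunks serially for clarity.

```c
#include <assert.h>

/* Hypothetical helper: split `total` faces into `num_chunks` contiguous
 * ranges whose sizes differ by at most one. Chunk `i` covers the
 * half-open range [*r_start, *r_end). */
static void chunk_range(int total, int num_chunks, int i, int *r_start, int *r_end)
{
  assert(num_chunks > 0 && i >= 0 && i < num_chunks);
  const int base = total / num_chunks;
  const int extra = total % num_chunks;
  /* The first `extra` chunks each take one additional face. */
  *r_start = i * base + (i < extra ? i : extra);
  *r_end = *r_start + base + (i < extra ? 1 : 0);
}

/* Hypothetical per-chunk worker. Each chunk writes only its own slice of
 * `r_fdot_nor`, so chunks can run on separate threads without locking.
 * The plain copy stands in for the real normal conversion/packing. */
static void fdots_nor_chunk(const float (*face_nor)[3], float (*r_fdot_nor)[3], int start, int end)
{
  for (int f = start; f < end; f++) {
    for (int c = 0; c < 3; c++) {
      r_fdot_nor[f][c] = face_nor[f][c];
    }
  }
}

/* Serial driver: a threaded version would hand each chunk to one worker
 * thread (8 in the tests described here) instead of looping. */
static void fdots_nor_all(const float (*face_nor)[3], float (*r_fdot_nor)[3], int total, int num_chunks)
{
  for (int i = 0; i < num_chunks; i++) {
    int start, end;
    chunk_range(total, num_chunks, i, &start, &end);
    fdots_nor_chunk(face_nor, r_fdot_nor, start, end);
  }
}
```

Because the chunks never overlap, the result is the same regardless of how many threads process them or in which order.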

File 1:


File 2: File 1 with modifiers applied.

Results:
Measured using debug menu value 21 (Cache time).

High Quality Normals:

| | No facedots | Original values | Patch single-thread | Patch multi-thread |
|---|---|---|---|---|
| File 1 | 1.17 ms | 1.43 ms | 1.46 ms | 1.42 ms |
| File 2 | 3.65 ms | 3.78 ms | 3.80 ms | 3.79 ms |

Default Quality Normals:

| | No facedots | Original values | Patch single-thread | Patch multi-thread |
|---|---|---|---|---|
| File 1 | 1.16 ms | 1.47 ms | 1.45 ms | 1.40 ms |
| File 2 | 3.70 ms | 3.77 ms | 3.73 ms | 3.78 ms |

Conclusion:
Multi-threading no longer makes it slower, but it doesn't bring a significant improvement either.
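Computed from the tables above, the multi-threaded patch changes the time relative to the single-threaded patch by about -2.7% (HQ, File 1), -0.3% (HQ, File 2), -3.4% (default, File 1) and +1.3% (default, File 2). Differences this small may well be within measurement noise. The helper below is a trivial sketch of that calculation; its name is hypothetical.

```c
#include <assert.h>

/* Relative change in percent from `before` to `after`.
 * Negative values mean `after` took less time. */
static double relative_change_pct(double before, double after)
{
  assert(before > 0.0);
  return (after - before) / before * 100.0;
}
```

For example, `relative_change_pct(1.46, 1.42)` (HQ, File 1) returns roughly -2.74.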


System Information
CPU: AMD Ryzen 7 1800X Eight-Core Processor
Operating system: Windows-10-10.0.19041-SP0 64 Bits
Graphics card: Radeon (TM) RX 480 Graphics ATI Technologies Inc. 4.5.14760 Core Profile Context 20.45.37.01 27.20.14537.1001

Diff Detail

Repository
rB Blender
Branch
master
Build Status
Buildable 14628
Build 14628: arc lint + arc unit

Event Timeline

Germano Cavalcante (mano-wii) requested review of this revision. May 18 2021, 9:59 PM
Germano Cavalcante (mano-wii) created this revision.

Code seems fine. Personally, I would have separate functions for the HQ and LQ normals. But I expect that the is_hq branch will be optimized away by the compiler, so splitting the code would not remove a complicated test.
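One common way to get separate HQ/LQ functions without duplicating code, while still letting the compiler drop the `is_hq` branch, is a shared `static inline` body with a `bool` parameter plus two thin wrappers. This is a hypothetical sketch; the function names and the 16-bit/10-bit scales are illustrative assumptions, not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared body. When `is_hq` is a compile-time constant at the call site,
 * an optimizing compiler can inline this and drop the untaken branch. */
static inline int32_t nor_component_to_int(float v, bool is_hq)
{
  /* Illustrative scales: signed 16-bit range for HQ, signed 10-bit for LQ. */
  const float scale = is_hq ? 32767.0f : 511.0f;
  return (int32_t)(v * scale);
}

/* Thin wrappers give HQ and LQ their own entry points without
 * duplicating the body. */
static int32_t nor_component_to_int_hq(float v)
{
  return nor_component_to_int(v, true);
}

static int32_t nor_component_to_int_lq(float v)
{
  return nor_component_to_int(v, false);
}
```

The design choice here is that readability (two named functions) and performance (no runtime branch) do not have to be traded against each other.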

I didn't expect a real difference from this patch. Enabling threading would also have other benefits that are harder to measure.
For example, in the current implementation all non-threaded extractions run serially, so having fewer of them can also improve the performance of other batch creations.
To make your timings reproducible, can you add how you measured them to the description? Also mention your CPU there.

I will do a round of testing myself and add my numbers to it.

source/blender/draw/intern/draw_cache_extract_mesh.c, line 4892:

The test is the same as a few lines above. Better to make it a const bool before the if statement.

When reading code with structures like this, you have to inspect it closely to see whether the code is the same or whether there is a tiny difference. It's best to remove this ambiguity so future readers don't have to waste time dissecting it.
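The suggested refactor, hoisting the repeated condition into a named `const bool`, looks roughly like this. The types, the `force_lq` parameter and the buffer sizes are hypothetical, chosen only to show the pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types standing in for the real extraction setup. */
typedef enum { FDOTS_NOR_LQ, FDOTS_NOR_HQ } FdotsNorKind;

typedef struct FdotsNorSetup {
  FdotsNorKind kind;
  int bytes_per_elem;
} FdotsNorSetup;

static FdotsNorSetup fdots_nor_setup(bool use_hq_normals, bool force_lq)
{
  /* Evaluate the shared test once and give it a name, instead of
   * repeating `use_hq_normals && !force_lq` in each if statement. */
  const bool use_hq = use_hq_normals && !force_lq;

  FdotsNorSetup setup;
  if (use_hq) {
    setup.kind = FDOTS_NOR_HQ;
  }
  else {
    setup.kind = FDOTS_NOR_LQ;
  }
  /* Illustrative sizes: 4 x int16 for HQ, one packed 32-bit value for LQ. */
  setup.bytes_per_elem = use_hq ? 8 : 4;
  return setup;
}
```

A reader now sees at a glance that both uses test exactly the same condition.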

System info:

platform: Linux-5.11.0-7614-generic-x86_64-with-glibc2.33
renderer: AMD SIENNA_CICHLID (DRM 3.40.0, 5.11.0-7614-generic, LLVM 11.0.1)
vendor: AMD
version: 4.6 (Core Profile) Mesa 21.0.1
cpu: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
compiler: gcc version 10.3.0

Timings were measured using DEBUG_TIME in draw_cache_extract_mesh.c.
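To reproduce numbers like these outside Blender, instrumentation of this kind presumably boils down to reading a monotonic clock before and after the code of interest. This is a generic POSIX sketch, not the actual DEBUG_TIME code; `now_ms`, `time_call_ms` and `busy_work` are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Monotonic time in milliseconds (POSIX clock_gettime). */
static double now_ms(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Run `fn(userdata)` once and return the elapsed time in milliseconds. */
static double time_call_ms(void (*fn)(void *), void *userdata)
{
  const double start = now_ms();
  fn(userdata);
  return now_ms() - start;
}

/* Example workload: sum the first `*(long *)userdata` integers. */
static void busy_work(void *userdata)
{
  const long n = *(const long *)userdata;
  volatile long sum = 0;
  for (long i = 0; i < n; i++) {
    sum += i;
  }
}
```

A monotonic clock is used so that system clock adjustments cannot make an interval negative or inflated.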

Test 2

master: rdata 2ms iter 9ms (frame 33ms)
this patch: rdata 2ms iter 9ms (frame 33ms)

NOTE: All normals in the example file are identical. This should not be a huge issue in this case.

Subdivided cube

I would use a cube subdivided 7 times, with modifiers applied. That gives around 400000 faces.
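The face count grows geometrically with the number of subdivision steps. Assuming each step splits every quad into four, a cube has 6 × 4^n faces after n steps: 98,304 for n = 7 and 393,216 for n = 8. Under this assumption, the "around 400000" figure corresponds to 8 steps; the actual test file may have been built differently. The helper name is hypothetical.

```c
#include <assert.h>

/* Faces of a cube after `levels` subdivision steps, assuming every step
 * splits each quad into four (Catmull-Clark on an all-quad mesh, or
 * Subdivide with one cut). */
static long cube_faces_after_subdiv(int levels)
{
  long faces = 6; /* A default cube has 6 quads. */
  for (int i = 0; i < levels; i++) {
    faces *= 4;
  }
  return faces;
}
```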

The test selects some vertices and moves them. During this test, the following buffers are updated on each frame:

  • vbo.pos_nor
  • vbo.lnor
  • vbo.edit_data
  • vbo.fdots_pos
  • vbo.fdots_nor
  • ibo.tris
  • ibo.fdots

master: rdata 8ms iter 45ms (frame 153ms)
this patch: rdata 7ms iter 38ms (frame 143ms)

With DEBUG_TIME uncommented, and moving the selected vertices until the timing was consistent, I got these results:

master: rdata 9ms iter 50ms (frame 163ms)
this patch: rdata 9ms iter 51ms (frame 187ms)

I will test with simpler geometry. So, is each buffer computed on a different thread?
EDIT: The results are very inaccurate with low-poly meshes; we need a more robust profiler for these cases.
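One generic way to get steadier numbers for small meshes, where a frame takes only a few milliseconds, is to repeat the measurement many times and report the median. The median is less sensitive to outliers than a single sample or the mean. This is a hypothetical sketch, not an existing Blender profiler.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b)
{
  const double x = *(const double *)a;
  const double y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Median of `n` timing samples. The samples are copied into `scratch`
 * (which must hold `n` values) so the caller's array is left untouched. */
static double median_ms(const double *samples, size_t n, double *scratch)
{
  assert(n > 0);
  memcpy(scratch, samples, n * sizeof(double));
  qsort(scratch, n, sizeof(double), cmp_double);
  return (n % 2) ? scratch[n / 2] : 0.5 * (scratch[n / 2 - 1] + scratch[n / 2]);
}
```

With samples {1.4, 9.0, 1.5, 1.3, 1.6}, the single 9.0 ms outlier does not affect the result (1.5 ms), whereas it would pull the mean up to 2.96 ms.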

Germano Cavalcante (mano-wii) marked an inline comment as done.
  • Deduplicate code