For the sake of clarity, the batch cache now uses per-loop attributes exclusively.
While this wastes some VRAM (in the few cases where per-vertex attributes
would be enough), it reduces complexity and the overall number of VBOs
to update in general situations.
This patch also makes VertexBuffer filling multithreaded, which makes the
update of dense meshes a bit faster. The main bottleneck is the
IndexBuffer update, which cannot be multithreaded efficiently (it has to
increment a counter and/or do a final sorting pass).
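To illustrate why the counter makes index filling hard to parallelize, here is a minimal serial sketch. The function name, the hidden-triangle flags and the "triangle t uses loops 3t, 3t+1, 3t+2" layout are assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append the loop indices of every visible triangle to
 * `r_indices` and return how many indices were written. */
static size_t fill_visible_tri_indices(const bool *tri_hidden,
                                       size_t tri_len,
                                       uint32_t *r_indices)
{
  assert(tri_hidden != NULL && r_indices != NULL);
  size_t index_len = 0; /* Shared write cursor: this is the serial dependency. */
  for (size_t t = 0; t < tri_len; t++) {
    if (tri_hidden[t]) {
      continue; /* A skipped triangle shifts every later write position. */
    }
    /* Assumption: a triangle-only mesh where triangle `t` owns loops 3t..3t+2. */
    r_indices[index_len++] = (uint32_t)(t * 3 + 0);
    r_indices[index_len++] = (uint32_t)(t * 3 + 1);
    r_indices[index_len++] = (uint32_t)(t * 3 + 2);
  }
  return index_len;
}
```

Because the write position of triangle `t` depends on how many earlier triangles were skipped, worker threads would have to share the cursor (for example atomically, which loses ordering) or write into separate chunks that are merged or sorted afterwards.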
We introduce the concept of "extract" functions/steps.
Each extract function is executed in its own thread and, if possible,
uses multiple threads to loop over all elements.
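One way to picture an extract step is as a per-element callback plus a flag saying whether the element range may be split between workers. The sketch below uses hypothetical names (`ExtractStep`, `ExtractChunk`, `extract_step_run`) that are not from the patch. To stay dependency-free it runs the chunks one after another; a real implementation would hand each chunk to a worker thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types; the patch's real structures may look different. */
typedef void (*ExtractIterFn)(void *buffer, size_t elem_index);

typedef struct ExtractStep {
  ExtractIterFn iter; /* Called once per element. */
  bool use_threading; /* True when `iter` only writes to its own element slot. */
} ExtractStep;

typedef struct ExtractChunk {
  size_t start, end; /* Half-open element range [start, end). */
} ExtractChunk;

/* Split `elem_len` elements into `chunk_len` contiguous ranges. */
static ExtractChunk extract_chunk_get(size_t elem_len, size_t chunk_len, size_t chunk_index)
{
  assert(chunk_len > 0 && chunk_index < chunk_len);
  size_t chunk_size = (elem_len + chunk_len - 1) / chunk_len; /* Round up. */
  size_t start = chunk_index * chunk_size;
  size_t end = start + chunk_size;
  ExtractChunk chunk;
  chunk.start = start < elem_len ? start : elem_len;
  chunk.end = end < elem_len ? end : elem_len;
  return chunk;
}

/* Run one extract step. Chunks are independent, so each one could be given
 * to its own worker; here they are simply executed in sequence. */
static void extract_step_run(const ExtractStep *step, void *buffer, size_t elem_len, size_t worker_len)
{
  assert(step != NULL && step->iter != NULL && worker_len > 0);
  size_t chunk_len = step->use_threading ? worker_len : 1;
  for (size_t c = 0; c < chunk_len; c++) {
    ExtractChunk chunk = extract_chunk_get(elem_len, chunk_len, c);
    for (size_t i = chunk.start; i < chunk.end; i++) {
      step->iter(buffer, i);
    }
  }
}

/* Example callback: each element writes only to its own slot. */
static void example_iter_half_index(void *buffer, size_t elem_index)
{
  ((float *)buffer)[elem_index] = (float)elem_index * 0.5f;
}
```

A callback is safe to tag as threadable in this scheme only if it never touches shared state such as a running counter, which is why vertex buffer filling fits and index buffer filling mostly does not.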
My results (a heavily subdivided sphere + lvl4 subsurf):
|             | Fps | Frame | Iter | Rdata |
| 2.80 master | 4.8 | 206ms | 82ms | 11ms  |
| Base        | 4.5 | 240ms | 98ms | 22ms  |
| Opti        | 6.4 | 144ms | 22ms | 20ms  |
The 9ms speed loss (in Rdata) comes from requiring loop normals to be precomputed
before iteration. We can still recover this (this is a TODO). Done.
To reviewers: The multithreaded part starts in mesh_buffer_cache_create_requested.
GPU
- Add GPUIndexBuf subrange: This allows rendering only a subset of an index buffer. This is nice because we can render each material's surfaces individually, and the whole mesh, with the same index buffer.
- Add vertex format deinterleaving: This makes it possible for each attribute to use a contiguous portion of the vertex buffer, making attribute filling much easier and faster, as this is how attributes are stored in Blender Custom Data layers.
- Batch: Reverse order of VBO binding: This ensures that vbo[0] always has precedence over the other VBOs. This is important for overriding attributes by switching the VBO binding order.
- Make small float normal compression functions inlined: This removes some overhead in VBO creation.
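As a rough illustration of the deinterleaving item above, the two functions below compute the byte offset of one attribute of one vertex in each layout. The function names are hypothetical, and the sketch assumes tightly packed attributes with no alignment padding, which a real vertex format may not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved layout: all attributes of vertex 0, then all of vertex 1, ...
 * Offset = vert * stride + start of the attribute inside one vertex. */
static size_t attr_offset_interleaved(const size_t *attr_sizes,
                                      size_t attr_len,
                                      size_t attr,
                                      size_t vert)
{
  assert(attr_sizes != NULL && attr < attr_len);
  size_t stride = 0, attr_start = 0;
  for (size_t a = 0; a < attr_len; a++) {
    if (a == attr) {
      attr_start = stride;
    }
    stride += attr_sizes[a];
  }
  return vert * stride + attr_start;
}

/* Deinterleaved layout: attribute 0 for every vertex, then attribute 1, ...
 * Each attribute occupies one contiguous block of vert_len * size bytes. */
static size_t attr_offset_deinterleaved(const size_t *attr_sizes,
                                        size_t attr_len,
                                        size_t vert_len,
                                        size_t attr,
                                        size_t vert)
{
  assert(attr_sizes != NULL && attr < attr_len && vert < vert_len);
  size_t block_start = 0;
  for (size_t a = 0; a < attr; a++) {
    block_start += attr_sizes[a] * vert_len;
  }
  return block_start + vert * attr_sizes[attr];
}
```

In the deinterleaved case consecutive vertices of the same attribute are adjacent in memory, so a source array with the same element layout can be copied into its block in one contiguous pass instead of one strided write per vertex.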
Mesh Batch Cache: Refactor
- Restructure the buffer cache: one cache for the final mesh and one for the edit mesh cage.
- Add debug timer.
- Use the Extract naming convention for extract functions that fill VBOs/IBOs.
- Move extract functions into a separate file (for clarity).
- Separate the loose element looping functions to avoid iteration complexity (unfortunately, this makes the code more verbose).
- Some iter functions are threadable and tagged as such.
- Add multithreaded iteration for extract functions that support it.
