BMesh has served Blender well for many years, but it as a few problems: the memory overhead is high, and the performance isn't always great compared to pure Mesh arrays.
= The Problem =
BMesh is a BREP (a variant of the classic half-edge data structure). The problem with BREPs is they do not always work well with CPU caches, especially on 64-bit platforms where all of the helper pointers can overwhelm the cache. When Geoffry Bantle designed BMesh he tried to improve cache //coherency// by writing the mempool library, but this did nothing to solve the problem of huge numbers of 8-byte helper pointers overwhelming the cache altogether (at the time Blender was still 32-bits).
I recently had a bunch of users on Blenderartists run a series of benchmarks for a simple vertex smooth operation. To summarize the results (which will be discussed in more depth later) we can get a pretty decent performance boost (~42%) if we replace all helper pointers in BMesh with 4-byte integer indices.
This makes sense. BMesh is memory bound (profiling consistently shows this), so replacing 8-byte pointers with 4-byte ints will reduce the load on the CPU cache and DRAM by quite a bit.
= Short Intro To BMesh =
BMesh has the following structure:
{F10396519}
1. Vertices, which store one pointer to an edge.
2. Edges, which store six pointers to vertices and one to loops (two for the edges plus the four used for each vertex's disk cycle, see blue lines).
3. Loops, which have 1 pointers each pointing at a vertex, edge, and a face, along with four more pointers to other loops. Like edges loops participate in two linked lists: the face they belong to, and also the radial cycle (see red line) of loops surrounding an edge.
= The Benchmark =
To investigate where we should take BMesh in the future I wrote a series of benchmarks and
[[ https://blenderartists.org/t/sculpt-dyntopo-test-build-updated/1323378/22 | had users on BlenderArtists run them ]]. There are four benchmarks in total, each of which run a simple vertex smoothing operation:
1. BMesh with random element order.
2. BMesh with vertex-cluster-sorted element order. Note that #3 and #4 uses the same order.
3. A version of BMesh with integer indices.
4. A so-called 'data-oriented' structure.
The data-oriented structure is inspired by recent developments in the games industry, and looks like this:
```
typedef struct MeshTest {
//vertex arrays
float (*v_co)[3];
float (*v_no)[3];
int *v_e;
int *v_index;
int *v_flag;
//edge arrays
int *e_v1;
int *e_v2;
int *e_v1_next;
int *e_v2_next;
int *e_l;
int *e_flag;
int *e_index;
//loop arrays
int *l_v;
int *l_e;
int *l_f;
int *l_next;
int *l_prev;
int *l_radial_next;
int *l_radial_prev;
//face arrays
int *f_l;
int *f_index;
int *f_flag;
int totvert, totedge, totloop, totface;
MemArena *arena;
} MeshTest;
```
== Results ==
As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices.
Here are the raw results:
| random | cluster-ordered bm | % improvement| struct of arrays | % improvement| bm with indices | % improvement| RAM|
|-----------|--------|----------|-------|----------|---------|-------|---------|
|1.22| 1.04| 14.42| 0.73| 67%| 0.94| 29%| 0|
|1.49| 1.46| 2.35| 1.10| 36%| 1.17| 27%| 0|
|1.29| 1.13| 14| 0.75| 71.54%| 0.89| 45.08% | 0|
|1.58| 1.40| 12.3| 1.09| 44.7%| 1.11| 42.42%| 16|
|1.53| 1.36| 12.77| 1.08| 41.6%| 1.07| 42.91%| 0|
|1.56| 1.39| 12.47| 1.09| 42.65%| 1.10| 42.15%| 16|
|1.22| 1.06| 15.05| 0.75| 63.85%| 0.82| 49.67%| 32|
|2.32| 1.99| 16.54| 1.87| 23.80%| 1.50| 54.89%| 16|
| 0.86| 0.76| 12.14| 0.28| 202.44| 0.57| 51.13| 32|
| 0.87| 0.76| 14.0| 0.54| 61.47| 0.72| 20.16| 32|
= Long term plans =
Given these results, I think we should move BMesh to use integer offsets instead of pointers. This would be a far less disruptive change then the data-oriented approach. Note that I *don't* think we should rebuild the functionality of BMesh on top of Mesh as custom attribute layers, as that would be too disruptive to the existing codebase.
== Needed Modifications ==
The structural modifications would be fairly simple (the actual code work might be another matter); basically, instead of using mempools for storing elements we would use simple arrays with freed element lists.
A hypothetical element array structure might look like this:
```
typedef struct ElementArray {
void *array;
int element_size;
int length;
int total_size;
BLI_bitmap usedmap;
SomeKindOfList freelist;
} ElementArray;
```
Of course this would make accessing the structure more complicated:
```
BMLoop *l = bm->loops[e->l_i];
do {
} while ((l = bm->loops[l->radial_next_i]) != bm->loops[e->l_i]);
```
. . .but I suppose it's not too bad.
= C++ =
It might be a good idea to use C++ for this. We could write ElementList and the topological
iterators in C++ and wrap them in C APIs. The core element structures will have to remain
in C though.