Page MenuHome

Cleanup: Remove mesh vertex "temp tag" flag
ClosedPublic

Authored by Hans Goudey (HooglyBoogly) on Jan 20 2022, 4:29 AM.

Details

Summary

As part of the project of converting MVert into float3
(more details in T93602), this is an easy step, since it
is only locally used runtime data. In the six places it was
used, the flag was replaced by a local bitmap.

By itself this change has no benefits other than making some
code slightly simpler. It only really matters when the other
flags are removed and it can be removed from MVert
along with the bevel weight.

Diff Detail

Event Timeline

Hans Goudey (HooglyBoogly) requested review of this revision.Jan 20 2022, 4:29 AM
Hans Goudey (HooglyBoogly) created this revision.
Hans Goudey (HooglyBoogly) edited the summary of this revision. (Show Details)

@Jacques Lucke (JacquesLucke), I'm adding you as a reviewer since the code changes are straightforward and align with an agreed-upon goal, I just want to make sure someone else has seen it.

Jacques Lucke (JacquesLucke) requested changes to this revision.Jan 20 2022, 12:36 PM
Jacques Lucke (JacquesLucke) added inline comments.
source/blender/blenkernel/intern/mesh_validate.c
607

This looks like it could introduce a lot of new overhead. Note that you create a new bitmap for every face here. Better create it once outside of the outer loop.

This revision now requires changes to proceed.Jan 20 2022, 12:36 PM

Just one concern...

While it's good to review smaller parts of replacing each of MVert's flags, it might be good to wait to land all patches at once after a benchmarking.

I'm still not sure if moving the flag member to make a struct of Arrays is advantageous.

This would require reading from different parts of memory which can be even worse in multithread operations.

A benchmark can remedy this concern.

I'm still not sure if moving the flag member to make a struct of Arrays is advantageous.

It should be very advantageous, for a few reasons:

  • In many situations the flag is unnecessary. When only vertex positions are necessary (like a mesh bounding box), access becomes significantly faster because less memory is used (one byte less per vertex can add up when there are millions of vertices).
  • Makes code clearer by separating concerns about where things are stored (i.e. no need to clear a flag before it's used, each type of data is conceptually separate, etc).
  • It's possible to use SIMD operations directly on vertex positions in many cases, since there are no padding bytes to work around.
  • The massive code simplification of Span<float3> instead of Span<MVert>. This make the code in so many areas more generic and simpler (possible to use generic utility functions).
  • Everywhere the following functions are used will be much simpler and faster: BKE_mesh_vert_coords_alloc, BKE_mesh_vert_coords_get, BKE_mesh_vert_coords_apply_with_mat4, BKE_mesh_vert_coords_apply.
  • Interfacing with exporters, libraries, the GPU, etc. that expect a contiguous array of float vectors is much simpler.
  • Often you only need to access the selection, or a tag, etc. All of these operations will become much faster since they don't need to load all of the vertex locations as well.
    • For example, when only the bitflag added in this patch needs to be loaded (not the entire MVert), we're looking at 16x less memory that needs to be loaded (16 bytes -> 1 bit per vertex).
  • Other areas in Blender (point clouds, hair, attributes, etc.) are moving to the SoA format, so it's consistent with elsewhere. In other words, it's the established best practice.

This would require reading from different parts of memory which can be even worse in multithread operations.

As long as memory reading is predictable, local, and there isn't more memory to read in total, it shouldn't matter whether the data comes from the same array or not.
I don't think there's anything in particular about multi-threading involved here either.


In conclusion, personally I find these arguments overwhelmingly in favor of the SoA data layout. I'm happy to do some performance testing with this patch if you'd like, but I'd really prefer not to have the patches around getting stale for weeks, since this isn't a project I'll have the time to finish quickly.
Except for the case that Jacques pointed out (which I'll fix), I expect the change here to have negligible affects on performance, probably nothing observable.

  • Only allocate bitmap once for mesh validation
This revision is now accepted and ready to land.Jan 23 2022, 3:26 PM

I did some performance tests, choosing the function that looks the simplest to me, BKE_crazyspace_set_quats_mesh.
On a 1.5 million vertex mesh, the function took 0.38696 seconds before, on average. After, it took 0.38044 seconds.
That's an improvement of 1.7%. So a negligible change, but more likely to be an improvement.
My guess is that less memory needs to be loaded to check the flags, since they are all contiguous. Which means that loading unnecessary mesh vertices can be skipped.
That seems like a fairly typical situation to me. This doesn't even contain the improvements from when the flag can eventually be removed from MVert.

@Germano Cavalcante (mano-wii) What do you think, given this information?

This improvement is curious, I wouldn't expect it.
It makes sense that the improvement could be due to some vertices no longer needing to be accessed. (And bitmap memory is more compact).

The difference is small but it is good news.

I wonder if the same improvement can be seen in multi-threaded operations and with flags like ME_HIDE that are read but rarely set.

A frequent access to ME_HIDE is in the extract_pos_nor_iter_poly_mesh function. It is called for drawing whenever a mesh is changed.

Yes, I would expect functions like that to become faster.
Bitflags do have the downside that they're hard to write to in a multi-threaded way, since the smallest unit for atomics is a byte rather than a bit. But they may be ideal for flags that are read but rarely set, like you say.

It's possible that we will want more than one way to store boolean attributes, i.e. Array<bool>, bitflags, storing indices directly, maybe even heap data structures. That will have to be decided soon I suppose.
I think those types should be interchangeable at runtime, so I don't think we'll be locked in to whatever we choose first.

I'll go ahead and land this patch then.

It seems like fine land the patch. ME_VERT_TMP_TAG is used in few places and the code is even cleaner.

It might be worth implementing support for sparse CustomData arrays, it would help mitigate the memory overhead from having 1 byte per bool.

The downside is that reads are more complex:

T read_elem(T array, int i) {
  Page<T> page = array[i >> array.pageShift];

  if (page.array) { //page has per-vertex data
    return page.array[i & array.pageMask];
  } else { //page has same value for all verts
    return page.value; 
  }
}