Page MenuHome

PBVH Draw Cleanup
ClosedPublic

Authored by Joseph Eagar (joeedh) on Jul 12 2022, 1:45 AM.

Details

Summary

New Design

This patch is a rewrite of PBVH draw mode that puts attributes into individual VBOs, eliminating (hopefully) a variety of bugs (T99282, T98889). The old system tried to create a single batch that could feed every open viewport. The only safe way to do this safely is to upload every last color and UV attribute to the viewport whether they are needed for rendering or not; calculating and merging the attribute layout of every viewport prior to rendering would require significant refactoring of the draw manager.

This new system creates one VBO per attribute. Each attribute layout is given its own GPU batch which is cached inside the owning PBVH node.

Notes:

  • This is a full C++ rewrite. The old code is still there; ripping it out can happen later.
  • PBVH nodes now have a collection of batches, PBVHBatches, that keeps track of all the batches inside the node.
  • Batches are built exclusively from a list of attributes.
  • This includes coordinates, normals, face sets and masks; GPU_Buffers.h defines fake custom data types (e.g. CD_PBVH_CO_TYPE) that are mapped to above CD_NUMTYPES. Using special names is also an option.
  • Each attribute has its own VBO.
  • Nodes now have more batches than before. Overlays, workbench and EEVEE all have different attribute layouts, each of which will get its own batch.

Old Design

OUTDATED
This patch cleans up PBVH draw to build the PBVH draw cache in two steps, eliminating a variety of bugs. The various draw engines submit requests for GPU attributes to PBVH, which then creates the GPU batches.

There are a number of evil things done to make this work given limitations in the draw manager design:

  1. The batch creation step still happens within a single viewport draw call. While the mesh cache is shared between viewports it's not built prior to viewport draw, which means in some cases the GPU mesh may be built multiple times. For example, say there are two viewports, A and B. A builds the mesh cache it needs, then draws. B checks the mesh cache and sees it needs more attributes, so it rebuilds the cache and then draws.
  2. Because of #1 we cannot avoid invalidating batches within the batch building process, corrupting memory. This is avoided by clearing and reinitializing the same GPUBatch structures with new VBOs. This works as the draw cache only stores pointers to batches and not the VBOs themselves.
  3. The code paths where engines request GPU attributes are still rather messy and confusing.

Other notes:

  1. PBVH_draw_cb has been rewritten into two functions, BKE_pbvh_get_batches and BKE_pbvh_update_batches.
  2. Code still needs to be threaded.
  3. Fast navigate is not yet implemented.
  4. Delay view updates is implemented but not tested.
  5. There is a bug where objects briefly disappear and then reappear when the attribute layout changes.

This patch was originally D15368.

Diff Detail

Repository
rB Blender
Branch
temp-pbvh-draw-manager (branched from master)
Build Status
Buildable 22947
Build 22947: arc lint + arc unit

Event Timeline

Joseph Eagar (joeedh) requested review of this revision.Jul 12 2022, 1:45 AM
Joseph Eagar (joeedh) created this revision.
Joseph Eagar (joeedh) edited the summary of this revision. (Show Details)Jul 12 2022, 1:49 AM

As I understand it, the draw manager regular mesh drawing handles multiple viewports by allocating additional batches for attributes as needed. Is there a reason this needs to rebuild batches?

I think it would be good to use regular mesh logic and data structures as much as possible, with the aim of eventually moving this code into the draw module and integrating as just another draw_cache_impl_ file.

As I understand it, the draw manager regular mesh drawing handles multiple viewports by allocating additional batches for attributes as needed. Is there a reason this needs to rebuild batches?

I think it would be good to use regular mesh logic and data structures as much as possible, with the aim of eventually moving this code into the draw module and integrating as just another draw_cache_impl_ file.

That would take too much VRAM bandwidth. Remember that while most sculpt tools incrementally update the GPU mesh, some do not (e.g. elastic deform, the filters).

That would take too much VRAM bandwidth. Remember that while most sculpt tools incrementally update the GPU mesh, some do not (e.g. elastic deform, the filters).

Is the idea that interleaved storage in one buffer is faster than multiple smaller buffers? It's not obvious to me that this saves much VRAM bandwith.

But regardless, since the data already seems to be interleaved it wouldn't make sense to do the work of changing that and verifying if it helps as part of this refactor, since it's a much bigger change. So no need to revisit that here.

The overall idea of splitting the updating and drawing into two separate parts make sense.

It's difficult to review this patch since it seems to be work in progress still. Changes to remove the max vcol limit from operators/modifiers also should not be part of this.

So I'm not sure if you are asking for code review here, design feedback, .. ? Or if this is just a WIP that needs no feedback right now.

That would take too much VRAM bandwidth. Remember that while most sculpt tools incrementally update the GPU mesh, some do not (e.g. elastic deform, the filters).

Is the idea that interleaved storage in one buffer is faster than multiple smaller buffers? It's not obvious to me that this saves much VRAM bandwith.

But regardless, since the data already seems to be interleaved it wouldn't make sense to do the work of changing that and verifying if it helps as part of this refactor, since it's a much bigger change. So no need to revisit that here.

Sorry, I wasn't clear. I just meant that PBVH shares GPU batches across all open viewports. Having separate VBOs with different attribute layouts for each open viewport is not feasible; it would use an enormous amount of VRAM bandwidth to send all that duplicate data to the GPU. The better solution is to merge the attribute layouts from all viewports into a single shared set of VBOs.

I did not review it yet, but giving my two cents on what has been discussed.

Is the idea that interleaved storage in one buffer is faster than multiple smaller buffers? It's not obvious to me that this saves much VRAM bandwith.

Do not use interlieved storage in new code. It can be useful for packing small attributes together but has no real benefit for attributes that are at least 4bytes long. Most not so old GPUs will have enough caches for each individual input streams. Meaning that separate VBOs don't have any performance penalty on the GPU. Also this makes the CPU code much simpler and filling the VBOs should be just a memcpy. I am not aware if this is the current design of the PBVH drawing.

Each attribute should be its own VBO as we now have GPU_BATCH_VBO_MAX_LEN = 16. With this, you can have one GPUBatch per viewport / Attribute set. And thoses should not be a big overhead to keep cached.

I would like to make a precision: When I wrote in T97665 that we need 1 VBO for the whole PBVH I was talking about positions and normals. Attributes can still be in their own VBOs. I was also talking about not fragmenting the mesh into small VBOs.

I did not review it yet, but giving my two cents on what has been discussed.

Is the idea that interleaved storage in one buffer is faster than multiple smaller buffers? It's not obvious to me that this saves much VRAM bandwith.

Do not use interlieved storage in new code. It can be useful for packing small attributes together but has no real benefit for attributes that are at least 4bytes long. Most not so old GPUs will have enough caches for each individual input streams. Meaning that separate VBOs don't have any performance penalty on the GPU. Also this makes the CPU code much simpler and filling the VBOs should be just a memcpy. I am not aware if this is the current design of the PBVH drawing.

Each attribute should be its own VBO as we now have GPU_BATCH_VBO_MAX_LEN = 16. With this, you can have one GPUBatch per viewport / Attribute set. And thoses should not be a big overhead to keep cached.

I would like to make a precision: When I wrote in T97665 that we need 1 VBO for the whole PBVH I was talking about positions and normals. Attributes can still be in their own VBOs. I was also talking about not fragmenting the mesh into small VBOs.

I hadn't realized you could attach more than one VBO to a batch. I'm going to play around with this.

Okay, we're going to have to completely rewrite this. I don't think we can reuse any code from the draw cache system.

Completely new version. The code now takes Clement's approach of putting attributes in their own VBOs.

Notes:

  • This is a full C++ rewrite. The old code is still there; ripping it out can happen later.
  • PBVH nodes now have a collection of batches, PBVHBatches, that keeps track of all the batches inside the node.
  • Batches are built exclusively from a list of attributes.
  • This includes coordinates, normals, face sets and masks; GPU_Buffers.h defines fake custom data types (e.g. CD_PBVH_CO_TYPE), are mapped to above CD_NUMTYPES. Using special names is also an option.
  • Each attribute has its own VBO.
  • Nodes now have more batches than before. Overlays, workbench and EEVEE all have different attribute layouts, each of which will get its own batch.
Joseph Eagar (joeedh) edited the summary of this revision. (Show Details)Aug 7 2022, 1:54 PM
Joseph Eagar (joeedh) edited the summary of this revision. (Show Details)Aug 7 2022, 2:12 PM
Joseph Eagar (joeedh) edited the summary of this revision. (Show Details)
source/blender/gpu/intern/gpu_buffers.c
445

Do not use GPU_COMP_U8, 3. use 4 components. Many GPU will require padding here and this happens under the hood here.

I do wonder if it wouldn't be better to just pass the set index to the shader and let the shader derive the color instead? Could even compute the color once per set on CPU and pass that using a 1D texture. This should speed the updates up.

source/blender/gpu/intern/gpu_buffers.c
445

I noticed the padding. It kept me from doing straight copies. Anyway, yes we can do this on the GPU, that's a much better idea.

Julien Kaspar (JulienKaspar) edited the summary of this revision. (Show Details)
  • Support PBVH_GRIDS and PBVH_BMESH
  • Support face hiding.
  • Finished dyntopo implementation
  • Got wireframe working

@Clément Foucault (fclem) @Kévin Dietrich (kevindietrich) This patch is ready for another round of review. Since various high priority bugs depend on this patch it would be ideal to land this soon.

Clément Foucault (fclem) requested changes to this revision.Sep 26 2022, 12:31 AM

The code does look much cleaner. I only have style remarks. It's green-light from me once those are addressed.

source/blender/blenkernel/intern/pbvh.c
542

Rename to pbvh_draw_args_init or pbvh_draw_args_setup.

source/blender/draw/DRW_engine.h
287 ↗(On Diff #55763)

Every changes in this file should be moved to draw_pbvh.h. The BKE PBVH should include the said header even if it is a private header. I guess this is an acceptable exception.

source/blender/draw/intern/draw_cache_impl_mesh.cc
1180

Is the extern "C" needed for a reason?

source/blender/draw/intern/draw_manager_data.c
1113

Left over commented code

1300–1304

Use count_bits_i instead, unless you want to keep it for symmetry reason with the loops below.

1317

Left over printf.

source/blender/draw/intern/draw_pbvh.cc
7 ↗(On Diff #55763)

Might want to change this.

63–68 ↗(On Diff #55763)

We might want to move these to BLI_math_vec_types.hh. I don't know if this would increase overall build time.

321–324 ↗(On Diff #55763)

A disabled left over block should always have a comment explaining why it is disabled.

If for no reason, please remove the code.

358 ↗(On Diff #55763)

The callback function being called foreach causes issue with clang_format. This is because it is considered a loop keyword for cycles. Rename to foreach_grid to fix the issue add more semantic. The problem is also present in other functions of this file.

526 ↗(On Diff #55763)

Use switch statement.

947 ↗(On Diff #55763)

Why is this commented? Is it TODO?

This revision now requires changes to proceed.Sep 26 2022, 12:31 AM
Joseph Eagar (joeedh) marked 10 inline comments as done.
  • Make requested changes
  • Also finished color attribute code, which was missing support for corners and byte colors.
Joseph Eagar (joeedh) marked an inline comment as done.Sep 27 2022, 9:30 PM
Joseph Eagar (joeedh) added inline comments.
source/blender/draw/DRW_engine.h
287 ↗(On Diff #55763)

How about adding a new DRW_pbvh.h header?

source/blender/draw/intern/draw_pbvh.cc
321–324 ↗(On Diff #55763)

That was a debug visualization thing, forgot to remove it.

947 ↗(On Diff #55763)

I decided not to do a separate set of fast buffers.

Clément Foucault (fclem) added inline comments.
source/blender/draw/DRW_engine.h
287 ↗(On Diff #55763)

Could be confusing. But it is better than nothing I guess.

This revision is now accepted and ready to land.Sep 28 2022, 8:23 AM
Joseph Eagar (joeedh) marked an inline comment as done.

Final patch

This revision was automatically updated to reflect the committed changes.