Page MenuHome

GPU Immediate Refactor
AbandonedPublic

Authored by Germano Cavalcante (mano-wii) on May 3 2020, 1:11 AM.

Details

Summary

The original code for immediate mode has always sought efficiency in
memory (for using glMapBufferRange) and performance for creating its
own memory management system, avoiding as much as possible calling
glBufferData (which respecifies the buffer).

But this ideal started to change when glBufferData started to be
called more often.

The commit rB5d9d24685108 caused glBufferData to be called almost
every time immBegin is called.

This makes the memory management entirely dependent on the GPU driver.

This patch follows some of the ideas covered in
https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf
with respect to method Unsynchronized Buffers.

The idea is to create 3 buffers and alternate them in each frame.
So the buffer_offset can be reset every 3 frames.

The maximum amount of memory read in the frame is used for each buffer.
So we kind of allocate the minimum required memory used by Blender for
drawing with immediate mode.

Diff Detail

Repository
rB Blender
Branch
arcpatch-D7607 (branched from master)
Build Status
Buildable 7864
Build 7864: arc lint + arc unit

Event Timeline

Germano Cavalcante (mano-wii) requested review of this revision.May 3 2020, 1:12 AM

GHOST_SwapWindowBuffers should already force the buffers to synchronize, so I wonder if using a buffer would be enough.

  • Cleanup: rename variables, remove members and improve comments
  • Cleanup and decrease CHUNK
  • Delete Fence
  • Reposition fence after each immediate drawing
  • Fix immDestroy
  • Fix freeze on AMD Radeon HD 7570M
  • Cleanup: Improve comments
  • Implement Timer
  • Cleanup

Performance dropped from 32 fps to 29 fps on

.

This is too risky a change for 2.83. Since you confirmed in T76113#922371 that the bug was due to rB5d9d24685108, I think we should just make that change Intel only for now.

This patch may still be useful for 2.90, but I haven't reviewed it. For this additional code complexity and potential for bugs to be worth it, this should show a performance improvement besides just fixing a regression which we can do simpler. Have you measure performance differences?

The commit rB5d9d24685108 caused glBufferData to be called almost every time immBegin is called.

That sounds wrong to me. My suggestion for that fix was only to use a different buffer for the strict and non-strict case. But it seems they have a different default_size which I guess causes it to recreate the buffer often.

@Jeroen Bakker (jbakker), any reason why this was done? Why wasn't the default buffer size for both set to DEFAULT_INTERNAL_BUFFER_SIZE?

@Brecht Van Lommel (brecht) when the default size of the strict buffer was set to any other value the performance dropped to an unusable state. I can check the code-flow of that part. If everything fails we could do a platform specific value as a last resort.

I am not able to reproduce the slowdown anymore. I did got a new firmware update this morning when I booted the machine.

@Germano Cavalcante (mano-wii) can you check if

1diff --git a/source/blender/gpu/intern/gpu_immediate.c b/source/blender/gpu/intern/gpu_immediate.c
2index 2d093dacdce..7bcb0a7a552 100644
3--- a/source/blender/gpu/intern/gpu_immediate.c
4+++ b/source/blender/gpu/intern/gpu_immediate.c
5@@ -48,7 +48,6 @@ typedef struct ImmediateDrawBuffer {
6 GLubyte *buffer_data;
7 uint buffer_offset;
8 uint buffer_size;
9- uint default_size;
10 } ImmediateDrawBuffer;
11
12 typedef struct {
13@@ -96,12 +95,10 @@ void immInit(void)
14
15 imm.draw_buffer.vbo_id = GPU_buf_alloc();
16 imm.draw_buffer.buffer_size = DEFAULT_INTERNAL_BUFFER_SIZE;
17- imm.draw_buffer.default_size = DEFAULT_INTERNAL_BUFFER_SIZE;
18 glBindBuffer(GL_ARRAY_BUFFER, imm.draw_buffer.vbo_id);
19 glBufferData(GL_ARRAY_BUFFER, imm.draw_buffer.buffer_size, NULL, GL_DYNAMIC_DRAW);
20 imm.draw_buffer_strict.vbo_id = GPU_buf_alloc();
21- imm.draw_buffer_strict.buffer_size = 0;
22- imm.draw_buffer_strict.default_size = 0;
23+ imm.draw_buffer_strict.buffer_size = DEFAULT_INTERNAL_BUFFER_SIZE;
24 glBindBuffer(GL_ARRAY_BUFFER, imm.draw_buffer_strict.vbo_id);
25 glBufferData(GL_ARRAY_BUFFER, imm.draw_buffer_strict.buffer_size, NULL, GL_DYNAMIC_DRAW);
26
27@@ -251,10 +248,10 @@ void immBegin(GPUPrimType prim_type, uint vertex_len)
28 active_buffer->buffer_size = bytes_needed;
29 recreate_buffer = true;
30 }
31- else if (bytes_needed < active_buffer->default_size &&
32- active_buffer->buffer_size > active_buffer->default_size) {
33+ else if (bytes_needed < DEFAULT_INTERNAL_BUFFER_SIZE &&
34+ active_buffer->buffer_size > DEFAULT_INTERNAL_BUFFER_SIZE) {
35 /* shrink the internal buffer */
36- active_buffer->buffer_size = active_buffer->default_size;
37+ active_buffer->buffer_size = DEFAULT_INTERNAL_BUFFER_SIZE;
38 recreate_buffer = true;
39 }
40
applied to blender-v2.83-release branch fixes the issue?

@Jeroen Bakker (jbakker), this patch solves the problem of freezing.
Remembering that I can't reproduce the crash (but it is probably related to the freeze).

Germano Cavalcante (mano-wii) planned changes to this revision.EditedMay 5 2020, 4:23 PM

Since rB37182c369aa6 has been commited, this patch is not really necessary.
we still need a profile to see if it really brings benefits in performance or memory.

The immediate mode has been rewritten to isolate the Opengl calls. This patch is too old to make sense now.

However the Immediate mode can still be improve on the backend (which is easier and cleaner with the new C++ API) but that would be better to do it from scratch and in another patch.