Page MenuHome

Cycles: optimize device updates
ClosedPublic

Authored by Kévin Dietrich (kevindietrich) on Nov 13 2020, 5:40 PM.
Tags
None
Tokens
"Burninate" token, awarded by gilberto_rodrigues."Love" token, awarded by rbx775."Love" token, awarded by Alaska."Like" token, awarded by Fracture128."Like" token, awarded by YAFU."Love" token, awarded by EAW."Love" token, awarded by LazyDodo.

Details

Summary

The main idea is to make use of socket modified flags to detect what
exactly has changed in the scene to avoid updating some data
unnecessarily.

One thing I tried to achieve is to make the tag_update methods on Nodes
(e.g. Background::tag_update) do some detection and tag appropriate
managers or nodes for updates only if necessary, so update tagging in
the system will depend entirely on the Nodes' modified sockets. For API
clients, this might reduce the need to manually tag managers for
updates, as one would simply update a Node and call its tag_update
method (which could become a virtual function on the Node base class to
make it clearer, and managers could be kept away from the public API).

To complement this, the managers tag_update methods now also take a flag
parameter to tell why the tagging is done, so we can eventually make
better decisions. Those flags replace the single need_update boolean
that managers have, and are private. We can access the state of those
flags via a public bool need_update() method. The set of flags is
global, i.e. there is only one enumeration for all the managers. I tried
having specific enumerations for each manager, but that led to
duplicated flags, was a bit tricky to maintain and weird to work with.

Similar flags are used in the Object and Geometry managers to decide if
the device arrays are to be simply updated (data copied in place) or
reallocated. This can be cleaned up a bit.

I will update my benchmarks and post some results a bit later as there
were some improvements since the time they were first shared.

Here is a summary of all the changes, if I did not forget anything:

OptiX:

  • multithreaded BVH packing, and added some logic to not repack the data

from unchanged geometries

  • skip rebuilding the BVH for unmodified objects
  • added refit support for the scene BVH, this requires storing the BVH

in the GeometryManager, instead of recreating it every time, so we can
keep the OptiX handles around

GeometryManager:

  • skip repacking data in the various device vectors for unmodified

geometries or attributes

  • do not rebuild the scene BVH if nothing changed
  • moved some shader change detection in device_update_preprocess, was in

device_update

Attribute:

  • added some logic to detect if an attribute has changed when updating

the attributes on a geometry, so we know which ones to repack, this
might be better implemented with attributes as sockets (I have a partial
implementation but needs a good design)

Shader:

  • split need_update_geometry into 3 different flags for more

granularity: need_update_uvs (for UV pass), need_update_attributes, and
need_update_displacement; the first two ones will tag the entire
geometry as modified as we do not have yet attributes as sockets

Integrator:

  • use modified flags to skip somewhat expensive sample_pattern_lut

rebuilds, and only tag the integrator dependent shaders if the
filter_glossy socket was modified (I think this is only socket affecting
those?)

  • some unnecessary calls to Integrator::tag_update were removed,

perhaps one more can be removed in the passes updates as
Film::tag_update will tag for an Integrator update if the AO passes
change?

Background:

  • only tag the Integrator as modified in Background::tag_update if the

AO related sockets have been modified

device_memory:

  • added a modified flag to detect if we need to copy the memory to the

device

  • added some logic to compute and report the amount of memory copied to

the device, this is a bit rough though and uses std::cerr, and might
be better to have as part of the UpdateStatistics, or some specialized
stats.

ObjectManager:

  • avoid recomputing transforms for objects that were not modified

The Light, Shader, and Object managers were a bit tricky to optimize
because of all the implicit dependencies, so nothing much happened there
(IES light updates can be granularized though). Any tips welcome, maybe
there is some low hanging fruits or simple refactors to do.

Diff Detail

Repository
rB Blender
Branch
cycles_optimization (branched from master)
Build Status
Buildable 11753
Build 11753: arc lint + arc unit

Event Timeline

Kévin Dietrich (kevindietrich) requested review of this revision.Nov 13 2020, 5:40 PM
Kévin Dietrich (kevindietrich) created this revision.
Ray Molenkamp (LazyDodo) added inline comments.
intern/cycles/blender/blender_sync.cpp
373–374

nitpick: dependent

Brecht Van Lommel (brecht) requested changes to this revision.Dec 2 2020, 8:41 PM

@Kévin Dietrich (kevindietrich), is this still more or less the latest state of the code?

I was hoping we could get rid of more or less al tag_update calls in the Blender sync code. But this patch is already almost too big to review, so let's leave that for another step.

intern/cycles/device/device_memory.h
329–330

I don't think these need to be public and defined in another place than other (private) members. These should be abstracted away as much as possible.

It also would be good to add a comment explaining what modified means here (I guess it's that the device memory needs to be updated from the CPU memory).

intern/cycles/device/device_optix.cpp
1206 ↗(On Diff #31050)

Why refit if geometry is not modified, that seems wrong?

intern/cycles/render/attribute.h
204

I think it would be more clear to rename both set_data_from and update to copy_from.

At least it's my understanding that it actually makes a copy. If it steals the pointers, it would be good to clarify that by using && and std::move.

intern/cycles/render/geometry.cpp
924–925

Can that if (dscene->attributes_float.size() && dscene->attributes_float.modified) test be moved into copy_to_device(); somehow?

Or if there are other places that don't set the modified flag but always want to copy, there could be a copy_to_device_if_modified() utility added.

1433

Is this something to be resolved before committing?

1491

Could we somehow not store device_update_flags permanently, but rather compute it in this function and then immediately set .modified flags on the device vectors?

I'm finding it hard to keep track of the various update flags and their meaning. This might simplify things.

1881–1882

Does this catch the case where an object was removed (and not all objects)?

And if so, why is the scene->geometry.size() == 0 check needed?

1960–1962

An AttributeSet.clear_modified() utility would be nice here. I imagine a modified on the AttributeSet itself rather than the individual attributes will become needed at some point anyway.

2009

If we can set the .modified flag on device memory earlier, we could have a bvh_nodes.free_if_modified() here.

Not sure how practical it is, have not checked in detail.

intern/cycles/render/hair.h
143–144

Can't we use the same flags that determine if we can refit, rather than comparing the number of keys and curves? Seems like it would be the same thing.

intern/cycles/render/object.cpp
917

It's not clear to me that when ending the render, this flag is set and device buffers are actually freed.

Is that done somewhere?

intern/cycles/render/scene.cpp
90

I guess this is temporary debugging code, and will be removed or commented out for the commit.

intern/cycles/render/scene.h
65 ↗(On Diff #31050)

Can this enum be defined per manager?

This global list of flags makes it hard to see which ones are valid for which managers.

This revision now requires changes to proceed.Dec 2 2020, 8:41 PM
Kévin Dietrich (kevindietrich) marked 2 inline comments as done.Dec 7 2020, 1:03 AM

@Kévin Dietrich (kevindietrich), is this still more or less the latest state of the code?

Yes it is, although I fixed a couple of bugs.

intern/cycles/render/geometry.cpp
1433

Indeed.

1881–1882

It does catch the case where a single object was removed.

The check for scene->geometry.size() == 0 is needed because if there isn't any more geometries the device_update_flags are not updated in device_update_preprocess so the other condition (device_update_flags & DEVICE_MESH_DATA_NEEDS_REALLOC) fails and the BVH is not updated.

I think we can simply delete the BVH in device_update_preprocess and check here if it is null.

1960–1962

Just noting, a modified on the individual attributes would still be necessary in case where we update a single attribute, we just want to recopy that one in the device_vector and not the others.

2009

The thing is we have two cases:

  • either we can keep the memory and just fill in data from modified geometries in place
  • or we have to reallocate the entire buffer because the number of geometries changed, or some tesselation would take more space, etc.

Having .modified take care of both cases would lead to unnecessarily freeing and recreating the entire buffer for the first case. So I think there could be .need_realloc and a free_if_need_realloc().

intern/cycles/render/hair.h
143–144

Those are mostly used to determine if we have to free the device_memory because the overall data size has changed. Using the same flags that determine if we can refit could do the trick, but we do not have the information to determine if the arrays are still the same size. This is a limitation of the socket flags: we know that an array has been modified, but not how.

intern/cycles/render/object.cpp
917

Setting the device_flags is done in ObjectManager::device_update, so no it is not done anywhere.

<Removed duplicate comment>

intern/cycles/render/geometry.cpp
1881–1882

That may be better, this seems like something to fix in device_update_preprocess.

2009

That sounds good.

intern/cycles/render/hair.h
143–144

The arrays not being the same size means we have to rebuild, so I think it's safe to assume that.

The only negligible difference would be that if the topology changes but the number of keys and curves stay the same, we would be unnecessarily reallocating. But that would be a coincidence and not worth checking for.

intern/cycles/render/object.cpp
917

Ok, I guess this should be fixed somehow then.

Kévin Dietrich (kevindietrich) marked 14 inline comments as done.
  • centralize device memory tagging
  • don't make device_update_flags persistent
  • remove OptiX optimizations as they useless now with the multi-bvh patch
  • make flags per manager
  • handle some remaining todos
  • remove debug code

I'm getting a crash when trying to render the 2.81 splashscreen. Happens for both the viewport and final render. Looks like ccl::PackedBVH::prim_tri_verts is uninitialized. This also occurs when using -t 1 on command line.

Stack trace:
blender.exe         :0x00007FF7AE5FF5D0  ccl::Mesh::pack_primitives D:\src\blender-git\blender\intern\cycles\render\mesh.cpp:838
blender.exe         :0x00007FF7AE561250  std::_Call_binder<std::_Unforced,0,1,2,3,4,void (__cdecl ccl::Geometry::*)(ccl::PackedBVH *,int,uns C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29333\include\functional:1440
blender.exe         :0x00007FF7AE560D00  std::_Binder<std::_Unforced,void (__cdecl ccl::Geometry::*)(ccl::PackedBVH *,int,unsigned int,bool) C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29333\include\functional:1496
blender.exe         :0x00007FF7B1C1D010  tbb::internal::function_task<std::function<void __cdecl(void)> >::execute D:\src\blender-git\lib\win64_vc15\tbb\include\tbb\task.h:1049
tbb.dll             :0x00007FFE18FF37A0  tbb::recursive_mutex::scoped_lock::internal_try_acquire
tbb.dll             :0x00007FFE18FF37A0  tbb::recursive_mutex::scoped_lock::internal_try_acquire
blender.exe         :0x00007FF7ACBBD710  tbb::internal::task_group_base::wait D:\src\blender-git\lib\win64_vc15\tbb\include\tbb\task_group.h:168
blender.exe         :0x00007FF7B1C1D710  ccl::TaskPool::wait_work D:\src\blender-git\blender\intern\cycles\util\util_task.cpp:46
blender.exe         :0x00007FF7AE56ED50  ccl::GeometryManager::device_update_bvh D:\src\blender-git\blender\intern\cycles\render\geometry.cpp:1294
blender.exe         :0x00007FF7AE56CE00  ccl::GeometryManager::device_update D:\src\blender-git\blender\intern\cycles\render\geometry.cpp:1940
blender.exe         :0x00007FF7AE513D50  ccl::Scene::device_update D:\src\blender-git\blender\intern\cycles\render\scene.cpp:271
blender.exe         :0x00007FF7AE516990  ccl::Scene::update D:\src\blender-git\blender\intern\cycles\render\scene.cpp:510
blender.exe         :0x00007FF7AE5397B0  ccl::Session::update_scene D:\src\blender-git\blender\intern\cycles\render\session.cpp:1028
blender.exe         :0x00007FF7AE538200  ccl::Session::run_cpu D:\src\blender-git\blender\intern\cycles\render\session.cpp:794
blender.exe         :0x00007FF7AE538090  ccl::Session::run D:\src\blender-git\blender\intern\cycles\render\session.cpp:879
blender.exe         :0x00007FF7AE532110  std::_Binder<std::_Unforced,void (__cdecl ccl::Session::*)(void),ccl::Session *>::operator()<> C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29333\include\functional:1496
blender.exe         :0x00007FF7B1C23800  ccl::thread::run D:\src\blender-git\blender\intern\cycles\util\util_thread.cpp:53
blender.exe         :0x00007FF7B1C23400  std::thread::_Invoke<std::tuple<void * (__cdecl*)(void *),ccl::thread *>,0,1> C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29333\include\thread:44
ucrtbase.dll        :0x00007FFE386B1430  configthreadlocale
KERNEL32.DLL        :0x00007FFE39807020  BaseThreadInitThunk
ntdll.dll           :0x00007FFE3AF9D0B0  RtlUserThreadStart

Design wise this looks good now. Mainly waiting for the failing tests and bugs to be fixed before I accept this.

intern/cycles/render/integrator.cpp
280

This should be freed when calling device free outside of device update?

  • free sample_lut_pattern when render is stopped
  • fix crash when scenes contain a mixture of mesh and hair geometries

All tests seem to be passing now.

Are there any other known issues remaining?

I know about two issues that I found working on this, but they don't seem to be related to this patch:

  1. Hair geometries are disappearing during animations when using BVH2/CUDA, I could not reproduce it when not using the Alembic procedural, so I guess the bug is there. Also I could not get a crash or assertion failure in debug builds. This seems tricky to solve.
  2. There is a buffer overflow in interactive renders when removing an object which is not the last object in Scene.geometry/Scene.object ; however this also happens in master. The issue is that Geometry.optix_prim_offset is not updated in the OptiX and Embree BVH data (through rtcSetGeometryUserData and primitiveIndexOffset respectively) because removing an object does not trigger the rebuilding/refit of the mesh BVHs, just the Scene BVH gets rebuild. For Embree this is easily solvable with no refit/rebuild, but for OptiX, I think we might have to tag for refit, as there does seem to be a way to update this primitiveIndexOffset. Geometry.optix_prim_offset is used by Embree and OptiX to report which primitive was intersected, so when removing an object, every object after it in the list has an outdated offset, most likely out of bounds.

I'm no longer getting crashes on my side but I do observe that things are not updating correctly.

Here I attempted to move the ball robot from the right of the scene to the left while within viewport render. The scene re-rendered incorrectly as it still thinks the object is on the right, but the overlay outline is clearly on the left now:

@Roger B (rboxman) what device do you use for rendering (CUDA or OptiX)? In this file I notice that whenever I move something, the CUDA is completely busted, but OptiX is fine, haven't tried Embree though.

Indeed this is straight CPU so Embree. I don't have access to a GPU capable device right this moment to try another configuration.

  • fix missing 'have_curves' flag update in the ObjectManager when curves are not modified, this requires to always execute device_update_object_transform, which is not too bad now that the heaviest computation (computing the surface area was removed)
  • fix stale bvh data when using BVH2, the arrays always have to be sent to the device as they are based on the leaf nodes and not the scene's geometry layout like OptiX and Embree
  • disable refitting top level BVH when embree is used, this was failing to take into account animated transformations, not sure if it is feasable although top level build here is not a bottleneck

Cool. I'm no longer seeing any obvious bad behavior. Though I'm not seeing any obvious perf changes either, good or bad, maybe it's just slightly faster. I've mostly been trying out viewport interactive performance using the 2.81 splash screen and moving objects - slightly - in the scene.

This revision is now accepted and ready to land.Jan 14 2021, 10:59 AM

Cool. I'm no longer seeing any obvious bad behavior. Though I'm not seeing any obvious perf changes either, good or bad, maybe it's just slightly faster. I've mostly been trying out viewport interactive performance using the 2.81 splash screen and moving objects - slightly - in the scene.

Most of the performance changes for a large scene would come from reduced data transfer to the GPU, since you are using CPU, you are missing those benefits. There are still some improvements to be made though, this patch is big already, and those are somewhat low hanging fruits, so they will be done later (just saying).

This revision was automatically updated to reflect the committed changes.