Page MenuHome

Adaptive Sampling for Cycles.
ClosedPublic

Authored by Stefan Werner (swerner) on Apr 15 2019, 1:49 PM.
Tokens
"Love" token, awarded by bartus."Love" token, awarded by Caetano."Burninate" token, awarded by TWCSwagger."Love" token, awarded by linolafett."Love" token, awarded by jc4d."Love" token, awarded by Firepal64."Burninate" token, awarded by achtrounf."Love" token, awarded by eversimo."Love" token, awarded by VirtusH."Love" token, awarded by dylanneill."100" token, awarded by Schamph."Love" token, awarded by Dangry."Burninate" token, awarded by MetinSeven."Love" token, awarded by lopoIsaac."Love" token, awarded by Dragon.Studio."Love" token, awarded by Shimoon."Love" token, awarded by intracube."Love" token, awarded by Jk."Like" token, awarded by Maged_afra."Love" token, awarded by PiloeGAO."Love" token, awarded by ponomarovmax."Love" token, awarded by 1seby."Love" token, awarded by monio."Party Time" token, awarded by sam_vh."Love" token, awarded by Oskar."Love" token, awarded by mistajuliax."Love" token, awarded by Brandon777."Love" token, awarded by 245."Love" token, awarded by lordodin."Love" token, awarded by andruxa696."Love" token, awarded by amonpaike."Burninate" token, awarded by cerebral_malfunction."Love" token, awarded by deadpin."Love" token, awarded by Ristovski."Love" token, awarded by bnzs."Love" token, awarded by franMarz."Burninate" token, awarded by hitrpr."Love" token, awarded by Treki26."Love" token, awarded by tuqueque."Love" token, awarded by Sam200."Like" token, awarded by Unreleased75."Like" token, awarded by evantryan."Yellow Medal" token, awarded by MarcelQ."Yellow Medal" token, awarded by symstract."Love" token, awarded by Forshu."Love" token, awarded by zanqdo."Love" token, awarded by jonathanl."Hungry Hippo" token, awarded by Tvaroog."Love" token, awarded by Alex108."Yellow Medal" token, awarded by duarteframos."Love" token, awarded by LucaRood."Love" token, awarded by thornydre."Burninate" token, awarded by kioku."Mountain of Wealth" token, awarded by marcog."Love" token, awarded by ogierm."Love" token, awarded by mandeep."Mountain of Wealth" token, awarded by ace_dragon."Love" token, awarded by rbx775."Like" token, awarded by helloidonthaveanyideaformyusername."The World Burns" token, awarded by filibis."Like" token, awarded by jtheninja."Love" token, awarded by julperado."Love" token, awarded by SteffenD."Love" token, awarded by irfan."Like" token, awarded by TheCharacterhero."Like" token, awarded by YAFU."Like" token, awarded by MJunk."Love" token, awarded by wo262."Party Time" token, awarded by brecht.

Details

Summary

This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"

In addition, it implements the sampling pattern described in
"Progressive Multi-Jittered Sample Sequences"

The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.

When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run peridoically over the tile buffer is used for that purpose.

After a tile is finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.

Diff Detail

Repository
rB Blender
Branch
cycles_adaptive_sampling
Build Status
Buildable 6741
Build 6741: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@Brecht Van Lommel (brecht) Did you get a chance to take a look at this? Should I commit this for 2.82?

Brecht Van Lommel (brecht) requested changes to this revision.Dec 10 2019, 8:07 PM

It would be nice to get this in 2.82 still, even if there are some limitations still. Better not let perfect be the enemy of the good.

If we want to be conservative, we could make this an experimental feature for GPU rendering, though not sure it will make much practical difference since users just enable it then anyway. The release notes should at least clearly explain limitations regarding mixed device rendering.

This also will not work with rendering different sample ranges and then combining, which should also be mentioned as a limitation. That could be made to work when the aux buffer is saved in the EXR and then available as input for another rendering adding more samples, but that has deeper implications for render farms and is something to solve later if at all.

Optix support is missing, probably that is a matter of copy-pasting parts from CUDA (we would like to unify these implementations more in the future, but should be ok for now). I would not consider that a blocker for the December 12 deadline for bcon1.

The comments I made are relatively minor code structuring things, hopefully not difficult to make. You'll have to take full responsibility for handling any breaking or bug reports coming from this though, as I'll be away with little internet connection between December 13-25.

TODO:

  • Restarting sampling on a previously stopped pixel will make it pick up at a later point in the sample sequence. It should continue sampling as if it was never stopped instead.
  • Pixels that were stopped and then get restarted later by neighboring pixels may end up not getting the full sample count. This is because the main loop over a tile is still for(1...num_samples) instead of a > do...while(samples_needed). Adding this on the CPU should not be hard, it will be trickier for GPU kernels.

These should be mentioned as comments in the code somewhere.

intern/cycles/blender/addon/properties.py
377

This is missing a description about what it is, and what a lower or higher value accomplishes.

384

This description could be extended to describe in which cases you might want to increase this, which problem it solves.

385

soft_max could be 4096 here, but hard max should be the same as samples I think ((1 << 24)).

intern/cycles/blender/blender_sync.cpp
670

Ideally adding this pass should not be done in the Blender integration, but somehow automatically when adaptive sampling is enabled in the integrator.

It could be done similar to the denoising passes that do not need to be output I think.

intern/cycles/device/device_cpu.cpp
867–868

No need to use bitmask here for effiency. Better to be clear with something like:

const int adaptive_sample_step = 4;
(sample % adaptive_samples_step ==  adaptive_samples_step - 1)

Also would prefer to have this code in a separate function.

if (task.integrator_adaptive && adaptive_sample_stop(tile, sample)) {
  tile.sample = end_sample;
  break
}

Same goes for other devices.

966–988

Put in a separate function, like,

if (task.integrator_adaptive) {
  adaptive_sample_scale(tile, kg);
}

Further there seems to be some duplication of logic between devices here. I think kernel_adaptive_post_adjust could be kept private to the kernel, and instead add this in kernel_adaptive_sampling.h and use it for all devices.

ccl_device_inline void kernel_adaptive_scale_sample(KernelGlobals *kg, ccl_global float *buffer, int start_sample, int sample)
{
  if (buffer[kernel_data.film.pass_sample_count] < 0.0f) {
    buffer[kernel_data.film.pass_sample_count] = -buffer[kernel_data.film.pass_sample_count];
    float sample_multiplier = sample / max((float)start_sample + 1.0f,
                                           buffer[kernel_data.film.pass_sample_count]);
    if (sample_multiplier != 1.0f) {
      kernel_adaptive_post_adjust(&kg, buffer, sample_multiplier);
    }
  }
  else {
    kernel_adaptive_post_adjust(&kg, buffer, sample / (sample - 1.0f));
  }
}
intern/cycles/device/device_cuda.cpp
1798–1803 ↗(On Diff #14806)

It's a problem that different GPUs will have a different number of samples at which it checks the stopping criterion. That makes both CPU + GPU rendering, or rendering on a farm with CPU and GPUs not work well.

Solving this is tricky though, the bigger the tile size the more fine grained adaptive stopping we can do. Rendering multiple small tiles at once could help to some extent, but there are performance and memory usage reasons to prefer small tiles too.

I think we could commit this as is, but mention it in the release notes at least, and try to find a solution after this is committed.

intern/cycles/kernel/kernel_adaptive_sampling.h
29

In path termination we use sqrt() to approximate a typical view transform, or alternatively interpreted to be more conservative in dark areas.

I guess it's worth trying something simple like that. Doesn't have to be done for the first commit.

58

This is ok for now, but later on we should probably do some refactoring so this doesn't duplicate so much logic from kernel_passes.h, I made a note about that in T72293.

intern/cycles/kernel/kernel_random.h
53–54

Put these in kernel_types.h for use in integrator.c.

Also, knowing that the max number of samples is 4096, sample_is_even could be optimized by leaving out most of the terms, though not sure if that's performance critical.

59–61

Can you add a CMJ feature and #ifdef __PMJ__ for code like this?

Convenient for testing and maybe adaptive compilation in the future.

60–69

Put this in a pmj_sample_1D utility function.

105–118

Put this in a pmj_sample_2D utility function.

intern/cycles/render/buffers.cpp
264

Why is this specific to the combined pass, not sure why that would need to be treated different than other passes?

intern/cycles/render/integrator.cpp
71–72

I don't these should be a use_adaptive_sampling boolean here. This would force the PMJ pattern to be used, which I think is better than assuming that PMJ means adaptive sampling.

236–237

Use constants from kernel_types.h instead here.

It would be good to have a comment here explaining why this uses a fixed number of samples, and perhaps what might be done to improve it in the future.

This revision now requires changes to proceed.Dec 10 2019, 8:07 PM

Impossible (for me) to build this against current master code

If it helps @Juan Gea (juang3d), I was able to compile adaptive sampling support on the fairly latest patches of Cycles here:
https://github.com/boberfly/cycles/commit/8d378f97c2dd6c0bb21ce0f44a05637a02e5565f

I hope this one gets to squeeze into Blender 2.82! Not sure if @Stefan Werner (swerner) is on the case, but the change requests look like something I could tackle here perhaps if he hasn't got the time?

Cheers

The 2.82 ship has sailed, no more new feature commits are supposed to happen at this stage. My plan is to polish this some more, add OptiX support, maybe split it into two diffs (PMJ sampler can be separated) and then get this in properly for 2.83.

I think it's a great plan.

And with a bit of luck @Brecht Van Lommel (brecht) can review the @Lukas Stockner (lukasstockner97) patches for dithering, scrambling distance and light groups, and you can directly make it compatible with them :)

For the time being, if you rebase it let me know, anyways I'll try to apply the fix @Alex Fuller (mistaed) told me, to see if I can re-apply it to our build :)

Thanks!

In the end I've been able to apply this, I had to resolve some conflicts but it's working :)

  • Merge branch 'master' of git.blender.org:blender into cycles_adaptive_sampling
  • Merge remote-tracking branch 'origin/master' into cycles_adaptive_sampling
  • Merge remote-tracking branch 'upstream/master' into cycles_adaptive_sampling
  • Cycles: Fix Adaptive Sampling passes after AOV introduction.

This time diffing against the correct master, hopefully.

I want to get this into 2.83, hopefully I'll have time tomorrow for review.

@Stefan Werner (swerner), the comments from my previous code review have not been addressed yet.

Do you have time for that in the next ~2 weeks? Otherwise I can do it.

  • Cycles: Refactored PMJ sampler.
  • Merge commit '153e001c743bf0f6bc259966418446441e00e200^' into cycles_adaptive_sampling
  • Merge branch 'master' into cycles_adaptive_sampling

Seems the recent update went slightly wrong

intern/cycles/kernel/kernel_random.h
303

Forgotten ';' and 2 return statements below? Question for the crowd: Why not use MSVC's __popcnt intrinsic for Windows?

__popcnt on Windows is not supported on all CPUs.

  • Cycles: More adaptive sampling refactoring.
  • Fixed typo.
  • Merge branch 'master' into cycles_adaptive_sampling
Stefan Werner (swerner) marked 3 inline comments as done.Feb 25 2020, 7:39 PM

Rebase on latest master.

PMJ pattern has been committed now, seems to be working well and no
performance regressions found with GPU rendering.

From testing I see two main issues here:

  • Missing Optix support. Hopefully this is straightforward to add in the same way as CUDA, ideally sharing code.
  • OpenCL rendering seems significantly slower with adaptive sampling on. Not immediately sure what is causing this.

I'll look into both unless @Stefan Werner (swerner) does before me.

Fix OpenCL performance issue, it was always rendering 1 samples. Now it works like CUDA and also renders multiples of 4 samples, even if that means renders are not exactly reproducible.

The better solution for both is probably to increase the adaptive samples step from 4 to something higher for GPU rendering, combined with automatically rendering multiple tiles at once. But that would be for after this patch is committed.

This also refactors code to avoid hardcoded numbers.

Optix adaptive sampling, mostly shared with CUDA.

Brecht Van Lommel (brecht) accepted this revision.EditedMar 4 2020, 8:32 PM

I think this is now ready to commit, with further improvements possible in master.

I started writing adaptive sampling release notes (commented out, visible when you click edit).
https://wiki.blender.org/wiki/Reference/Release_Notes/2.83/Cycles

Images and timings are still missing, some help with that is welcome.

@Stefan Werner (swerner), feel free to commit this and edit release notes, otherwise I do it before Bcon2.

This revision is now accepted and ready to land.Mar 4 2020, 8:32 PM
This revision was automatically updated to reflect the committed changes.

It appears that I don't have a Wiki account yet, so I wasn't able to update the release notes.

@Stefan Werner (swerner), I've taken the liberty of creating one for you, should have an email about it. In case you want to edit the notes.