2.90 : Large Denoise Data rendering time regression (CPU)
Closed, Resolved | Public | BUG

Description

System Information
Operating system: Windows-10-10.0.18362-SP0 64 Bits
Graphics card: GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 446.14

Blender Version
Broken: version: 2.90.0 Alpha, branch: master, commit date: 2020-07-10 20:07, hash: rBd2b910fafec6
Broken: version: 2.90.0 Alpha, branch: master, commit date: 2020-06-26 17:24, hash: rBb7b57e7155ee
Worked: 2.83.0

Short description of error
There seems to be an extremely large rendering time regression when the "Denoise Data" rendering pass is enabled, regardless of whether the pass is actually used.

Version  Time (sec)  Denoise Data enabled  Result
2.83.0   0.2         No                    Ok
2.83.0   5.7         Yes                   Ok
2.90.0   0.2         No                    Ok
2.90.0   48.4        Yes                   Not Ok - this bug

Note: GPU does not seem affected by the regression so it would seem to point toward the Embree integration as being the primary culprit. Embree might be slower for simple scenes but it's surprising that Denoise Data is affected to such an extent when the base render still seems quick.

Exact steps for others to reproduce the error

  • Load attached .blend
  • Compare rendering times with and without the "Denoise Data" option enabled in the render layer
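
The comparison in the steps above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: it assumes the "Denoise Data" checkbox corresponds to the `view_layer.cycles.denoising_store_passes` property of the 2.8x/2.9x Python API, and the helper names `time_call`, `render_with_denoise_data` and `compare` are hypothetical.

```python
import time


def time_call(fn, *args, **kwargs):
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start


def render_with_denoise_data(enabled):
    """Render the current scene on the CPU with Denoise Data on or off."""
    import bpy  # only importable inside Blender

    bpy.context.scene.cycles.device = 'CPU'
    # Assumption: this property backs the "Denoise Data" checkbox.
    bpy.context.view_layer.cycles.denoising_store_passes = enabled
    bpy.ops.render.render(write_still=False)


def compare():
    """Print the render time for both settings of Denoise Data."""
    for enabled in (False, True):
        seconds = time_call(render_with_denoise_data, enabled)
        print(f"Denoise Data enabled={enabled}: {seconds:.2f} s")
```

With the attached .blend loaded, calling `compare()` from Blender's Python console or text editor should print one time per setting. A single run per setting is noisy, so repeating the call a few times gives a fairer comparison.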

Event Timeline

Jesse Yurkovich (deadpin) renamed this task from "2.90 : Large Denoise Data rendering time regression" to "2.90 : Large Denoise Data rendering time regression (CPU)". Jul 12 2020, 11:45 PM
Jesse Yurkovich (deadpin) updated the task description.
Alaska (Alaska) changed the task status from Needs Triage to Confirmed. Jul 13 2020, 1:19 PM
Alaska (Alaska) triaged this task as High priority.
Alaska (Alaska) changed the subtype of this task from "Report" to "Bug".
Alaska (Alaska) added a subscriber: Alaska (Alaska).

Can confirm.

Windows-10-10.0.19041
Ryzen 9 3900X
Blender 2.90.0 Alpha, branch: master, commit date: 2020-07-12 09:05, hash: rBf319eec88186

However, I cannot reproduce this issue on Linux, so I'm marking this as a Windows issue.

Linux-5.7.0-1-amd64-x86_64-with-debian-bullseye
Ryzen 9 3900X
2.90.0 Alpha, branch: master, commit date: 2020-07-12 09:05, hash: rBf319eec88186

Intel Core i7 950 @3.07GHz
Windows 10 Pro 1909 18363.900
2.90.0 rB70992ae27027

Version  Time (sec or min:sec)  Denoise Data enabled  Denoising Drop Down  Render checkbox
2.83.2   0.53                   No                    NA (NLM Only)        Off
2.83.2   13.66                  Yes                   NA (NLM Only)        Off
2.83.2   1:13.17                Yes                   NA (NLM Only)        On
2.83.2   1:09.70                No                    NA (NLM Only)        On
2.90.0   0.94                   No                    Either               Off
2.90.0   1:47.97                Yes                   NLM                  Off
2.90.0   0.97                   Yes                   OpenImageDenoise     Off
2.90.0   1:13.48                Yes                   OpenImageDenoise     On
2.90.0   12.26*                 Yes                   OpenImageDenoise     Off

*Time includes the time for the compositor to run the OIDN node
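
Because the table mixes plain seconds with min:sec values, the rows are easier to compare once everything is in seconds. The sketch below is only an illustration: the helper `to_seconds` and the dictionary layout are hypothetical, and the three rows are copied from the table above (all with Denoise Data enabled and the Render checkbox off).

```python
def to_seconds(stamp: str) -> float:
    """Convert 'SS.ss' or 'M:SS.ss' (with an optional trailing '*') to seconds."""
    stamp = stamp.rstrip("*")
    minutes, _, seconds = stamp.rpartition(":")
    return (int(minutes) * 60 if minutes else 0) + float(seconds)


# (version, denoiser) -> reported time; 2.83.2 only offers NLM.
with_denoise_data = {
    ("2.83.2", "NLM"): "13.66",
    ("2.90.0", "NLM"): "1:47.97",
    ("2.90.0", "OpenImageDenoise"): "0.97",
}

baseline = to_seconds(with_denoise_data[("2.83.2", "NLM")])
slowdowns = {
    key: to_seconds(value) / baseline
    for key, value in with_denoise_data.items()
}

for (version, denoiser), factor in slowdowns.items():
    print(f"{version} {denoiser}: {factor:.2f}x the 2.83.2 time")
```

On these numbers the 2.90.0 NLM row takes roughly 7.9 times as long as the matching 2.83.2 row, while the OpenImageDenoise row with Render unchecked is well under the 2.83.2 time.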

So the good news is that rendering in 2.90.0 with Denoise Data enabled, OpenImageDenoise selected, and Render unchecked, then using the OIDN node in the compositor, is faster than in 2.83.2.
Although OpenImageDenoise can be used during rendering in 2.90.0, it is still way faster to use the compositor node, at least on my old CPU with device capabilities: SSE2 SSE3 SSE41. This is already known; see T76259.

In 2.90.0, when NLM is selected, all 6 Denoising passes are anti-aliased. When OpenImageDenoise is selected, the three Denoising passes (Normal, Albedo, and Depth) are not.
However, in 2.83.2, all 6 Denoising passes are anti-aliased, so that hasn't changed between versions. Could it be Blender's implementation of the AA algorithm is slower now with Embree and TBB?

What's up with the Embree witch hunt? Sure, there is a small regression on the default cube scene, but it seems somewhat unfair to blame it for all regressions.

A quick gander with a profiler revealed that disabling the optimizer for fast_exp2f4 in T78047 seems to be the root culprit here (well, that and the move to C++17, which triggered the codegen bug in the first place).

Upstream has a recommendation to sidestep the issue by disabling the SSAOptimizer with an undocumented compiler switch. However, I'm unsure whether this will cause other performance regressions, so I'm not super fond of this solution.

Given that Cycles doesn't seem to be using any C++17 features yet, another option is to just use C++14 mode for bf_cycles_kernel.

Build                  Time to render  Passes denoise tests
master rBa44299ccd117  68.08           Yes
Revert rB5cfbc722d095  10.88           No
/d2SSAOptimizer-       11.04           Yes
/std:C++14             11.04           Yes

cc: @Brecht Van Lommel (brecht), @Sergey Sharybin (sergey)

Thanks all for the tests. I went for a solution that doesn't change the compiler flags, but instead does not use SIMD for this function. The difference is only about 1-2% in my tests.