Page MenuHome

Cycles: Refactored parts of the texture sampler and resolved CPU/GPU discrepancy.
ClosedPublic

Authored by Ethan Hall (Ethan1080) on Mar 9 2022, 11:03 PM.

Details

Summary

I have found the reason for the CPU/GPU discrepancy. The discrepancy occurs because the CPU code for cubic interpolation with clip texture extension only performs texture interpolation inside the range of [0,1]. As a result, even though the volume's color is sampled using cubic interpolation, the boundary is not being interpolated. The GPU appears to be interpolating samples that span the clip boundary softening the edge.

This patch resolves the CPU/GPU discrepancy. I compared performance and render results using the smoke_color.blend test file. Cubic interpolation is used for interpolating the volume's color information for all examples below.

Tests were performed on smoke_color.blend Resolution: 1024x1024 Samples: 64

MethodRender Time - Average of 5 RunsHow it looks in the viewport
Original Code - 3D Texture is not interpolated across the clip border24.9 seconds
Patch - 3D Texture is interpolated across the clip border25.1 seconds
Original Code (GPU) - 3D Texture is interpolated across the clip border1.8 seconds

I was working on D14237, and I realized I had to break it up into smaller self contained patches.
This patch is meant to help make future development of the Cycles texture sampling code easier.

The future features that I want to implement are:

  • Independent axis extensions for X/U and Y/V
  • Mirrored repeat texture extension

I made these changes with those goals in mind.

With this patch, Cycles passes all but one test. The Cycles CPU volume smoke_color test now technically fails. However, the new CPU result for this test is now very similar to the GPU result. This patch could resolve the "uninvestigated discrepancy" that put this test on the blacklist for the Cycles GPU tests.

I'm happy to hear any feedback on how to improve this patch.

Diff Detail

Repository
rB Blender

Event Timeline

Ethan Hall (Ethan1080) requested review of this revision.Mar 9 2022, 11:03 PM
Ethan Hall (Ethan1080) created this revision.
Ray Molenkamp (LazyDodo) requested changes to this revision.Mar 9 2022, 11:20 PM

There's also a fair bit of shuffling the order of things around without a clear functional reason, what's the story there?

intern/cycles/util/texture.h
9 ↗(On Diff #49121)

You can't do this, this file won't be available for cycles standalone, even if it was available though including a GPL licensed header in a apache 2.0 licensed code file is problematic

This revision now requires changes to proceed.Mar 9 2022, 11:20 PM
intern/cycles/util/texture.h
9 ↗(On Diff #49121)

Yeah, I'll revert that.

There's also a fair bit of shuffling the order of things around without a clear functional reason, what's the story there?

I was taking my time typing it all out.

Part of the reason for the "shuffling" (that may not be totally clear from my explanation) is, I wanted to group the x samples together and the y samples together so that they can be separated into their own switch statements like in D14237.

Ethan Hall (Ethan1080) edited the summary of this revision. (Show Details)
  • Removed problematic header.
  • Removed changes to focus the patch on only modifying the Cycles CPU code.
Ethan Hall (Ethan1080) edited the summary of this revision. (Show Details)Mar 11 2022, 6:04 AM
Brecht Van Lommel (brecht) requested changes to this revision.Mar 11 2022, 2:37 PM

Have you tried measuring the performance difference due to this change?

The previous logic was used because it's faster, I wonder if this makes a significant difference in something like the simple test in the description.

This revision now requires changes to proceed.Mar 11 2022, 2:37 PM

@Brecht Van Lommel (brecht) You are correct. This patch needs to be optimized. It slows things down way too much. I am looking into it.

Ethan Hall (Ethan1080) edited the summary of this revision. (Show Details)Mar 12 2022, 12:45 AM

It's not clear to me which of the methods shown corresponds to your preferred method, and what all of them do exactly. I suggest to update the patch with your preferred method so it's clear what you're proposing exactly.

@Brecht Van Lommel (brecht) My preferred method is "Optimized Patch - Cubic With Cubic Clip" since it matches more closely with the GPU result and how 2D interpolation handles the clip boundary. I will upload the new revision as soon as I de-duplicate the code. (I duplicated a few sections of code in order to test it quickly.)

@Brecht Van Lommel (brecht)
Here, I circled the location where the difference is most apparent:

Optimized the Patch

Ethan Hall (Ethan1080) retitled this revision from Cycles: Refactored parts of the texture sampler code to support further feature development and resolve CPU/GPU discrepancy. to Cycles: Refactored parts of the texture sampler and resolved CPU/GPU discrepancy..Mar 14 2022, 4:00 AM
Ethan Hall (Ethan1080) edited the summary of this revision. (Show Details)

Fixed edge clip bounds for closest interpolation.

  • Changed order of read clip conditional evaluations.

The render time differences were removed from the table. Are they still the same as before (25s -> 29s)?

@Brecht Van Lommel (brecht) No, the new times are:
Master ~25.2s
Patch ~24.7s

Ethan Hall (Ethan1080) edited the summary of this revision. (Show Details)Mar 15 2022, 3:13 PM
Brecht Van Lommel (brecht) requested changes to this revision.Mar 15 2022, 5:30 PM

There seem to be some test failures with 2D images:

This revision now requires changes to proceed.Mar 15 2022, 5:30 PM
  • Fixed errors leading to failed tests.
Ethan Hall (Ethan1080) edited the summary of this revision. (Show Details)Mar 16 2022, 12:01 AM

I'll commit this with some minor changes left out that I don't think are necessary, and some code style tweaks.

This revision is now accepted and ready to land.Mar 21 2022, 4:41 PM