Page MenuHome

Fix T43835, T54284: Cycles with no ray offsetting
AbandonedPublic

Authored by William Leeson (leesonw) on Nov 13 2019, 6:08 PM.
Tokens
"Love" token, awarded by hitrpr."Like" token, awarded by IPv6."Love" token, awarded by reightar."Love" token, awarded by kaiwas."Like" token, awarded by Nikita."Love" token, awarded by 1D_Inc."Love" token, awarded by vitos1k."Love" token, awarded by Alaska."Burninate" token, awarded by Dir-Surya."Like" token, awarded by MetinSeven."Love" token, awarded by c2ba."Love" token, awarded by bintang."Love" token, awarded by juang3d."100" token, awarded by mywa880.

Details

Summary

Ray offsetting adopted in Cycles has been reported to cause various artifacts: T43835, T54284, and etc. These artifacts stand out when the scene is far from the origin or the scale of the scene is too large or too small compared to 1. In the case of instancing, the problem becomes worse because the ray offset, calculated for the world position, scale, and axis directions of the instanced object, is transformed into the object space during ray-object intersection.

There was an experiment D1212, which tried to address these problems by skipping the ray push and instead checking distance to triangle with epsilon during intersection. The result was not satisfactory.

This patch takes a different approach to tackle the problem. Instead of ray offset or epsilon, the following rules are enforced to the bvh traversal/intersection algorithms for preventing a ray from intersecting a surface that it has just departed from:

  1. A ray departing from a surface primitive does not intersect with the primitive.
  2. A shadow ray connecting a surface primitive and a light primitive does not intersect with both the primitives.

These rules are evident if the primitives are triangles. The rules are also applicable to a curve consisting of line segments if each line segment is treated as a separate primitive. In the case of a curve consisting of cardinal curve segments, the fact is utilized that a cardinal curve segment is subdivided into a piecewise line segment on the fly for finding ray-curve intersection. The subdivided line segment is treated as a separate primitive, which allows a ray to intersect with(be occluded by) the same cardinal curve it departed while prohibiting it from intersecting to the same departure point. (It turns out that the ray-curve intersection point is refined later to that on a real cardinal curve, which already gives enough offset. However, as subdivision level is raised, a gap between a piecewise linear approx. and a real curve becomes reduced. On the other hand, shadow occlusion check is still done against the linear approx. Therefore, I conclude that excluding a tiny curve parameter space including the line segment while searching intersection would do good with little harm.)

What if a ray hits a boundary of two primitives? For that case, this patch does nothing because the probability that a ray falls on the range of numerical errors between two adjacent primitives should be very very low and would not contribute to the render result. Otherwise, the tessellation itself is problematic: the primitives are either several orders of magnitude larger than the current view frustum or too small that their sizes are already comparable to floating point precision errors.

A ray skipping a departing primitive alone suffers when there is another primitive overlapping it, which causes unpredictable back and forth of ray between two primitives and results in unobtrusive visual artifacts. This patch implements a novel method for coping with the problem. A tight ray start time(tstart) is estimated for each ray before bvh traversal to cull overlapped meshes and remove artifacts, with the same math used by cycles internal bvh traversal/ray intersection routines. Custom implementations ( and perhaps extra APIs ) are required to make this (estimation of ray start time) work for external path tracing libraries such as NVIDIA Optix and Intel Embree. Without a ray start time estimation, however, self-intersections are still prevented by the rules above.

Comments on struct Ray, struct Intersection, and struct PathState in 'kernel_types.h' explain how this patch works. All the other modifications are just a bookkeeping for applying the rules.

The following results will demonstrate the effectiveness of this patch.

T43835

BeforeAfter

T54284

BeforeAfter

@Brecht Van Lommel (brecht) 's test set of different scales and origins
blender file:

BeforeAfter
Size 1
Size 1e-3
Size 1e-5
Size 1e3
Size 1e5
Origin 1e3
Origin 1e5

This patch is not intended just as a proof of concept. The following are benchmark results of the demo scenes:

Windows 10 64bit
GPU: NVIDIA GeForce RTX 2080 SUPER
CPU: Intel(R) Core(TM) i7-9700

'Cosmos Laundromat Demo'

BeforeAfter
Image
Optix (tiles 64x64)05:25.4605:24.05
CUDA (tiles 64x64)07:34.4507:43.73
CPU (tiles 32x32)38:09.9238:26.68

'Agent 327 Barbershop'

BeforeAfter
Image
CUDA (tiles 64x64)08:42.4808:53.60
CPU (tiles 16x16)24.42.4324.01.31

'Spring'

BeforeAfter
Image
CPU (tiles 16x16)25:03.6525:01.07

2019.12.15 : All benchmark results are updated.

Diff Detail

Repository
rB Blender

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Ha Hyung-jin (robobeg) edited the summary of this revision. (Show Details)Nov 29 2019, 1:29 PM

Commented with some nitpicks on the OptiX side of things. Would prefer not to consume more payload registers (those come at a cost and accessing pointers through them especially), but it may not be possible to avoid that.

It seems like it is not necessary to store all of src_prim, src_object and src_type in the ray though. The primitive index (src_prim_index) should uniquely identify any primitive, so just storing and comparing that should be enough? I also don't believe it would hurt much to just look up src_type in place from __prim_type where necessary (since it's only really used for curves), but haven't benchmarked. Doing so could make it possible to avoid passing in the ray pointer through the payload and instead just pass through a primitive index that should be avoided during intersection.

intern/cycles/kernel/kernel_types.h
1024

OBJECT_NONE

intern/cycles/kernel/kernels/optix/kernel_optix.cu
262

This anyhit program is never executed for anything but triangles (curves are added to the AS with the OPTIX_GEOMETRY_FLAG_DISABLE_ANYHIT and all traces that force an anyhit use a different anyhit program).

308

This is not necessary: The intersection program can never be called for a primitive that is not a curve, since that is the only existing custom primitive type and triangles by definition cannot go through this.

321

Some spaces sneaked in here and below =P

Thank @Patrick Mours (pmoursnv) for your advice. I have practically zero knowledge on OptiX.

On the cycles bvh side, src_object is needed for an instanced object and src_prim_index alone does not identify it. Although src_prim can be looked up with src_prim_index, different prim_indices map to a same prim, which is coupled with object id to identify a primitive. Without storing src_prim, table lookup will be repeated to find it later. Also src_type now stores a flag at its highest bit, which indicates whether the object is pre-transformed/instanced or not. This information is needed later to calculate a tight ray start time for a new ray direction for culling overlapping meshes. Although all these info may be found later except src_object, my intent was to reduce repeated table lookup and recycle already found info on the cycles side.

Now, I have a question. Does OptiX currently provide APIs for implementing ray_update_tstart()? The calulated ray start time ( or epsilon ) by ray_update_tstart() must reflect the math inside OptiX bvh traversal algorithms accurately to cull overlapping meshes and offset the ray from the departing primitive consistently due to 'floating point determinism'.

It looked like ray_update_tstart was only called in places where the ray was already set up, so the type data was available and could e.g. be passed in as another argument to the function. But really I was only searching for ways to avoid additional payload registers.

And you are right, I was looking at it from the Cycles OptiX implementation and there the primitive index is a one-to-one mapping, but that is not the case for Cycles own BVH. It may be possible to take advantage of this for OptiX though. I can experiment a bit to see if there would be any advantage.

As for a triangle_update_tstart implementation for OptiX: OptiX has no such API currently. And it is made more complicated by the fact that intersection precision may vary depending on which GPU generation is used and how the AS was build (e.g. a motion blur AS can behave differently from a non-motion one). I also don't think it would be easy to provide such an API because of how the hardware works, but I'll raise this with the team here to check.

Implementation of triangle_update_tstart() in OptiX may be easier than it looks. Currently object inv transform matrices needed are all supplied from outside, not calculated inside the function. (The calculation code inside is for contingency and also serves as specification.)

On the other hand, since a ray has all information to calculate triangle_update_tstart() now, it may also be possible to calculate it inside OptiX bvh traversal modules, not exposing the value outside.

External path tracing libraries now skip triangle_update_tstart() because the function does not work with them at all.
Some OptiX code is modified not to check unnecessary conditions. Optimization for OptiX is still left to experts.

Ha Hyung-jin (robobeg) updated this revision to Diff 19995.EditedDec 1 2019, 3:34 PM
Ha Hyung-jin (robobeg) marked an inline comment as done.

For an AVX2-optimized CPU kernel, bvh traversal routines are set to use triangle_intersect() temporarily instead of triangle_intersect8() for rays with tstart updated.

Numerical difference between float calculation with and without AVX2 is somehow expected and a proper solution would be to implement an AVX2 version of triangle_update_tstart() for primitives supposed to be handed over to triangle_intersect8(). The problem is that among obvh intersection routines, some use triangle_intersect8() while others still use triangle_intersect().

Ha Hyung-jin (robobeg) updated this revision to Diff 20023.EditedDec 2 2019, 8:44 PM

In addition to triangle_update_tstart(), motion_triangle_update_tstart() is implemented to calculate a tight ray start time for motion triangle primitives.
I have completely forgotten triangle vertex motion.

Ha Hyung-jin (robobeg) updated this revision to Diff 20055.EditedDec 3 2019, 10:51 PM

For external path tracing libraries not supporting a feature for estimation of a tight ray start time at the moment, ray_offset() is restored as a fallback. However, self-intersection is still guaranteed to be prevented by this patch for them. Only functions ray_offset() and ray_update_tstart() need to be modified when those libraries are ready to support the feature.

Now this patch has achieved: prevention of self-intersection of ray-triangle, no glitches caused by ray offsetting, and reduced artifacts of overlapping meshes on a par with ray offsetting. From this patch, artists will get only benefits with no extra burdens.

For AVX2-optimized CPU kernels, bvh traversal routines are set to use triangle_intersect() temporarily instead of triangle_intersect8() for rays with a tight ray start time estimated, because two functions return slightly different results while bvh routines use both functions intermixed. One quick solution is to choose a greater value after estimating both tight ray start times with/without AVX2 optimzation but it increases unnecessary overhead. Besides, I also found some negative zero issues in triangle_intersect8(). I plan to deal with these as a separate patch.

Ha Hyung-jin (robobeg) edited the summary of this revision. (Show Details)Dec 9 2019, 12:34 PM
Ha Hyung-jin (robobeg) edited the summary of this revision. (Show Details)

Update the patch to be appliable to the latest revision.

I want to try including this for 2.83.

I'll run some CPU and GPU benchmarks tonight.

I think it should be possible to extend this to fix the problems with overlapping geometry too by implementing the algorithms from "Robust Iterative Find-Next-Hit Ray Traversal"
http://www.sci.utah.edu/~wald/Publications/2018/nexthit/nexthit-pgv18.pdf

Instead of sorting hits only by distance t, we can sort them also based on object and primitive ID, guaranteeing a consistent order of traversal even in the overlapping case. Since the ray's payload carries object and primitive ID of the object from which the ray originated, it shouldn't be too hard to add. This would also eliminate the need for ray_update_tstart() and could thus simplify the code.

The performance impact of this for GPU rendering is not ideal, I think we'll have to spend some more time to see if we can optimize especially that. We can accept some slowdown, but would rather have it in the 0-2% range than up to 6.7%. I'm not sure if that's possible though, I haven't analyzed the implementation closely.

AMD Ryzen 2990WX, Ubuntu Linux

bmw27-0.8%
classroom0.1%
fishy_cat0.5%
koro1.9%
pabellon-0.4%

CUDA Quadro RTX 5000, Ubuntu Linux

bmw271.1%
classroom-0.2%
fishy_cat3.5%
koro4.5%
pabellon0.9%

Optix Quadro RTX 5000, Ubuntu Linux

bmw276.7%
classroom3.8%
fishy_cat6.1%
koro1.3%
pabellon5.5%

Sorry, I left this thread for a while because there were no responses. Are reviewers still interested in this patch? Are there any technical issues other than performance?
I am currently busy in finishing a company project, so I may get some free time only after a couple of months.

intern/cycles/kernel/kernel_emission.h
218

removed.

222

removed.

intern/cycles/kernel/kernel_path_surface.h
396

removed.

intern/cycles/kernel/kernel_types.h
1024

fixed.

intern/cycles/kernel/kernels/optix/kernel_optix.cu
262

fixed.

308

removed.

321

removed.

Any chance to see this in master anytime soon? Its 21 century and it is quite uncomfortable to move scene to "world center" just to render a scene without artefacts :)
Just hit a problem with black lines when mesh is far from center (although it was only 100m, not so far in fact) - https://developer.blender.org/T87301

Is there any movement on this?

@Brecht Van Lommel (brecht) I know you have concern over performance regressions, but I'd rather it be correct and slow than incorrect and fast. I haven't been able to work around the artefacts (https://developer.blender.org/T85057) except for in some special cases.

Is there any movement on this?

@Brecht Van Lommel (brecht) I know you have concern over performance regressions, but I'd rather it be correct and slow than incorrect and fast. I haven't been able to work around the artefacts (https://developer.blender.org/T85057) except for in some special cases.

Sorry, I was the original author of this differential. When I suggested this, I had some free time. But, nowadays I have been too busy to work on this.
Besides, there seem to remain few things I may do additionally. I think the original differential works but moderators seem to consider that it drops the rendering performance and they also seem to be reluctant to provide it as an option.

I'd like to see if we can get this patch into cycles-x. I was wondering if you would help out with this?

Hi All, I have been working on getting this working in cycles-x and have achieved some success.




This currently only works for triangle meshes as it uses the object and primitive ids. The offset algorithm needs a bit of fixing still, particularly along triangle seams also I currently only have GPU, CPU but not Optix working yet. I'll post a diff after I clean up the code as it is quite messy ATM. Also, I am wondering if it is best to create an new diff instead of altering this one?

Also, I am wondering if it is best to create an new diff instead of altering this one?

If it's a completely changed implementation (which is likely due to cycles-x changes), it's best create a new diff. Still I guess you will reuse code and ideas from this patch.

Yes it's just based on this as lots has changed in cycles-x so I'll create a new patch.

I have been working on fixing the static scenes w.r.t. removing ray offsetting with some success. However, as can be seen in the images below adjacent triangles in the mesh have some issues that need looking at. Also at large sizes precision problems start occurring which also need some work.

origin1e-3origin1e-5origin1e5Black LinesGlitch
original
CUDA static
CUDA Error
Optix static
CUDA Error

Dynamic scenes were largely working last week with similar issues but they rendered glitch on CUDA and optix just fine. I need to investigate why this has started occurring with the static scene solution.

This has been implemented already, closing it.