Details
Diff Detail
- Repository
- rB Blender
- Branch
- master
- Build Status
Buildable 3309 Build 3309: arc lint + arc unit
Event Timeline
@Mohamed Sakr (msakr) , just commandeer it, and click edit revision in the upper right to fill out the details and proper title
denoiser was crashing in OSX (latest Cycles standalone from 14/2/2019), it was doing negative indices randomly, behavior is undefined.
in my case the test scene was the default cube, light scene.
800x600 image
denoiser radius = 8
CPU tile = 64x64
example:
line 168,
idx_q = (y+dy)*stride + (x+dx);
y = 5, dy = -5, x = 5 (lowered to 4 by the round_down(rect.x, 4)), dx = -5
idx_q = -1, crash on the next load
float4 val = load4_u(image, idx_q);
@Stefan Werner (swerner) confirmed the negative index
This seems like a workaround more than a fix for the real issue.
I would expect it to only loop over pixels within bounds, not use zero for pixels outside of bounds.
Can you figure out why the indices are negative in the first place?
TRUE, I just did this for a fast fix before Cycles 4D release (was yesterday and the bug was a day before), and I didn't want to break the denoiser function (algorithm) as I dunno what is happening.
it gets the bounds (x, y, dx, dy) before entering that function from here
https://developer.blender.org/diffusion/C/browse/master/src/device/device_cpu.cpp$504-507
so that rect is actually wrong (I did a fast fix first time by limiting the dx like this):
on https://developer.blender.org/diffusion/C/browse/master/src/device/device_cpu.cpp$506
dx = (dx < 0) ? round_down(dx, 4) : dx;
but it crashed in one of these kernels as it rounds more stuff from dynamic data (the f parameter)
@Lukas Stockner (lukasstockner97) should know more about the right way to fix.
in 1 of the tiles, here, https://developer.blender.org/diffusion/C/browse/master/src/device/device_cpu.cpp$504-507
dy = -5. dx = -5, local_rect = {5, 5, 80, 80) (xyzw)
in the kernel, x + dx is intended for 0
all of the bugs in all cases happen because the bounds are rounded somewhere (x is rounded to lower 4) so the 5 will be converted to 4
4 + -5 = -1, crash.
all lines that got round_down in that file (6 lines) needs to be reworked so it ensures that it doesn't go below 0.
I'm just going ahead and updating this with my fix, the history should still be there so we could go back to your fix.
Essentially, all of the aligning was there for performance, but I looked into this a bit further and it turns out that nowadays aligned vs. unaligned doesn't really matter anymore, so my fix is just to remove all the aligned accesses.
The resulting patch is unfortunately quite large since I needed to tweak the buffer row padding (apparently, the aligning before was hiding this issue) and therefore need to pass an additional parameter to a lot of kernels.
I'm still not happy with the manual vectorized code, the readability is not exactly great. It'd be great if auto-vectorization could deal with that for us, but that does not seem to be the case currently. I suppose something like ISPC could help here, but that's a bit overkill...
Also, I couldn't test the GPU code yet (no functional changes, but there's always the possibility of messing up things like argument order), I'll try to do that tomorrow.
I tested this in a recent build in Cycles 4D (cloned from Cycles Standalone), it still crashes on OSX CPU, 8 cpu threads,