System Information
Operating system: any
Blender Version
Broken: master as of rB35c707684befb2fe823ab1bfa002b34785c31841, in intern/cycles/kernel/filter/filter_nlm_cpu.h
Short description of error
When denoising, Cycles can sometimes read values outside the bounds of the image memory buffer.
Exact steps for others to reproduce the error
The issue lies in two functions:
kernel_filter_nlm_calc_difference, at: int aligned_lowx = rect.x & (~3);
kernel_filter_nlm_update_output, at: int aligned_lowx = round_down(rect.x, 4);
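Both lines round rect.x down to a multiple of 4 for the 4-wide vector loop. The problem occurs when aligned_lowx + dx becomes negative: the first 4-wide load of the shifted pixel q then starts before column 0, which on the first buffer row (y + dy == 0) is an index before the start of the buffer.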
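A minimal sketch of the index arithmetic, written as C++14 constexpr functions so the values can be checked at compile time. round_down_int is a hypothetical stand-in for Cycles' round_down, assumed here to compute (x / multiple) * multiple; align_by_mask mirrors the rect.x & (~3) spelling, and first_idx_q mirrors idx_q = (y + dy) * stride + (x + dx) at x = aligned_lowx. All three names are illustrative, not Cycles API.

```cpp
// Hypothetical stand-in for Cycles' round_down(): assumed to compute
// (x / multiple) * multiple, which rounds down for non-negative x.
constexpr int round_down_int(int x, int multiple) { return (x / multiple) * multiple; }

// Spelling used in kernel_filter_nlm_calc_difference: clear the two low bits.
// For non-negative x this equals round_down_int(x, 4).
constexpr int align_by_mask(int x) { return x & (~3); }

// Index of the first q element loaded by the vectorised loop, mirroring
// idx_q = (y + dy) * stride + (x + dx) with x = aligned_lowx.
// `row` stands for the buffer row y + dy.
constexpr int first_idx_q(int rect_x, int dx, int row, int stride) {
  return row * stride + (round_down_int(rect_x, 4) + dx);
}
```

With rect.x = 5, dx = -5 and an assumed stride of 16, first_idx_q(5, -5, 0, 16) evaluates to -1, one element before the buffer. On row 1 the same offset gives 15, which wraps into the end of the previous row and stays inside the buffer.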
For example, if the function receives rect.x = 5 and dx = -5, rounding down gives aligned_lowx = 4, so aligned_lowx + dx = -1 and the image read becomes load4_u(image, -1):
```cpp
ccl_device_inline void kernel_filter_nlm_update_output(int dx, int dy, ..., int4 rect, ...) {
  nlm_blur_horizontal(difference_image, temp_image, rect, stride, f);
  int aligned_lowx = round_down(rect.x, 4);  // Becomes 4
  for (int y = rect.y; y < rect.w; y++) {
    for (int x = aligned_lowx; x < rect.z; x += 4) {
      int4 x4 = make_int4(x) + make_int4(0, 1, 2, 3);
      int4 active = (x4 >= make_int4(rect.x)) & (x4 < make_int4(rect.z));
      int idx_p = y * stride + x, idx_q = (y + dy) * stride + (x + dx);  // idx_q becomes -1 when y + dy == 0
      float4 weight = load4_a(temp_image, idx_p);
      load4_a(accum_image, idx_p) += mask(active, weight);
      float4 val = load4_u(image, idx_q);  // Reads image starting at index -1
      if (channel_offset) {
        val += load4_u(image, idx_q + channel_offset);
        val += load4_u(image, idx_q + 2 * channel_offset);
        val *= 1.0f / 3.0f;
      }
      load4_a(out_image, idx_p) += mask(active, weight * val);
    }
  }
}
```

A possible fix (though I am not sure it is correct for the denoising algorithm):
```cpp
int aligned_lowx = round_down(rect.x, 4);
if (aligned_lowx + dx < 0) {
  aligned_lowx += 4;
}
```

I am not sure I can provide simple reproduction steps, as I was cross-compiling Cycles and running it on ARM.
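As a hedged check of the proposed workaround, this C++14 sketch models one buffer row and compares the loop start used now with the patched start. It assumes that a column x in [rect.x, rect.z) is processed exactly when x >= start, since the loop steps from start in groups of 4 and the `active` mask only keeps lanes inside the rect. round_down_int, original_start, patched_start and skipped_columns are hypothetical names for illustration only.

```cpp
// Hypothetical stand-in for Cycles' round_down(), assumed to compute
// (x / multiple) * multiple for non-negative x.
constexpr int round_down_int(int x, int multiple) { return (x / multiple) * multiple; }

// Loop start used by the current code.
constexpr int original_start(int rect_x) { return round_down_int(rect_x, 4); }

// Loop start with the proposed workaround applied.
constexpr int patched_start(int rect_x, int dx) {
  int aligned_lowx = round_down_int(rect_x, 4);
  if (aligned_lowx + dx < 0) {
    aligned_lowx += 4;
  }
  return aligned_lowx;
}

// Columns in [rect_x, rect_z) that a loop starting at `start` never covers:
// every column at or after `start` lands in some 4-wide iteration, so only
// the columns before `start` are missed.
constexpr int skipped_columns(int start, int rect_x, int rect_z) {
  int skipped = 0;
  for (int x = rect_x; x < rect_z; x++) {
    if (x < start) {
      skipped++;
    }
  }
  return skipped;
}
```

For rect.x = 5, dx = -5 (and an assumed rect.z = 16), the patched start is 8, so the first q column becomes 3 and no negative read remains, but columns 5, 6 and 7 are no longer processed at all. Because the output and accumulation updates in kernel_filter_nlm_update_output are masked by `active`, one direction that might avoid this, untested against the Cycles sources, is to keep aligned_lowx unchanged and guard only the q loads of the first vector so lanes with a negative index never touch memory.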