Page MenuHome

Fix: BLI_expr_pylike_eval_test test failing with clang on windows
ClosedPublic

Authored by Ray Molenkamp (LazyDodo) on Oct 7 2019, 6:41 PM.

Details

Summary

Building with clang on windows BLI_expr_pylike_eval_test fails due
to an inliner bug in clang 9.0.0.

Cause:

in the function parse_add_func in source/blender/blenlib/intern/expr_pylike_eval.c

we have the following snipppet of code in the OPCODE_FUNC2 case [1]

double result = func(prev_ops[-2].arg.dval, prev_ops[-1].arg.dval);

if (fetestexcept(FE_DIVBYZERO | FE_INVALID) == 0) {

build with Ob1 this generates the following code

00007FF7BA8CA7A7  movsd       xmm0,mmword ptr [rbx-18h]
00007FF7BA8CA7AC  movsd       xmm1,mmword ptr [rbx-8]
00007FF7BA8CA7B1  call        r14
00007FF7BA8CA7B4  movaps      xmm6,xmm0
00007FF7BA8CA7B7  mov         ecx,18h
00007FF7BA8CA7BC  call        fetestexcept (07FF7BA8B6F23h)

Ob2 takes it in another direction and does

00007FF7BA8CAF90  movsd       xmm7,mmword ptr [rdi-18h]
00007FF7BA8CAF95  movsd       xmm8,mmword ptr [rdi-8]
00007FF7BA8CAF9B  mov         ecx,18h
00007FF7BA8CAFA0  call        fetestexcept (07FF7BA8B6F23h)

It somehow forgets to call the function doing the actual work
soo yeahhh, that's not great.

This patch just switches to /Ob1 for clang since I have no idea
what other locations not covered by our tests this bug will rear
its ugly head.

Option B is tagging a BLI_NO_INLINE on parse_add_func

[1] https://developer.blender.org/diffusion/B/browse/master/source/blender/blenlib/intern/expr_pylike_eval.c$523

Diff Detail

Repository
rB Blender

Event Timeline

Ray Molenkamp (LazyDodo) edited the summary of this revision. (Show Details)Oct 7 2019, 6:47 PM

I tried to extract a repro and i'm kinda stumped only msvc seems to handle this correctly, both llvm and gcc get overly aggressive.

//test.c
#include <stdbool.h>
#include <stdio.h>
#include <fenv.h>

#ifdef _MSC_VER
#  pragma fenv_access(on)
#else
#  pragma STDC FENV_ACCESS ON
#endif

typedef double(*BinaryOpFunc)(double, double);

static bool parse_add_func(volatile double *values, void *funcptr)
{
	feclearexcept(FE_ALL_EXCEPT);
	BinaryOpFunc func = funcptr;
	double result = func(values[0], values[1]);
	if (fetestexcept(FE_DIVBYZERO | FE_INVALID) == 0) {
        // values[0] = result; //#1
		return true;
	}
	return false;
}

static double op_div(double a, double b)
{
	return a / b;
}

int main()
{
	volatile double values[] = { 0.0, 0.0 };
	bool res = parse_add_func(values, op_div);
	printf("%d expected 0\n", res);
	values[1] = 1.0;
	res = parse_add_func(values, op_div);
	printf("%d expected 1\n", res);
}

(these should both should build on linux as well)

binary:
gcc: gcc -O2 test.c -o test -lm
clang: clang -O2 test.c -o test

asm:
gcc: gcc -S -O2 -masm=intel test.c
clang: clang -S -O2 -masm=intel test.c2

GCC:

	push	rbx
	mov	rbx, rdi
	mov	edi, 61
	call	feclearexcept
	mov	edi, 5
	movsd	xmm0, QWORD PTR [rbx+8]
	movsd	xmm0, QWORD PTR [rbx]
	call	fetestexcept
	test	eax, eax
	sete	al
	pop	rbx
	ret

clang: clang inlines the whole thing into main but performs the same optimization

seh_proc main
# %bb.0:
	sub	rsp, 56
	mov	qword ptr [rsp + 48], 0
	mov	qword ptr [rsp + 40], 0
	mov	ecx, 31
	call	feclearexcept
	movsd	xmm0, qword ptr [rsp + 40] # xmm0 = mem[0],zero
	movsd	xmm0, qword ptr [rsp + 48] # xmm0 = mem[0],zero
	mov	ecx, 24
	call	fetestexcept
	xor	edx, edx
	test	eax, eax
	sete	dl
	lea	rcx, [rip + "??_C@_0P@KNLCGADH@?$CFd?5expected?50?6?$AA@"]
	call	printf

both GCC and Clang seem to realize that the result variable in parse_add_func is not used and skip the div_op all together.

When you uncomment #1

GCC smartens up and does the right thing

	push	rbx
	mov	rbx, rdi
	mov	edi, 61
	sub	rsp, 16
	call	feclearexcept
	movsd	xmm1, QWORD PTR [rbx+8]
	movsd	xmm0, QWORD PTR [rbx]
	call	op_div
	mov	edi, 5
	movsd	QWORD PTR [rsp+8], xmm0
	call	fetestexcept

clang doesn't get smarter and still does the wrong thing.

	mov	qword ptr [rsp + 32], 0
	mov	qword ptr [rsp + 40], 0
	mov	ecx, 31
	call	feclearexcept
	movsd	xmm6, qword ptr [rsp + 40] # xmm6 = mem[0],zero
	movsd	xmm7, qword ptr [rsp + 32] # xmm7 = mem[0],zero
	mov	ecx, 24
	call	fetestexcept
	xor	esi, esi
	mov	edx, 0
	test	eax, eax

changing the function call to

volatile double result = func(values[0], values[1]);

helps clang finally see the light as well

	movsd	xmm6, qword ptr [rsp + 40] # xmm6 = mem[0],zero
	movsd	xmm7, qword ptr [rsp + 32] # xmm7 = mem[0],zero
	mov	ecx, 24
	call	fetestexcept
	xor	esi, esi
	mov	edx, 0
	test	eax, eax
	jne	.LBB0_2
# %bb.1:
	divsd	xmm7, xmm6
	movsd	qword ptr [rsp + 32], xmm7
	mov	edx, 1
.LBB0_2:
	lea	rcx, [rip + "??_C@_0P@KNLCGADH@?$CFd?5expected?50?6?$AA@"]

Given both clang and gcc seem to have edge cases here where the behavior is undefined at best, and all compilers do the right thing when you tag on volatile I propose to solve
this issue rather than limiting clangs optimizer in the patch above by

changing

double result = func(prev_ops[-2].arg.dval, prev_ops[-1].arg.dval);

to

volatile double result = func(prev_ops[-2].arg.dval, prev_ops[-1].arg.dval);

in parse_add_func

given this is campbell's code, I added him as reviewer

Basically it sounds like the optimizers in the compilers don't realize that fetestexcept should be a barrier to floating point operation reordering. This is somewhat similar to a missing barrier in concurrency programming, so volatile could be called a valid fix.

volatile sounds like a good fix.

use volatile rather than telling clang /Ob1

source/blender/blenlib/intern/expr_pylike_eval.c
507–509

Do it here as well, in case a small change somewhere triggers the same problem here.

  • add volatile to the func1 case as well
Ray Molenkamp (LazyDodo) marked an inline comment as done.Oct 8 2019, 3:50 PM
This revision is now accepted and ready to land.Oct 8 2019, 3:50 PM