Page MenuHome

Adding support for atomic_load/atomic_store
Needs ReviewPublic

Authored by YimingWu (NicksBest) on May 25 2022, 3:39 AM.

Details

Summary

This patch adds atomic_load / atomic_store support for different types into intern/atomics.

Originally I used some weird C11/C++11 atomics support to get the compiler to pass, but later Sergey came up with a more generic solution which uses GCC built-in functions, and this is tested to be working as it should.

Diff Detail

Repository
rB Blender

Event Timeline

YimingWu (NicksBest) requested review of this revision.May 25 2022, 3:39 AM
YimingWu (NicksBest) created this revision.
YimingWu (NicksBest) edited the summary of this revision. (Show Details)

Fixed for C++ and looks like all tests passed :)

[ RUN      ] atomic.atomic_load_z
[       OK ] atomic.atomic_load_z (0 ms)
[ RUN      ] atomic.atomic_store_z
[       OK ] atomic.atomic_store_z (0 ms)

After closer inspection I am not sure why we need all the complexity of different code paths for C++ and C. My guess is that it is to by-pass stupidity of stdatomic.h which is not C++-compliant in GCC. The question is then: why not to use built-in functions which do not require C11? The atomic_load and atomic_store are merely wrappers of the built-in functions mentioned above.

Here is a quick implementation in a way which I believe is much more straightforward: P2975

The snippet misses x64 fallback and a locking codepaths. The locking codepaths is something very simple to implement following example of the ATOMIC_LOCKING_CAS_DEFINE.

The x64 is where things becomes interesting. From my understanding, unless we allow doing something crazy like using those functions for values with a weird alignment (cast an offset of 3 uint64 i.e.) both store and load are atomic and no extra memory fencing is required. I'd suggest that we only allow atomic functions to be used on variables which follow the compiler alignment rules as it will avoid unneeded fencing. This is something we can also benefit from in MSVC codepath.

intern/atomic/intern/atomic_ops_msvc.h
232

Not sure why you refer to the implementation as a "workaround".

233–234

Those already exist at the top of the file.

235–236

If we go the explicit size variants (aka, _z suffix) it will be much better to use inlined function as that will bring compile-time type checks.

intern/atomic/intern/atomic_ops_unix.h
555

Language-dependent implementation will lead to a situation when the unit test covers one implementation, but the actual code in C will be using different (untested!) implementation.

567

C++11 is supposed to have <atomic>. If a compiler claims C++11 support and does not provide the header the compiler is broken, possibly in many other ways. Don't think we should be testing such things in the code.

569

It is not correct to cast scalar to std::atomic. On a platforms which do not have hardware support of atomic operations of the given type the std::atomic might have extra fields in it: for example, a spin lock.

571

Such return value is not guaranteed to be atomic. Not even sure why we should be attempting to have any return value.

intern/atomic/tests/atomic_test.cc
800–801

Such test will not catch issue when atomic_store_z is a no-op. Proper test is to have value initialized to one value (say, 0), and call atomic_store_z(&vlaue, 2).

Also, as mentioned above, I do not see how atomic_store could return meaningful value. So, call the function which is void, and do EXPECT_EQ(value, 2);

Sergey Sharybin (sergey) requested changes to this revision.May 25 2022, 11:01 AM
This revision now requires changes to proceed.May 25 2022, 11:01 AM

Alright I literally have no idea even how to start dealing with the problem... 64 bit vs 32 bit... and also c++ different codepath. Yeah I did use two code paths to get rid of weird gcc warnings.

I'm still confused about some of the problems:

  • The __sync_synchronize call also seem to be a built-in function, however it's not even available even when my GCC is c11 supported, I then don't have any idea what is guaranteed to work. Looks like cycles uses __atomic_load_n implementation as well, for some reason the initial attempts to use it didn't work when I was trying, but your code works. Not sure why.
  • The memory in Lineart is not 8 byte aligned (which I I also read if it is so the read/write of 64 bit is guaranteed to be atomic on x86/64 architectures), so at the moment there's still a need for this memory fencing. If we ensure that part is aligned then without memory fence can it be a bit faster as well?

Anyway, thanks for taking a look at this...

Updated to P2976 which sergey provided.

Tested on the buildbot, seems to work: https://builder.blender.org/admin/#/builders/18/builds/430

The patch description needs adjustment. Other than that, if it works for its original purpose then we're all good.

LGTM, the test coverage seems extensive enough to trust this has been implemented properly. The real test on how complete the code is will probably be when the debian guys update to a version that has these tests.

agree a better commit message is needed though

YimingWu (NicksBest) edited the summary of this revision. (Show Details)May 26 2022, 4:13 PM