When different threads perform atomic operations on the same cache line, the second
thread stalls until the first releases the cache line. When multiple counters share a
cache line, updates to unrelated counters contend with each other (false sharing).
This patch avoids that by moving each atomic counter into its own cache
line.
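
As an illustration only, the layout change can be sketched in C11 as below. This is
not the code from the patch: the names CACHE_LINE_SIZE, padded_counter, stats,
counter_init and counter_add are hypothetical, and the 64-byte line size is an
assumption that holds on most x86-64 parts but not on every CPU.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. Adjust for the target CPU. */
#define CACHE_LINE_SIZE 64

/* Hypothetical counter type. The alignment requirement makes the struct
 * CACHE_LINE_SIZE bytes large, so no two counters share a line. */
typedef struct {
    alignas(CACHE_LINE_SIZE) atomic_ulong value;
} padded_counter;

/* Hypothetical stats block with two independently updated counters. */
typedef struct {
    padded_counter packets;
    padded_counter bytes;
} stats;

/* Compile-time checks that the layout is what we expect. */
static_assert(sizeof(padded_counter) == CACHE_LINE_SIZE,
              "each counter should occupy exactly one cache line");
static_assert(offsetof(stats, bytes) - offsetof(stats, packets) == CACHE_LINE_SIZE,
              "adjacent counters should start on different cache lines");

/* Set a counter to zero before it is shared between threads. */
static inline void counter_init(padded_counter *c)
{
    atomic_init(&c->value, 0);
}

/* Add n to a counter. Relaxed ordering is enough for a plain statistic. */
static inline void counter_add(padded_counter *c, unsigned long n)
{
    atomic_fetch_add_explicit(&c->value, n, memory_order_relaxed);
}
```

The cost of this layout is memory: each counter takes a full cache line instead of
a few bytes, which matters if there are many counters per thread.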

TODO:
Measure whether this makes an observable difference.
Verify the reasoning with CPU engineers.