c++ - How do I synchronize a store before a load in multiple threads?

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

Consider the following program:

#include <thread>
#include <atomic>
#include <cassert>

int x = 0;
std::atomic<int> y = {0};
std::atomic<bool> x_was_zero = {false};
std::atomic<bool> y_was_zero = {false};

void write_x_load_y()
{
    x = 1;
    if (y == 0)
        y_was_zero = true;
}

void write_y_load_x()
{
    y = 1;
    if (x == 0)
        x_was_zero = true;
}

int main()
{
    std::thread a(write_x_load_y);
    std::thread b(write_y_load_x);
    a.join();
    b.join();
    assert(!x_was_zero || !y_was_zero);
}

Given the constraints that everything can be atomic except access to x, how can I guarantee that the assert passes?
If that's not possible as-is, is it possible if access to x can be atomic but no stronger than "relaxed"?
What is the least amount of synchronization (e.g. weakest memory models for all operations) necessary to guarantee this?

It's my understanding that without any form of fences or atomic access, it's possible (if only theoretically so) for the store x = 1 to sink below the load y == 0 (having been moved by the CPU if not per se by the compiler), causing a potential race where both x and y are 0 (and triggering that assertion).

I was initially under the naïve impression that SEQ_CST guarantees total ordering of non-atomic variables. That is, a non-atomic (or relaxed) store of x ordered before a SEQ_CST load of y is guaranteed to actually happen first; similarly a SEQ_CST store of y ordered before a non-atomic (or relaxed) load of x is guaranteed to actually happen first; put together that would prevent the race. However, on further reading of https://en.cppreference.com/w/cpp/atomic/memory_order, I don't think the documentation actually says this, but rather that such ordering is only guaranteed for the opposite case (loads before stores), or cases where access to both x and y are SEQ_CST.

Similarly, I naïvely had thought that a memory barrier would force all loads OR stores before the barrier to happen before all loads OR stores after it, but reading https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence seems to imply that it's again only true for forcing ordering of loads before the barrier with stores after it. That doesn't help here either, I think, unless I'm supposed to put barriers in a less obvious place than "between the store and the load".

What synchronization method should I use here? Is it even possible?

question from:https://stackoverflow.com/questions/65648180/how-do-i-synchronize-a-store-before-a-load-in-multiple-threads

1 Reply

深蓝 · Answer 1 · 2021-10-06T18:44:13+0000

This idea is fatally flawed, and impossible to make safe in ISO C++ with non-atomic x. Data-race Undefined Behaviour (UB) is unavoidable because one thread writes x unconditionally and the other reads it unconditionally.

At best you'd be rolling your own atomics by using compiler barriers to force one thread to sync actual memory state with abstract-machine state. But even then, rolling your own atomics without volatile is not very safe: https://lwn.net/Articles/793253/ explains why the Linux kernel's hand-rolled atomics use volatile casts for pure-store and pure-load. This gives you something like relaxed-atomic on normal compilers, but of course zero guarantee from ISO C++.

See "When to use volatile with multi threading?": basically never. You can get the same efficient asm from using atomic<int> with mo_relaxed. (Or on x86, even acquire and release are free in asm.)

If you were going to attempt this, in practice on most implementations, std::atomic_thread_fence(std::memory_order_seq_cst) will block compile-time reordering of non-atomic operations across it. (E.g. in GCC I think it's basically equivalent to x86 asm("mfence" ::: "memory")¹, which blocks compile-time reordering and is also a full barrier.) But I think some of that "strength" is an implementation detail and not required by ISO C++.

Footnote 1: BTW, usually you want a dummy lock add with stack memory, not actual mfence, because mfence is slower.
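For comparison, here is a minimal sketch of the one fence-based variant that ISO C++ does define: the question's second case, where x is allowed to be atomic but only with relaxed accesses. It assumes x is changed to std::atomic<int>; the names run_once_fenced and stress_fenced are made up for illustration. With a seq_cst fence between the store and the load in each thread, one of the two fences comes first in the seq_cst total order, so the thread whose fence comes later has to see the other thread's store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the two threads once and returns true if at
// least one of them observed the other's store.
bool run_once_fenced()
{
    std::atomic<int> x{0};   // assumption: x may be atomic, relaxed only
    std::atomic<int> y{0};
    bool x_was_zero = false; // plain bool: only read after join()
    bool y_was_zero = false;

    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        // Full fence between the store and the load (blocks StoreLoad).
        std::atomic_thread_fence(std::memory_order_seq_cst);
        y_was_zero = (y.load(std::memory_order_relaxed) == 0);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        x_was_zero = (x.load(std::memory_order_relaxed) == 0);
    });
    a.join();
    b.join();
    return !x_was_zero || !y_was_zero;
}

// Hypothetical stress loop mirroring the assert in the question.
void stress_fenced(int iterations)
{
    for (int i = 0; i < iterations; ++i)
        assert(run_once_fenced());
}
```

Calling stress_fenced(1000) from main should never trip the assert. This says nothing about a non-atomic x: that remains a data race regardless of where the fences go.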

Semi-related: Your bool variables don't need to be atomic. IDK if it's more or less distracting to make them atomic; I was leaning towards it being simpler if they're not. They're each written by at most 1 thread, and only read after that thread has been joined. You could make them plain bool, and also write them unconditionally like y_was_zero = (y == 0); if you want. (That's neutral as far as simplicity goes, although it saves looking at their initializers.)

"What is the least amount of synchronization (e.g. weakest memory models for all operations) necessary to guarantee this?"

x needs to be atomic<> and both stores need to be seq_cst. (This is basically equivalent to draining the store buffer after doing the store).

Like in https://preshing.com/20120515/memory-reordering-caught-in-the-act/

In practice I think both loads can be relaxed on most machines (maybe not POWER though where private store-forwarding is possible). For ISO C++ to guarantee it I think you need seq_cst on both loads as well, so all 4 operations are part of a global total order of operations across multiple objects that's compatible with program order. There's no synchronizes-with via release/acquire to create a happens-before relationship.

Generally seq_cst is the only ordering in the ISO C++ memory model that must translate to blocking StoreLoad reordering, if you think in terms of a memory model based on an actual coherent state that exists even if nobody's looking at it, with individual threads accessing that state with local reordering. (ISO C++ only talks about what other threads can observe, and hypothetical observers in theory might not constrain code-gen. But in practice they do because compilers don't do whole-program inter-thread analysis.)
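As a rough sketch of the version ISO C++ guarantees, here is the question's program with x made atomic and all four operations seq_cst (written out explicitly, although it is also the default for std::atomic). The function names run_once_seq_cst and stress_seq_cst are made up for illustration, and the flags are plain bool as suggested above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one run of the store-then-load pair in two threads.
// Returns true if at least one thread observed the other's store.
bool run_once_seq_cst()
{
    std::atomic<int> x{0};   // x has to be atomic to avoid data-race UB
    std::atomic<int> y{0};
    bool x_was_zero = false; // written by one thread, read after join()
    bool y_was_zero = false;

    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        y_was_zero = (y.load(std::memory_order_seq_cst) == 0);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        x_was_zero = (x.load(std::memory_order_seq_cst) == 0);
    });
    a.join();
    b.join();

    // All four operations sit in one total order that respects program
    // order, so both loads returning 0 would require a cycle in that order.
    return !x_was_zero || !y_was_zero;
}

// Hypothetical stress loop mirroring the assert in the question.
void stress_seq_cst(int iterations)
{
    for (int i = 0; i < iterations; ++i)
        assert(run_once_seq_cst());
}
```

Weakening the loads to relaxed probably still works on most hardware, as discussed above, but then the standard no longer promises it.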

If you for some reason can't make `x` be `atomic<>`

Use C++20 atomic_ref<> to construct a reference to x that you can use to do xref.store(1, mo_seq_cst) or xref.load(mo_seq_cst).

Or with GNU C/C++ atomic builtins, __atomic_store_n(&x, 1, __ATOMIC_SEQ_CST) (which is exactly what C++20 atomic_ref is designed to wrap.)

Or with semi-portable stuff, *(volatile int*)&x = 1; and a barrier, which might or might not work, depending on the compiler. A DeathStation 9000 can certainly make volatile int assignment non-atomic if it wants to. But fortunately the compilers people choose to use in real life aim to not be terrible, and often to be usable for low-level systems programming. Still, this is not at all guaranteed by anything to work.
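The first two options could look roughly like the sketch below. It keeps x as a plain int and routes every access through two hypothetical helpers, store_seq_cst and load_seq_cst (names invented here). They use std::atomic_ref when the library advertises it via the __cpp_lib_atomic_ref feature-test macro, and otherwise fall back to the GNU builtins, so the fallback assumes GCC or Clang.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: seq_cst store to a plain int.
inline void store_seq_cst(int& obj, int value)
{
#if defined(__cpp_lib_atomic_ref)
    std::atomic_ref<int>(obj).store(value, std::memory_order_seq_cst);
#else
    __atomic_store_n(&obj, value, __ATOMIC_SEQ_CST); // GNU C/C++ builtin
#endif
}

// Hypothetical helper: seq_cst load from a plain int.
inline int load_seq_cst(int& obj)
{
#if defined(__cpp_lib_atomic_ref)
    return std::atomic_ref<int>(obj).load(std::memory_order_seq_cst);
#else
    return __atomic_load_n(&obj, __ATOMIC_SEQ_CST);  // GNU C/C++ builtin
#endif
}

// Hypothetical helper: one run with a non-atomic x that is only ever
// touched through the helpers above while both threads are alive.
bool run_once_plain_x()
{
    int x = 0;               // declared as a plain int
    std::atomic<int> y{0};
    bool x_was_zero = false;
    bool y_was_zero = false;

    std::thread a([&] {
        store_seq_cst(x, 1);
        y_was_zero = (y.load(std::memory_order_seq_cst) == 0);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        x_was_zero = (load_seq_cst(x) == 0);
    });
    a.join();
    b.join();
    return !x_was_zero || !y_was_zero;
}

// Hypothetical stress loop mirroring the assert in the question.
void stress_plain_x(int iterations)
{
    for (int i = 0; i < iterations; ++i)
        assert(run_once_plain_x());
}
```

The important restriction is that no thread may access x directly while another thread might be accessing it through these helpers; a single plain x = 1 running concurrently brings the data race back.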
