I’ve been reading Rust Atomics and Locks by Mara Bos. If you’re into low-level concurrency primitives, it’s a great book. The rest of you can leave now 🙃
Memory ordering is a place where Rust and Go diverge, and I think it’s illustrative of the difference in language philosophy. Rust provides a selection of memory orderings per atomic variable operation, whereas Go provides just one. Rust chooses raw performance whereas Go selects ease-of-use.
Let’s dig in.
To understand more, let’s first understand the options that Rust gives you. They are described at Ordering in std::sync::atomic. I’ll give my own interpretations below with the goal of writing them down now so I better remember them later.
Relaxed
Relaxed sounds almost unsafe — atomics feel like they should be strict! — but it’s often the right choice. It guarantees that all processors / threads see writes to that single variable in the same order. No guarantees are made about any other variables, leaving the compiler and processor to reorder instructions if they don’t have observable effects.
data.store(1, atomic::Ordering::Relaxed);
This makes Relaxed suitable for things like atomic counters used to issue
sequential IDs within a process. A compiler cannot reorder operations that
depend on the value of the atomic variable, meaning that our assignment of an ID
works. You do need to be careful to use fetch_add or a similar atomic
read-modify-write, rather than a separate load and store:
use std::sync::atomic::{self, AtomicU64};

static COUNTER: AtomicU64 = AtomicU64::new(0);

pub fn get_id() -> u64 {
    COUNTER.fetch_add(1, atomic::Ordering::Relaxed)
}
Release / Acquire
I think of Release and Acquire as a pair. When used on a store and load
respectively, they produce a “happens before” relationship between a Release
store and its paired Acquire load. Unlike Relaxed, this guarantee extends to
all memory operations preceding a store on the atomic in the code, not just the
variable itself.
The guarantee means that the compiler and processor cannot move operations that come before the “store with Release” so that they happen after the paired “load with Acquire”, even if those operations appear to be unrelated.
This is useful for things like implementing mutexes. Let’s explore what happens
if we try this with Relaxed:
// Presume this happened before thread 2 started
// so thread 2 starts with the lock "locked".
lock.store(true, atomic::Ordering::Relaxed);
// Thread 1
data = 7;
lock.store(false, atomic::Ordering::Relaxed);
// Thread 2
while lock.load(atomic::Ordering::Relaxed) {
    // spin lock
}
println!("{data}");
To a human, it’s clear that print is intended to output 7. The thing is, to
the compiler and processor, there isn’t a relationship between lock and
data. So either can reorder operations.
For example, the compiler could order the instructions in thread 1 to run
data = 7 after the lock.store. This is allowed because data flow analysis
within the compiler shows that data does not depend on the value of lock.
When using Relaxed, the compiler and processor can reorder these instructions
with the condition that within a single thread the reordering is
indistinguishable from the original in the code.
The problem happens because we are depending on an ordering between threads 1
and 2. We are expecting the lock.load to force the print!(data) to happen
after the data = 7 instruction, but that isn’t what Relaxed provides.
This is where Release and Acquire come in. They force an ordering between
two accesses of the same atomic, meaning that the compiler and processor are no
longer allowed to reorder calls as flexibly:
// Presume this happened before thread 2 started
// so thread 2 starts with the lock "locked".
// This _could_ use Relaxed in this case, as thread
// 2 is defined to start after this call, but usually
// that's not the case, so be safer and use Release.
lock.store(true, atomic::Ordering::Release);
// Thread 1
data = 7;
lock.store(false, atomic::Ordering::Release);
// Thread 2
while lock.load(atomic::Ordering::Acquire) {
    // spin lock
}
println!("{data}");
Here, everything that appears in the code before the lock.store(...Release)
must happen before everything that appears after the lock.load(...Acquire) if
thread 2 sees false for lock.load().
Because of the guarantee that the compiler and processor can no longer reorder
the operations, data = 7 must now happen before the print!(data). Therefore
we always see 7, as we wanted.
To see how to expand this into a mutex, see Rust Atomics and Locks — Chapter 4. Building Our Own Spin Lock. That chapter is really cool: it shows how Rust’s “guard” approach to mutexes can be implemented.
SeqCst (sequentially consistent)
Relaxed, Release and Acquire have memory ordering effects relating to a
single atomic variable.
SeqCst changes this. It enforces a total ordering across all atomic
operations with the SeqCst memory ordering. If every atomic access used
SeqCst, this would be like a global
memory barrier within a given
process.
On the one hand, this makes things easy to reason about. To an extent, it
enforces a notion of “time” — happens-before — across a whole program. On
the other hand, it means that every SeqCst atomic operation requires
processors to synchronise with each other no matter whether the atomic
operations actually need to synchronise with each other. And CPUs having to
synchronise with each other is a performance bottleneck.
Having internalised Relaxed, Release and Acquire, I find it a little hard
to see where I’d want to use SeqCst.
And indeed, the example that’s usually given for SeqCst is where two atomic
variables are mutually dependent on each other. Quite unusual in day-to-day
code. But it is needed: without SeqCst the compiler and processor can still
reorder loads, stores and so on between the two variables. However, the Rust
Atomics and Locks book suggests that
fences can often be used instead.
So what does Go choose?
The thing with Go is that it likes to provide one way to do things. Which is mostly a good thing. Languages with many ways to do the same thing can be hard to read.
In this case, Go has chosen to only offer one of the memory ordering options. If
you choose to offer only one, you have to choose SeqCst given it’s the
superset of all the others:
In the terminology of the Go memory model, if the effect of an atomic operation A is observed by atomic operation B, then A “synchronizes before” B. Additionally, all the atomic operations executed in a program behave as though executed in some sequentially consistent order. This definition provides the same semantics as C++’s sequentially consistent atomics and Java’s volatile variables.
And so there you go, that’s why I think this is emblematic of Rust and Go’s design choices: Rust gives you the full menu and leaves you to figure out the best for your situation, whereas Go makes the choice for you but leaves some performance optimisations on the table.
Like many places where Rust and Go differ, most of the time the performance difference this choice causes doesn’t matter. It’s a deep optimisation where you have to know what you are doing. Go remains a good and performant language. Rust is the right choice when control matters.