Memory ordering of atomics in Rust and Go

I’ve been reading Rust Atomics and Locks by Mara Bos. If you’re into low-level concurrency primitives, it’s a great book. The rest of you can leave now 🙃

Memory ordering is a place where Rust and Go diverge, and I think it’s illustrative of the difference in language philosophy. Rust provides a selection of memory orderings for each atomic operation, whereas Go provides just one. Rust chooses raw performance whereas Go selects ease-of-use.

Let’s dig in.

To understand more, let’s first understand the options that Rust gives you. They are described at Ordering in std::sync::atomic. I’ll give my own interpretations below with the goal of writing them down now so I better remember them later.

Relaxed

Relaxed sounds almost unsafe — atomics feel like they should be strict! — but it’s often the right choice. It guarantees that all processors / threads see writes to that single variable in the same order. No guarantees are made about any other variables, leaving the compiler and processor to reorder instructions if they don’t have observable effects.

data.store(1, atomic::Ordering::Relaxed);

This makes Relaxed suitable for things like atomic counters used to issue sequential IDs within a process. A compiler cannot reorder operations that depend on the value of the atomic variable, meaning that our assignment of an ID works. You do need to be careful to use fetch_add or a similar read-modify-write operation, rather than a separate load and store, which would let two threads read the same value:

static COUNTER: AtomicU64 = AtomicU64::new(0);
pub fn get_id() -> u64 {
    COUNTER.fetch_add(1, atomic::Ordering::Relaxed)
}

Release / Acquire

I think of Release and Acquire as a pair. When used on a store and load respectively, they produce a “happens before” relationship between a Release store and its paired Acquire load. Unlike Relaxed, this guarantee extends to all memory operations that precede the Release store in the code, not just operations on the atomic variable itself.

The guarantee means that the compiler and processor cannot reorder operations that come before the “store with Release” to happen after the “load with Acquire”, even if those operations appear to be unrelated.

This is useful for things like implementing mutexes. Let’s explore what happens if we try this with Relaxed:

// Presume this happened before thread 2 started
// so thread 2 starts with the lock "locked".
lock.store(true, atomic::Ordering::Relaxed);

// Thread 1
data = 7;
lock.store(false, atomic::Ordering::Relaxed);

// Thread 2
while lock.load(atomic::Ordering::Relaxed) {
  // spin lock
}
println!("{data}");

To a human, it’s clear that print is intended to output 7. The thing is, to the compiler and processor, there isn’t a relationship between lock and data. So either can reorder operations.

For example, the compiler could order the instructions in thread 1 to run data = 7 after the lock.store. This is allowed because data flow analysis within the compiler shows that data does not depend on the value of lock. When using Relaxed, the compiler and processor can reorder these instructions with the condition that within a single thread the reordering is indistinguishable from the original in the code.

The problem happens because we are depending on an ordering between threads 1 and 2. We are expecting the lock.load to force the print!(data) to happen after the data = 7 instruction, but that isn’t what Relaxed provides.

This is where Release and Acquire come in. They force an ordering between two accesses of the same atomic, meaning that the compiler and processor are no longer allowed to reorder calls as flexibly:

// Presume this happened before thread 2 started
// so thread 2 starts with the lock "locked".
// This _could_ use Relaxed in this case, as thread
// 2 is defined to start after this call, but usually
// that's not the case, so be safer and use Release.
lock.store(true, atomic::Ordering::Release);

// Thread 1
data = 7;
lock.store(false, atomic::Ordering::Release);

// Thread 2
while lock.load(atomic::Ordering::Acquire) {
  // spin lock
}
println!("{data}");

Here, everything that appears in the code before the lock.store(...Release) must happen before everything that appears after the lock.load(...Acquire), provided thread 2 sees false from the load.

Because of the guarantee that the compiler and processor can no longer reorder the operations, data = 7 must now happen before the print!(data). Therefore we always see 7, as we wanted.

To see how to expand this into a mutex, see Rust Atomics and Locks — Chapter 4. Building Our Own Spin Lock. That chapter is really cool: it shows how Rust’s “guard” approach to mutexes can be implemented.

SeqCst (sequentially consistent)

Relaxed, Release and Acquire have memory ordering effects relating to a single atomic variable.

SeqCst changes this. It enforces a total ordering across all atomic operations with the SeqCst memory ordering. If every atomic access used SeqCst, this would be like a global memory barrier within a given process.

On the one hand, this makes things easy to reason about. To an extent, it enforces a notion of “time” — happens-before — across a whole program. On the other hand, it means that every SeqCst atomic operation requires processors to synchronise with each other no matter whether the atomic operations actually need to synchronise with each other. And CPUs having to synchronise with each other is a performance bottleneck.

Having internalised Relaxed, Release and Acquire, I find it a little hard to see where I’d want to use SeqCst.

And indeed, the example that’s usually given for SeqCst is where two atomic variables are mutually dependent on each other. Quite unusual in day-to-day code. But it is needed: without SeqCst the compiler and processor can still reorder loads, stores and so on between the two variables. However, the Rust Atomics and Locks book suggests that fences can often be used instead.

So what does Go choose?

The thing with Go is that it likes to provide one way to do things. Which is mostly a good thing. Languages with many ways to do the same thing can be hard to read.

In this case, Go has chosen to offer only one of the memory ordering options. And if you offer only one, it has to be SeqCst, given it’s the superset of all the others:

In the terminology of the Go memory model, if the effect of an atomic operation A is observed by atomic operation B, then A “synchronizes before” B. Additionally, all the atomic operations executed in a program behave as though executed in some sequentially consistent order. This definition provides the same semantics as C++’s sequentially consistent atomics and Java’s volatile variables.

And so there you go, that’s why I think this is emblematic of Rust and Go’s design choices: Rust gives you the full menu and leaves you to figure out the best for your situation, whereas Go makes the choice for you but leaves some performance optimisations on the table.

Like many places where Rust and Go differ, most of the time the performance difference this choice causes doesn’t matter. It’s a deep optimisation where you have to know what you are doing. Go remains a good and performant language. Rust is the right choice when control matters.
