C++: Need for a Memory Model


Example

#include <iostream>

int x, y;
bool ready = false;  // plain bool: NOT safe to share between threads

void init()
{
  x = 2;
  y = 3;
  ready = true;
}
void use()
{
  if (ready)
    std::cout << x + y;
}

Suppose one thread calls init() while another thread (or a signal handler) calls use(). One might expect use() to either print 5 or do nothing. This is not guaranteed, for several reasons:

  • The CPU may reorder the writes in init(), or make them visible to other threads in a different order, so that the code effectively executes as if it were:

    void init()
    {
      ready = true;
      x = 2;
      y = 3;
    }
    
  • The CPU may reorder the reads in use(), so that the code effectively executes as if it were:

    void use()
    {
      int local_x = x;
      int local_y = y;
      if (ready)
        std::cout << local_x + local_y;
    }
    
  • An optimizing C++ compiler may reorder the program in a similar way.

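Such reordering cannot change the behavior of a single-threaded program, because there init() and use() run one after the other and their calls cannot interleave. In a multi-threaded setting, however, one thread may observe only some of the writes performed by the other, so use() may see ready==true while x, y, or both still hold stale values. Formally, unsynchronized accesses to the plain variables ready, x and y from different threads are a data race, which makes the behavior of the whole program undefined.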

The C++ Memory Model lets the programmer specify which reorderings are permitted and which are not, so that a multi-threaded program can also behave as expected. The example above can be rewritten in a thread-safe way like this:

#include <atomic>
#include <iostream>

int x, y;
std::atomic<bool> ready{false};

void init()
{
  x = 2;
  y = 3;
  ready.store(true, std::memory_order_release);
}
void use()
{
  if (ready.load(std::memory_order_acquire))
    std::cout << x + y;
}
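To try the thread-safe version, the sketch below runs init() and a variant of use() on two std::threads many times. It assumes a hypothetical use_result() that returns what use() would print (or -1 when it would print nothing) and a hypothetical driver only_expected_outcomes(); neither is part of the original example.

```cpp
#include <atomic>
#include <thread>

int x, y;
std::atomic<bool> ready{false};

void init()
{
  x = 2;
  y = 3;
  ready.store(true, std::memory_order_release);
}

// Hypothetical variant of use(): returns the value use() would print,
// or -1 when use() would print nothing, so the outcome can be checked.
int use_result()
{
  if (ready.load(std::memory_order_acquire))
    return x + y;
  return -1;
}

// Hypothetical driver: runs init() and use_result() concurrently
// `trials` times and reports whether every outcome was -1 or 5.
bool only_expected_outcomes(int trials)
{
  for (int i = 0; i < trials; ++i) {
    // Reset shared state; no other thread is running at this point.
    x = 0;
    y = 0;
    ready.store(false, std::memory_order_relaxed);

    int result = 0;
    std::thread producer(init);
    std::thread consumer([&result] { result = use_result(); });
    producer.join();
    consumer.join();  // join() makes `result` safely readable here

    if (result != -1 && result != 5)
      return false;
  }
  return true;
}
```

Calling only_expected_outcomes(1000) from main() should return true. Both threads are joined before the next reset, so resetting x, y and ready does not itself race with the previous trial.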

Here init() performs an atomic store-release operation. This not only stores the value true into ready, but also tells the compiler that reads and writes sequenced before it (here, the writes to x and y) cannot be moved after it.

The use() function performs an atomic load-acquire operation. It reads the current value of ready and also forbids the compiler from moving reads and writes that are sequenced after it (here, the reads of x and y) to before the load-acquire.

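These atomic operations also make the compiler emit whatever hardware instructions (for example, memory barriers) are needed to keep the CPU from performing the unwanted reorderings. On some architectures, such as x86, ordinary loads and stores already provide acquire and release ordering, so no extra instructions are needed.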

Because the atomic store-release and the atomic load-acquire access the same memory location, the memory model stipulates that if the load-acquire reads the value written by the store-release, then all writes performed by init()'s thread before that store-release are visible to loads that use()'s thread executes after its load-acquire. That is, if use() sees ready==true, it is guaranteed to see x==2 and y==3.
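This guarantee can be exercised with a consumer that spins until its load-acquire observes ready==true and then reads x and y. The helpers wait_and_check() and guarantee_holds() are hypothetical, and the spin loop using std::this_thread::yield() is an illustrative choice, not part of the original example.

```cpp
#include <atomic>
#include <thread>

int x, y;
std::atomic<bool> ready{false};

void init()
{
  x = 2;
  y = 3;
  ready.store(true, std::memory_order_release);
}

// Hypothetical consumer: spins until the load-acquire reads the value
// written by the store-release, then reads the plain variables.
// The release/acquire pairing guarantees it sees x == 2 and y == 3.
bool wait_and_check()
{
  while (!ready.load(std::memory_order_acquire))
    std::this_thread::yield();
  return x == 2 && y == 3;
}

// Hypothetical driver: repeats the handoff `trials` times and reports
// whether the consumer always saw the writes made before the release.
bool guarantee_holds(int trials)
{
  for (int i = 0; i < trials; ++i) {
    // Reset shared state; no other thread is running at this point.
    x = 0;
    y = 0;
    ready.store(false, std::memory_order_relaxed);

    bool ok = false;
    std::thread consumer([&ok] { ok = wait_and_check(); });
    std::thread producer(init);
    consumer.join();
    producer.join();

    if (!ok)
      return false;
  }
  return true;
}
```

guarantee_holds(1000) should return true. A passing run is consistent with the guarantee but does not prove it: on strongly ordered hardware such as x86 even the racy plain-bool version may appear to work, and only the memory model's rules make the release/acquire version correct on every platform.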

Note that the compiler and the CPU are still allowed to write to y before writing to x, and similarly the reads from these variables in use() can happen in any order.