CP: Concurrency and Parallelism
The core component of concurrent and parallel programming is the thread. Threads
allow you to run multiple sections of your program independently, while sharing
the same memory. Concurrent programming is tricky for many reasons, most
importantly that it is undefined behavior to read data in one thread after it
was written by another thread, if there is no proper synchronization between
those threads. Making existing single-threaded code execute concurrently can be
as trivial as adding std::async or std::thread strategically, or it can
necessitate a full rewrite, depending on whether the original code was written
in a thread-friendly way.
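As a sketch of the "trivial" end of that spectrum: code that already splits its work into independent pieces can often be parallelized by launching one piece with std::async. The helper names and the splitting strategy below are illustrative, not part of the guideline:

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sums the half-open range [first, last) of v.
long long sum_range(const std::vector<int>& v, std::size_t first, std::size_t last)
{
    return std::accumulate(v.begin() + first, v.begin() + last, 0LL);
}

long long parallel_sum(const std::vector<int>& v)
{
    const std::size_t mid = v.size() / 2;
    // Each task reads a disjoint half of v and writes no shared state,
    // so no extra synchronization is needed.
    auto upper = std::async(std::launch::async, sum_range, std::cref(v), mid, v.size());
    long long lower = sum_range(v, 0, mid);
    return lower + upper.get();  // get() joins with the async task
}
```

Note that this only works because the two tasks touch disjoint data; code that mutates shared state needs the synchronization discussed below.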
The concurrency/parallelism rules in this document are designed with three goals in mind:
- To help you write code that is amenable to being used in a threaded environment
- To show clean, safe ways to use the threading primitives offered by the standard library
- To offer guidance on what to do when concurrency and parallelism aren't giving you the performance gains you need
It is also important to note that concurrency in C++ is an unfinished story. C++11 introduced many core concurrency primitives, C++14 improved on them, and it seems that there is much interest in making the writing of concurrent programs in C++ even easier. We expect some of the library-related guidance here to change significantly over time.
Concurrency and parallelism rule summary:
See also:
CP.1: Assume that your code will run as part of a multi-threaded program
Reason
It is hard to be certain that concurrency isn't used now or sometime in the future. Code gets re-used. Libraries using threads may be used from some other part of the program. Note that this applies most urgently to library code and least urgently to stand-alone applications.
Example
double cached_computation(int x)
{
static int cached_x = 0;
static double cached_result = COMPUTATION_OF_ZERO;
if (cached_x == x)
return cached_result;
double result = computation(x);
cached_x = x;
cached_result = result;
return result;
}
Although cached_computation works perfectly in a single-threaded environment, in a multi-threaded environment the two static variables result in data races and thus undefined behavior.
There are several ways that this example could be made safe for a multi-threaded environment:
- Delegate concurrency concerns upwards to the caller.
- Mark the static variables as thread_local (which might make caching less effective).
- Implement concurrency control, for example, protecting the two static variables with a static lock (which might reduce performance).
- Have the caller provide the memory to be used for the cache, thereby delegating both memory allocation and concurrency concerns upwards to the caller.
- Refuse to build and/or run in a multi-threaded environment.
- Provide two implementations, one which is used in single-threaded environments and another which is used in multi-threaded environments.
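For instance, the lock-based remedy could be sketched as follows. The body of computation and the value of COMPUTATION_OF_ZERO are stand-ins assumed for illustration:

```cpp
#include <mutex>

constexpr double COMPUTATION_OF_ZERO = 0.0;  // assumed value, for illustration only

double computation(int x)  // stand-in for the real (presumably expensive) computation
{
    return x * 2.0;
}

double cached_computation(int x)
{
    static std::mutex cache_mutex;  // guards both cache variables below
    static int cached_x = 0;
    static double cached_result = COMPUTATION_OF_ZERO;

    // All reads and writes of the cache happen under the lock,
    // so concurrent callers no longer race.
    std::lock_guard<std::mutex> lock(cache_mutex);
    if (cached_x == x)
        return cached_result;
    double result = computation(x);
    cached_x = x;
    cached_result = result;
    return result;
}
```

The lock serializes every call, which is the performance cost mentioned above; whether that is acceptable depends on how expensive computation is relative to the lock.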
Exception: There are examples where code will never be run in a multi-threaded environment. However, there are also many examples where code that was "known" to never run in a multi-threaded program was run as part of a multi-threaded program. Often years later. Typically, such programs lead to a painful effort to remove data races. Therefore, code that is never intended to run in a multi-threaded environment should be clearly labeled as such and ideally come with compile or run-time enforcement mechanisms to catch those usage bugs early.
CP.2: Avoid data races
Reason
Unless you do, nothing is guaranteed to work and subtle errors will persist.
Note
In a nutshell, if two threads can access the same named object concurrently (without synchronization), and at least one is a writer (performing a non-const operation), you have a data race. For further information on how to use synchronization well to eliminate data races, please consult a good book about concurrency.
Example
Many examples of data races exist, some of which are running in production software at this very moment. One very simple example:
int get_id() {
static int id = 1;
return id++;
}
The increment here is an example of a data race. This can go wrong in many ways, including:
- Thread A loads the value of id, the OS context switches A out for some period, during which other threads create hundreds of IDs. Thread A is then allowed to run again, and id is written back to that location as A's read of id plus one.
- Thread A and B load id and increment it simultaneously. They both get the same ID.
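One straightforward fix, sketched here, is to make the counter a std::atomic so that the read-modify-write happens as a single indivisible operation:

```cpp
#include <atomic>

int get_id()
{
    static std::atomic<int> id{1};
    // fetch_add performs the read and the increment atomically,
    // so no two threads can observe the same value.
    return id.fetch_add(1);
}
```

This removes the race for this particular pattern; more complex invariants spanning several variables still need a mutex or similar synchronization.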
Enforcement
Some detection is possible; do at least something.