PER: Performance

??? should this section be in the main guide???

This section contains rules for people who need high performance or low-latency. That is, these are rules that relate to how to use as little time and as few resources as possible to achieve a task in a predictably short time. The rules in this section are more restrictive and intrusive than what is needed for many (most) applications. Do not blindly try to follow them in general code: achieving the goals of low latency requires extra work.

Performance rule summary:

PER.1: Don't optimize without reason

Reason

If there is no need for optimization, the main result of the effort will be more errors and higher maintenance costs.

Note

Some people optimize out of habit or because it's fun.

???

PER.2: Don't optimize prematurely

Reason

Elaborately optimized code is usually larger and harder to change than unoptimized code.

???

PER.3: Don't optimize something that's not performance critical

Reason

Optimizing a non-performance-critical part of a program has no effect on system performance.

Note

If your program spends most of its time waiting for the web or for a human, optimization of in-memory computation is probably useless.

Put another way: If your program spends 4% of its processing time doing computation A and 40% of its time doing computation B, a 50% improvement on A is only as impactful as a 5% improvement on B. (If you don't even know how much time is spent on A or B, see PER.1 and PER.2.)

PER.4: Don't assume that complicated code is necessarily faster than simple code

Reason

Simple code can be very fast. Optimizers sometimes do marvels with simple code

Example, good
// clear expression of intent, fast execution

vector<uint8_t> v(100000);

for(auto& c : v)
    c = ~c;
Example, bad
// intended to be faster, but is actually slower

vector<uint8_t> v(100000);

for(size_t i=0; i<v.size(); i+=sizeof(uint64_t))
{
    uint64_t& quad_word = *reinterpret_cast<uint64_t*>(&v[i]);
    quad_word = ~quad_word;
}
Note

???

???

PER.5: Don't assume that low-level code is necessarily faster than high-level code

Reason

Low-level code sometimes inhibits optimizations. Optimizers sometimes do marvels with high-level code.

Note

???

???

PER.6: Don't make claims about performance without measurements

Reason

The field of performance is littered with myth and bogus folklore. Modern hardware and optimizers defy naive assumptions; even experts are regularly surprised.

Note

Getting good performance measurements can be hard and require specialized tools.

Note

A few simple microbenchmarks using Unix time or the standard library <chrono> can help dispel the most obvious myths. If you can't measure your complete system accurately, at least try to measure a few of your key operations and algorithms. A profiler can help tell you which parts of your system are performance critical. Often, you will be surprised.

???

PER.10: Rely on the static type system

Reason

Type violations, weak types (e.g. void*s), and low level code (e.g., manipulation of sequences as individual bytes) make the job of the optimizer much harder. Simple code often optimizes better than hand-crafted complex code.

???

PER.11: Move computation from run time to compile time

???

PER.12: Eliminate redundant aliases

???

PER.13: Eliminate redundant indirections

???

PER.14: Minimize the number of allocations and deallocations

???

PER.15: Do not allocate on a critical branch

???

PER.16: Use compact data structures

Reason

Performance is typically dominated by memory access times.

???

PER.17: Declare the most used member of a time-critical struct first

???

PER.18: Space is time

Reason

Performance is typically dominated by memory access times.

???

PER.19: Access memory predictably

Reason

Performance is very sensitive to cache performance and cache algorithms favor simple (usually linear) access to adjacent data.

Example
int matrix[rows][cols];

// bad
for (int c = 0; c < cols; ++c)
    for (int r = 0; r < rows; ++r)
        sum += matrix[r][c];

// good
for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c)
        sum += matrix[r][c];

PER.30: Avoid context switches on the critical path

???