PER: Performance
??? should this section be in the main guide???
This section contains rules for people who need high performance or low-latency. That is, these are rules that relate to how to use as little time and as few resources as possible to achieve a task in a predictably short time. The rules in this section are more restrictive and intrusive than what is needed for many (most) applications. Do not blindly try to follow them in general code: achieving the goals of low latency requires extra work.
Performance rule summary:
- PER.1: Don't optimize without reason
- PER.2: Don't optimize prematurely
- PER.3: Don't optimize something that's not performance critical
- PER.4: Don't assume that complicated code is necessarily faster than simple code
- PER.5: Don't assume that low-level code is necessarily faster than high-level code
- PER.6: Don't make claims about performance without measurements
- PER.10: Rely on the static type system
- PER.11: Move computation from run time to compile time
- PER.12: Eliminate redundant aliases
- PER.13: Eliminate redundant indirections
- PER.14: Minimize the number of allocations and deallocations
- PER.15: Do not allocate on a critical branch
- PER.16: Use compact data structures
- PER.17: Declare the most used member of a time-critical struct first
- PER.18: Space is time
- PER.19: Access memory predictably
- PER.30: Avoid context switches on the critical path
PER.1: Don't optimize without reason
Reason
If there is no need for optimization, the main result of the effort will be more errors and higher maintenance costs.
Note
Some people optimize out of habit or because it's fun.
???
PER.2: Don't optimize prematurely
Reason
Elaborately optimized code is usually larger and harder to change than unoptimized code.
???
PER.3: Don't optimize something that's not performance critical
Reason
Optimizing a non-performance-critical part of a program has no effect on system performance.
Note
If your program spends most of its time waiting for the web or for a human, optimization of in-memory computation is probably useless.
Put another way: If your program spends 4% of its processing time doing computation A and 40% of its time doing computation B, a 50% improvement on A is only as impactful as a 5% improvement on B. (If you don't even know how much time is spent on A or B, see PER.1 and PER.2.)
PER.4: Don't assume that complicated code is necessarily faster than simple code
Reason
Simple code can be very fast. Optimizers sometimes do marvels with simple code
Example, good
// clear expression of intent, fast execution
vector<uint8_t> v(100000);
for(auto& c : v)
c = ~c;
Example, bad
// intended to be faster, but is actually slower
vector<uint8_t> v(100000);
for(size_t i=0; i<v.size(); i+=sizeof(uint64_t))
{
uint64_t& quad_word = *reinterpret_cast<uint64_t*>(&v[i]);
quad_word = ~quad_word;
}
Note
???
???
PER.5: Don't assume that low-level code is necessarily faster than high-level code
Reason
Low-level code sometimes inhibits optimizations. Optimizers sometimes do marvels with high-level code.
Note
???
???
PER.6: Don't make claims about performance without measurements
Reason
The field of performance is littered with myth and bogus folklore. Modern hardware and optimizers defy naive assumptions; even experts are regularly surprised.
Note
Getting good performance measurements can be hard and require specialized tools.
Note
A few simple microbenchmarks using Unix time
or the standard library <chrono>
can help dispel the most obvious myths.
If you can't measure your complete system accurately, at least try to measure a few of your key operations and algorithms.
A profiler can help tell you which parts of your system are performance critical.
Often, you will be surprised.
???
PER.10: Rely on the static type system
Reason
Type violations, weak types (e.g. void*
s), and low level code (e.g., manipulation of sequences as individual bytes) make the job of the optimizer much harder. Simple code often optimizes better than hand-crafted complex code.
???
PER.11: Move computation from run time to compile time
???
PER.12: Eliminate redundant aliases
???
PER.13: Eliminate redundant indirections
???
PER.14: Minimize the number of allocations and deallocations
???
PER.15: Do not allocate on a critical branch
???
PER.16: Use compact data structures
Reason
Performance is typically dominated by memory access times.
???
PER.17: Declare the most used member of a time-critical struct first
???
PER.18: Space is time
Reason
Performance is typically dominated by memory access times.
???
PER.19: Access memory predictably
Reason
Performance is very sensitive to cache performance and cache algorithms favor simple (usually linear) access to adjacent data.
Example
int matrix[rows][cols];
// bad
for (int c = 0; c < cols; ++c)
for (int r = 0; r < rows; ++r)
sum += matrix[r][c];
// good
for (int r = 0; r < rows; ++r)
for (int c = 0; c < cols; ++c)
sum += matrix[r][c];
PER.30: Avoid context switches on the critical path
???