Stability and reproducibility
Why stability matters
When doing performance analysis, many different factors come into play. From hardware to software, many things can affect performance test results and make them less reproducible.
calcite tries its best to avoid false positives when detecting regressions, but removing noise from the benchmarks you run helps a lot. You cannot remove all the noise, but you should at least try to limit it and be aware of its major causes.
Having control over the various performance factors also gives you a more accurate understanding of what is happening. Developers very often draw wrong conclusions because of a poor environment or poorly chosen metrics.
Run your tests more than once
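A single measurement says little about how noisy your environment is. The snippet below is a minimal sketch in Python using only the standard library. The functions measure and summarize are hypothetical helpers, not part of calcite. The sketch runs a workload several times after a few untimed warm-up calls and reports the spread of the timings. A high coefficient of variation (stdev / mean) hints that the environment is noisy.

```python
import statistics
import time
from typing import Callable


def measure(func: Callable[[], object], runs: int = 20, warmup: int = 3) -> list[float]:
    """Time `func` `runs` times (in seconds), after `warmup` untimed calls."""
    for _ in range(warmup):
        func()  # warm caches and trigger any lazy initialization
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Basic statistics; at least two samples are needed for the stdev."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation: relative noise level
    }


if __name__ == "__main__":
    stats = summarize(measure(lambda: sum(range(100_000))))
    print({name: f"{value:.3g}" for name, value in stats.items()})
```

Comparing the median and minimum with the mean is a quick way to spot outliers caused by the factors listed in the rest of this page.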
Machine stability
Note
On Linux, the pyperf system tune command can fix quite a few of these issues. On Windows, powercfg can be used to control power usage and CPU settings.
We may provide a similar cross-platform tool in the future.
Power supply
Make sure that your power supply is reliable and that your power cord is plugged in! A poor-quality power supply can affect the frequencies at which your processor and other hardware run.
When a laptop runs on battery, power-saving safeguards are enabled that can greatly affect performance results.
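On Linux, a benchmark harness can check whether the machine is on mains power before it starts. The sketch below is minimal and assumes the standard sysfs power-supply interface (/sys/class/power_supply/*/type and online). The helper name on_mains_power is hypothetical. The root directory is a parameter so that the logic can be exercised without real hardware.

```python
from pathlib import Path
from typing import Optional


def on_mains_power(root: str = "/sys/class/power_supply") -> Optional[bool]:
    """Return True if a mains adapter reports it is online, False if mains
    adapters exist but are all offline (running on battery), and None if no
    mains adapter is exposed (e.g. non-Linux systems or some desktops)."""
    found_mains = False
    for supply in sorted(Path(root).glob("*")):
        try:
            if (supply / "type").read_text().strip() != "Mains":
                continue  # skip batteries, USB supplies, etc.
            found_mains = True
            if (supply / "online").read_text().strip() == "1":
                return True
        except OSError:
            continue  # missing or unreadable attribute: ignore this entry
    return False if found_mains else None


if __name__ == "__main__":
    if on_mains_power() is False:
        print("Warning: running on battery, results may be affected.")
```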
Heat
Your motherboard and other hardware components (CPU, GPU, …) will throttle when facing high temperatures. Make sure that your machine has a good cooling system.
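On Linux, the kernel exposes sensor readings under /sys/class/thermal/thermal_zone*/, and each temp file holds a value in millidegrees Celsius. The sketch below is minimal and assumes that interface. The helper names and the 70 °C threshold are hypothetical choices. A harness could use them to wait between runs until the machine has cooled down.

```python
from pathlib import Path


def read_temperatures(root: str = "/sys/class/thermal") -> dict[str, float]:
    """Map 'thermal_zoneN:type' to its current temperature in degrees Celsius.
    The kernel reports `temp` in millidegrees Celsius."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones cannot be read (permissions, sensor errors)
        temps[f"{zone.name}:{kind}"] = millideg / 1000.0
    return temps


def too_hot(threshold_celsius: float = 70.0, root: str = "/sys/class/thermal") -> bool:
    """True if any zone is above the (arbitrary, hypothetical) threshold."""
    return any(t > threshold_celsius for t in read_temperatures(root).values())
```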
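CPU
- frequency
Modern CPUs scale their frequency dynamically, and boost (turbo) modes depend on temperature and load. On Linux, set the CPU frequency governor to performance and write 0 to /sys/devices/system/cpu/cpufreq/boost (kernel doc) to disable boost, or use the Intel pstate driver.
- hyperthreading
With hyperthreading, two logical cores share the execution units and caches of one physical core, so work running on a sibling core can disturb your measurements.
Note
On some processors, hyperthreading also halves the number of PMU (Performance Monitoring Unit) counters available. If you rely on such events (cache miss rate, branch mispredictions, etc.), you might want to disable hyperthreading. Otherwise the OS will have to multiplex the counters, which makes the measurements less accurate.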
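The settings above can be checked before a run. The sketch below is minimal and Linux-only. It reads the per-CPU scaling_governor files, the cpufreq/boost switch, and two other sysfs files that recent kernels usually expose: intel_pstate/no_turbo, where 0 means turbo is enabled, and smt/active, where 1 means hyperthreading is on. Any of these files may be absent on your system, and the sketch skips missing files. The function name cpu_stability_warnings is hypothetical.

```python
from pathlib import Path
from typing import Optional


def _read(path: Path) -> Optional[str]:
    """Return the stripped file content, or None if it is missing/unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def cpu_stability_warnings(root: str = "/sys/devices/system/cpu") -> list[str]:
    """List CPU settings that commonly add noise to benchmarks on Linux."""
    base = Path(root)
    warnings = []
    # Per-CPU frequency governor; "performance" keeps frequencies high.
    for gov in sorted(base.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        value = _read(gov)
        if value is not None and value != "performance":
            warnings.append(f"{gov.parts[-3]}: governor is {value!r}")
    # Generic boost switch: "1" means boost is enabled.
    if _read(base / "cpufreq" / "boost") == "1":
        warnings.append("cpufreq boost is enabled")
    # intel_pstate equivalent: "0" means turbo is enabled.
    if _read(base / "intel_pstate" / "no_turbo") == "0":
        warnings.append("intel_pstate turbo is enabled")
    # SMT (hyperthreading) state: "1" means sibling logical cores are active.
    if _read(base / "smt" / "active") == "1":
        warnings.append("hyperthreading (SMT) is active")
    return warnings


if __name__ == "__main__":
    for line in cpu_stability_warnings():
        print("Warning:", line)
```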
GPU
Like CPUs, modern GPUs use frequency scaling.
On Windows, make sure that your test application or profiler calls the ID3D12Device::SetStablePowerState function.
Note
This requires Developer Mode to be enabled on the machine.
On Linux, how to stabilize GPU clocks usually depends on the device vendor and the installed drivers.
Operating system
- background processes
Other programs running on the machine compete with your benchmark for CPU time, memory and I/O. The most basic advice is to avoid running unnecessary processes in the background. They consume resources and cause contention, and the operating system may have to suspend your application, even if only for tiny amounts of time.
- scheduler
The operating system is responsible for scheduling processes and threads. By default, it tries to share CPU time fairly between processes. This can be an issue for performance testing, because your benchmark may be preempted in favor of unrelated work.
One simple way to mitigate this problem is to give your performance test processes a higher priority.
On Windows, use the SetPriorityClass function on an existing process, or launch the process with the start command and a priority flag (/high or /realtime).
On Linux, priority is controlled through the nice command and function. Lowering the niceness below 0 (that is, raising the priority) requires elevated privileges.
Note
If you isolate CPU cores, raising the priority will have little to no impact, since no other process is scheduled on those cores.
Warning
If your benchmark communicates with other processes, raising its priority can hurt performance, for example when the processes it depends on get less CPU time. It usually still improves the reproducibility of the results.
- cpu core isolation
On multi-core systems, you can also dedicate specific cores to your application by isolating them. On Linux, use the taskset command in combination with the isolcpus kernel parameter.
Warning
Core isolation might not reflect end-user conditions, but it is generally a good idea when checking for regressions. It is particularly well suited to compute workloads.
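- IRQ affinity
By default, hardware interrupts (IRQs) may be serviced by any core, including the ones running your benchmark. On Linux, you can restrict which cores handle them through /proc/irq/default_smp_affinity and /proc/irq/IRQ#/smp_affinity (doc).
- Address Space Layout Randomization
Address Space Layout Randomization (ASLR) places the stack, heap and shared libraries at different addresses on each run. This can change memory alignment and cache behavior, and therefore timings, from one run to the next.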
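A harness can at least check that the machine looks idle before it starts measuring. The sketch below is minimal and uses os.getloadavg(), which is available on Unix-like systems only. The helper name wait_for_idle and its default threshold are hypothetical choices. The 1-minute load average reacts slowly, so treat it as a coarse signal.

```python
import os
import time
from typing import Callable


def wait_for_idle(
    max_load: float = 0.5,
    timeout: float = 60.0,
    poll: float = 1.0,
    loadavg: Callable[[], tuple[float, float, float]] = os.getloadavg,
) -> bool:
    """Wait until the 1-minute load average drops below `max_load`.

    Returns True if the system became idle enough, False on timeout.
    `loadavg` is injectable so the logic can be tested without a real system.
    """
    deadline = time.monotonic() + timeout
    while True:
        if loadavg()[0] < max_load:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```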
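The Linux priority advice from the scheduler item can be applied from a Python harness. The sketch below is minimal and POSIX-only. It sets the niceness in the child process with os.setpriority just before the benchmark command runs. The helper name run_with_niceness is hypothetical. Without root or CAP_SYS_NICE, the kernel typically refuses to lower the niceness, so the sketch falls back to the inherited priority instead of failing.

```python
import os
import subprocess
import sys


def run_with_niceness(cmd: list[str], niceness: int = -10, **kwargs) -> subprocess.CompletedProcess:
    """Run `cmd` with the given niceness (POSIX only).

    Lower values mean higher priority. If the kernel refuses the change
    (insufficient privileges), the child keeps the inherited priority.
    Extra keyword arguments are forwarded to subprocess.run.
    """
    def set_niceness() -> None:
        # Runs in the child process, just before `cmd` is executed.
        try:
            os.setpriority(os.PRIO_PROCESS, 0, niceness)
        except PermissionError:
            pass  # not allowed to raise priority: keep the inherited one

    return subprocess.run(cmd, preexec_fn=set_niceness, check=True, **kwargs)


if __name__ == "__main__":
    run_with_niceness([sys.executable, "-c", "print('benchmark goes here')"])
```

On Windows, the closest standard-library equivalent of start /high is passing creationflags=subprocess.HIGH_PRIORITY_CLASS to subprocess.run.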
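Core isolation and IRQ affinity work together. Cores listed in isolcpus= are reported by the kernel in /sys/devices/system/cpu/isolated, and the smp_affinity files take a hexadecimal CPU bitmask. The sketch below is minimal and Linux-only, and all helper names are hypothetical. It parses the isolated-CPU list, pins the current process to those cores with os.sched_setaffinity, which does the same job as taskset, and builds the mask for the remaining "housekeeping" cores. Writing that mask to /proc/irq/default_smp_affinity requires root, so the sketch only prints it.

```python
import os
from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse the kernel's CPU-list format, e.g. '2-3,6' -> {2, 3, 6}.
    (Stride forms such as '0-7:2' are not handled in this sketch.)"""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(root: str = "/sys/devices/system/cpu") -> set[int]:
    """CPUs removed from general scheduling via isolcpus= (empty if unknown)."""
    try:
        return parse_cpu_list(Path(root, "isolated").read_text())
    except OSError:
        return set()  # file absent: non-Linux system, old kernel, container...


def affinity_mask(cpus: set[int], ncpus: int) -> str:
    """Hex bitmask in the format of /proc/irq/*/smp_affinity:
    comma-separated 32-bit groups, most significant group first."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    groups = (ncpus + 31) // 32
    return ",".join(
        f"{(value >> (32 * i)) & 0xFFFFFFFF:08x}" for i in reversed(range(groups))
    )


if __name__ == "__main__":
    isolated = isolated_cpus()
    ncpus = os.cpu_count() or 1
    housekeeping = set(range(ncpus)) - isolated
    # As root, this value could be written to /proc/irq/default_smp_affinity
    # so that new IRQs are handled by the housekeeping cores only.
    print("IRQ mask for housekeeping cores:", affinity_mask(housekeeping, ncpus))
    if isolated:
        os.sched_setaffinity(0, isolated)  # pin this process (pid 0 = caller)
```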
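On Linux, the system-wide ASLR mode is exposed in /proc/sys/kernel/randomize_va_space. According to the kernel's sysctl documentation, 0 disables randomization, 1 randomizes the stack, mmap base and VDSO, and 2 also randomizes the heap. The sketch below reads the mode so that it can be recorded next to benchmark results and runs made under different settings are not compared by mistake. The helper name aslr_mode is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Meaning of the values, per the kernel's sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and VDSO randomized",
    2: "as 1, plus heap (brk) randomization",
}


def aslr_mode(path: str = "/proc/sys/kernel/randomize_va_space") -> Optional[tuple[int, str]]:
    """Return (value, description), or None if the setting cannot be read."""
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None  # non-Linux system or restricted /proc
    return value, ASLR_MODES.get(value, "unknown")
```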
References
[STI16] Stinner, V. (2016, May 23). My journey to stable benchmark, part 3. Retrieved from https://vstinner.github.io/journey-to-stable-benchmark-average.html