Stability and reproducibility
Why stability matters
When doing performance analysis, many different factors come into play. From hardware to software, many things can affect performance test results and make them less reproducible.
calcite tries its best to avoid false positives when detecting regressions, but removing noise from the benchmarks you run helps a lot. You cannot remove all the noise, but you should at least try to limit it and be aware of its major causes.
Having control over the various performance factors also helps you understand more accurately what is happening. Developers very often draw wrong conclusions because they rely on a poor environment or poor metrics.
Run your tests more than once
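As a rough illustration of repeating a measurement, the sketch below times a workload several times and reports the spread. It is not part of calcite: the helper names `time_runs` and `summarize`, and the default repeat and warm-up counts, are assumptions chosen for this example.

```python
import statistics
import time


def time_runs(workload, repeats=10, warmup=2):
    """Call `workload` repeatedly and return one wall-clock sample per run, in seconds."""
    # Warm-up runs are discarded so caches and lazy initialization settle first.
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return the minimum, the median and the sample standard deviation."""
    median = statistics.median(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "min": min(samples),
        "median": median,
        "stdev": stdev,
        # A large relative deviation hints at a noisy environment.
        "relative_stdev": stdev / median if median else 0.0,
    }


if __name__ == "__main__":
    print(summarize(time_runs(lambda: sum(range(100_000)))))
```

A high relative deviation between runs suggests that the environment is still too noisy for a single measurement to be trusted.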
We might provide a similar tool that works on multiple operating systems at a later time.
Make sure that your power supply is reliable and your power cord is plugged in! A poor-quality power supply can affect the frequencies of your processor and other hardware.
When your laptop runs on battery, power-saving safeguards are enabled that can greatly affect performance results.
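On Linux, a benchmark script could refuse to run on battery. The sketch below assumes the kernel's `/sys/class/power_supply` interface, where a mains adapter reports `type` as `Mains` and `online` as `1`. The function name `on_ac_power` is hypothetical, and some machines expose their charger under a different type (for example USB-C), in which case the function returns `None`.

```python
from pathlib import Path


def on_ac_power(root="/sys/class/power_supply"):
    """Return True or False if a mains adapter reports its state, or None if unknown."""
    base = Path(root)
    if not base.is_dir():
        return None  # Not Linux, or sysfs is not available.
    state = None
    for supply in base.iterdir():
        try:
            if (supply / "type").read_text().strip() != "Mains":
                continue
            online = (supply / "online").read_text().strip() == "1"
        except OSError:
            continue  # Attribute missing or unreadable: skip this entry.
        if online:
            return True
        state = False
    return state


if __name__ == "__main__":
    if on_ac_power() is False:
        print("Warning: running on battery, results may be unreliable.")
```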
Your motherboard and other hardware components (CPU, GPU, …) will throttle when facing high temperatures. Make sure that your machine has a good cooling system.
/sys/devices/system/cpu/cpufreq/boost (kernel doc) or use the Intel pstate driver.
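A benchmark harness can at least record the current boost setting alongside its results. The sketch below only reads the state and changes nothing. It assumes the `cpufreq/boost` file named above, plus the `intel_pstate/no_turbo` file, which is not mentioned in this page and is included here as an assumption about how the Intel pstate driver exposes the setting. The function name `boost_enabled` is hypothetical.

```python
from pathlib import Path


def _read_flag(path):
    """Return the stripped content of a sysfs file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def boost_enabled(cpu_root="/sys/devices/system/cpu"):
    """Return True or False for the frequency boost state, or None if unknown."""
    root = Path(cpu_root)
    # Generic cpufreq knob: "1" means boost is enabled.
    generic = _read_flag(root / "cpufreq" / "boost")
    if generic is not None:
        return generic == "1"
    # intel_pstate knob (assumed path) is inverted: "1" means turbo is disabled.
    no_turbo = _read_flag(root / "intel_pstate" / "no_turbo")
    if no_turbo is not None:
        return no_turbo == "0"
    return None


if __name__ == "__main__":
    print("boost enabled:", boost_enabled())
```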
On some processors, hyperthreading also halves the number of PMU (Performance Monitoring Unit) counters available. If you rely on events measured by these counters (cache miss rate, branch mispredictions, etc.), you might want to disable hyperthreading; otherwise the OS will have to multiplex the counters.
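To know whether hyperthreading is in effect on a Linux machine before collecting PMU events, a script could check the kernel's SMT status file. This is a minimal sketch that assumes `/sys/devices/system/cpu/smt/active` is present, which is only the case on reasonably recent kernels. The function name `smt_active` is hypothetical.

```python
from pathlib import Path


def smt_active(cpu_root="/sys/devices/system/cpu"):
    """Return True if SMT (hyperthreading) is active, False if not, None if unknown."""
    try:
        value = (Path(cpu_root) / "smt" / "active").read_text().strip()
    except OSError:
        return None  # File missing: older kernel or not Linux.
    return value == "1"


if __name__ == "__main__":
    state = smt_active()
    if state:
        print("SMT is active: PMU counters may be shared between sibling threads.")
    elif state is None:
        print("SMT state unknown on this system.")
```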
Like CPUs, GPUs now use frequency scaling.
On Windows, make sure that your test application or profiler makes use of the ID3D12Device::SetStablePowerState function.
This requires your machine to have developer mode enabled.
On Linux, this usually depends on the device vendor and the installed drivers.
To mitigate the issue, the most basic advice is to avoid running unnecessary processes in the background. Those will simply consume resources and cause contention. The operating system might need to suspend your application, even for tiny amounts of time.
The operating system is responsible for scheduling processes and threads. By default it will try to balance CPU time fairly between processes, which can be an issue when doing performance testing.
One simple way to mitigate this problem is to give higher priority to your performance test processes.
On Windows, this is done by calling the SetPriorityClass function on an existing process, or by launching the process with the start command and specifying the priority (
On Linux this is controlled through the
If you decide to isolate CPU cores, raising the priority will have little to no impact.
If your benchmark communicates with other processes, a higher priority can negatively impact performance. It usually still gives more reproducible results, though.
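A test harness written in Python can also request a higher priority for itself on Unix-like systems. The sketch below uses `os.setpriority` from the standard library as a stand-in; it is an assumption for illustration, not necessarily the mechanism this page refers to. The function name `raise_priority` and the default niceness of -10 are hypothetical, and lowering the niceness usually requires elevated privileges.

```python
import os


def raise_priority(niceness=-10):
    """Try to set this process's niceness and return the value in effect afterwards.

    Returns None on platforms without os.setpriority (for example Windows).
    """
    if not hasattr(os, "setpriority"):
        return None
    try:
        # `who` = 0 targets the calling process. Lower niceness means higher priority.
        os.setpriority(os.PRIO_PROCESS, 0, niceness)
    except PermissionError:
        # Unprivileged users generally cannot lower their niceness: keep the current one.
        pass
    return os.getpriority(os.PRIO_PROCESS, 0)


if __name__ == "__main__":
    print("niceness in effect:", raise_priority())
```

Returning the value actually in effect lets the harness record whether the request succeeded, which matters when comparing results between machines.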
CPU core isolation
On multi-core systems, you can also dedicate specific cores to your application by isolating them.
taskset command in combination with the isolcpus kernel parameter.
While it might not reflect the end-user case, it is generally a good idea when checking for regressions. It is particularly great for compute workloads.
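Pinning can also be done from inside a Python test harness, in a way comparable to launching it through taskset. This is a minimal sketch assuming Linux, where `os.sched_setaffinity` is available. The function name `pin_to_cores` is hypothetical, and the core numbers must match the ones you isolated on your own machine.

```python
import os


def pin_to_cores(cores):
    """Restrict the calling process to `cores` and return the resulting affinity set.

    Returns None on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # pid 0 means the calling process. An OSError is raised if no requested core is usable.
    os.sched_setaffinity(0, set(cores))
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Example: pin to the lowest-numbered core currently allowed for this process.
    if hasattr(os, "sched_getaffinity"):
        first = min(os.sched_getaffinity(0))
        print("now running on:", pin_to_cores({first}))
```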
Address Space Layout Randomization
Stinner, V. (2016, May 23). My journey to stable benchmark, part 3. Last accessed from https://vstinner.github.io/journey-to-stable-benchmark-average.html