Benchmarks
This package provides a few benchmarks to measure how well it performs. They are helpful to detect whether new features or refactorings degrade performance.
Running them
After installing the project on your machine, you can run:
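The benchmarks rely on PHPBench (which is where the revolutions, iterations and mode vocabulary below comes from), so the invocation looks roughly like the sketch below. The exact paths, configuration file and any composer script wrapping it are assumptions and may differ in your checkout:

```bash
# Assumed layout: PHPBench installed with the QA tooling under qa/.
# The project may also expose this command through a composer script.
php qa/vendor/bin/phpbench run --config=qa/phpbench.json --report=aggregate
```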
This will run all the benchmarks and produce an output like this:
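The concrete numbers depend on your machine; the shape of the output is roughly the following (placeholder values, not real measurements):

```text
+-------------+---------+------+-----+----------+-----------+--------+
| benchmark   | subject | revs | its | mem_peak | mode      | rstdev |
+-------------+---------+------+-----+----------+-----------+--------+
| MapperBench | readme  | 1000 | 10  | x.xxx mb | xx.xxx μs | ±x.xx% |
+-------------+---------+------+-----+----------+-----------+--------+
```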
The interesting information is located in the table at the bottom.
This is the `benchmark` column: the name of the class containing the benchmark being run. Here `MapperBench` corresponds to the `\CuyZ\Valinor\QA\Benchmark\MapperBench` class, located at `qa/Benchmark/MapperBench.php`.
This is the `subject` column: the name of the method inside the benchmark class. Here `readme` corresponds to `\CuyZ\Valinor\QA\Benchmark\MapperBench::readme()`.
This is the `revs` column: the number of revolutions, meaning the number of times the subject is called consecutively.
By default the project runs them 1,000 times to get a good average. Each subject executes in microseconds or milliseconds, so there can easily be a large variation between runs. The higher the number of revolutions, the more stable the benchmark is.
Warmup
Before benchmarking a subject, it is run once, but that revolution is not included in the statistics.
This first run exists because of the impact of Composer class autoloading: it will be the first to encounter a class and `require` it, which means doing IO. Once a class is loaded, the subsequent revolutions won't pay that IO cost.
This means the time shown in the mode doesn't include the autoloading time, so it doesn't completely reflect the real execution time that can be observed in an end-user project. But since this project can't affect the time it takes Composer to load classes, it's fine not to include it in these benchmarks.
This is the `its` column: the number of iterations, meaning the number of processes running a subject.
By default the project runs them 10 times. This means each iteration runs the subject 1,000 times, for a total of 10,000 calls.
Just like revolutions, having multiple iterations allows for a more stable mode. (A sketch of how these values can be declared on a benchmark class is shown after the column descriptions below.)
This is the `mem_peak` column: the maximum amount of memory used while benchmarking the subject, across all revolutions and iterations.
This is the `mode` column: the average time it took to run the subject.
In essence, it's the sum of the execution times of the 10,000 calls divided by 10,000.
This is the `rstdev` column: the time deviation between iterations. A high percentage means each iteration's execution time differs significantly from the others. This may indicate that the code is non-deterministic, meaning each revolution runs different code or is affected by IO operations.
As indicated in the documentation, this value should stay below 2%.
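For reference, this is roughly how revolutions, iterations and the warmup run can be declared with PHPBench attributes. This is a hypothetical sketch, not the actual content of `qa/Benchmark/MapperBench.php`, which may configure these values differently (for instance in the PHPBench configuration file):

```php
<?php

namespace CuyZ\Valinor\QA\Benchmark;

use PhpBench\Attributes\Iterations;
use PhpBench\Attributes\Revs;
use PhpBench\Attributes\Subject;
use PhpBench\Attributes\Warmup;

final class MapperBench
{
    #[Subject]        // exposes this method as the `readme` subject
    #[Revs(1000)]     // 1,000 consecutive calls per iteration
    #[Iterations(10)] // 10 iterations, i.e. 10,000 calls in total
    #[Warmup(1)]      // one warmup revolution, excluded from the statistics
    public function readme(): void
    {
        // ... map some input with the library, e.g. using MapperBuilder
    }
}
```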
Comparing against a baseline
As said above, the goal of the benchmarks is to detect whether new code introduces a performance regression.
You first need to create a baseline, which means running the benchmarks and storing their results. It's done with this command:
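A rough sketch of that command, assuming PHPBench and the paths used above (the tag name is arbitrary and may differ from what the project actually uses):

```bash
# Runs the benchmarks and stores the results under the "baseline" tag.
php qa/vendor/bin/phpbench run --config=qa/phpbench.json --report=aggregate --tag=baseline
```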
Then you can make the code modifications you want in the project. To know whether you introduced a regression, run:
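Again as a sketch under the same assumptions:

```bash
# Compares the current run against the run previously stored under the "baseline" tag.
php qa/vendor/bin/phpbench run --config=qa/phpbench.json --report=aggregate --ref=baseline
```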
In case of a regression, it will produce an output like this:
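With placeholder values again, the comparison output looks roughly like this, each compared column carrying the difference against the baseline:

```text
+-------------+---------+------+-----+-------------------+---------------------+------------------+
| benchmark   | subject | revs | its | mem_peak          | mode                | rstdev           |
+-------------+---------+------+-----+-------------------+---------------------+------------------+
| MapperBench | readme  | 1000 | 10  | x.xxx mb (+x.xx%) | xx.xxx μs (+xx.xx%) | ±x.xx% (+xx.xx%) |
+-------------+---------+------+-----+-------------------+---------------------+------------------+
```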
In the `mem_peak`, `mode` and `rstdev` columns, you'll find a new percentage indicating the difference between your code and the baseline.
If any subject's average execution time exceeds the baseline's by more than a 10% margin of error, the command will fail.
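With PHPBench, this kind of threshold is typically declared as an assertion on the subject. The exact expression used by this project is an assumption here, but it would look along these lines:

```php
use PhpBench\Attributes\Assert;

// Fail the run if the measured time exceeds the baseline by more than 10%.
#[Assert('mode(variant.time.avg) <= mode(baseline.time.avg) +/- 10%')]
public function readme(): void
{
    // ...
}
```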
To simplify your life, this benchmark comparison is automatically run when a Pull Request is opened on this project, and the CI will fail if there's a regression.
Note
Since the `rstdev` value is meant to be as low as possible in the baseline, even a slight variation will make the comparison diff percentage appear high. So a red value in this column doesn't necessarily mean something is wrong.
This is also why there's no assertion on this value.