benchrun
Overview
Python {benchrun}
{benchrun} is a Python package to run macrobenchmarks, deliberately designed to work well with the larger conbench ecosystem.
Installation
{benchrun} is not [yet] on a package archive like PyPI; you can install from GitHub with
pip install benchrun@git+https://github.com/conbench/conbench.git@main#subdirectory=benchrun/python
Writing benchmarks
Iteration
The code to run for a benchmark is contained in a class inheriting from the abstract
`Iteration` class. At a minimum, users must override the `name` attribute and the
`run()` method (the code to time), but may also override the `setup()`,
`before_each()`, `after_each()` and `teardown()` methods, where the `*_each()` methods
run before/after each iteration, and `setup()` and `teardown()` run once before/after
all iterations. A simple implementation might look like:
import time

from benchrun import Iteration


class MyIteration(Iteration):
    name = "my-iteration"

    def before_each(self, case: dict) -> None:
        # use the `env` dict attribute to pass data between stages
        self.env = {"success": False}

    def run(self, case: dict) -> None:
        # code to time goes here
        time.sleep(case["sleep_seconds"])
        self.env["success"] = True

    def after_each(self, case: dict) -> None:
        # check the flag set by run(), then reset the shared state
        assert self.env["success"]
        self.env = {}
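The ordering of the hooks described above can be sketched without {benchrun} itself. The `LifecycleSketch` class and `drive()` function below are hypothetical stand-ins written for illustration; they are not part of the package, and passing `case` to `setup()` and `teardown()` is an assumption.

```python
from typing import List


class LifecycleSketch:
    """Hypothetical stand-in that records the order in which hooks fire."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def setup(self, case: dict) -> None:
        self.calls.append("setup")

    def before_each(self, case: dict) -> None:
        self.calls.append("before_each")

    def run(self, case: dict) -> None:
        self.calls.append("run")

    def after_each(self, case: dict) -> None:
        self.calls.append("after_each")

    def teardown(self, case: dict) -> None:
        self.calls.append("teardown")


def drive(iteration: LifecycleSketch, case: dict, iterations: int) -> List[str]:
    """Illustrative driver: setup once, the *_each hooks around every run, teardown once."""
    iteration.setup(case)
    for _ in range(iterations):
        iteration.before_each(case)
        iteration.run(case)
        iteration.after_each(case)
    iteration.teardown(case)
    return iteration.calls


call_order = drive(LifecycleSketch(), case={"sleep_seconds": 0}, iterations=2)
print(call_order)
```

With two iterations, `setup` and `teardown` each appear once in the recorded order, while `before_each`, `run` and `after_each` appear twice.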
CaseList
An `Iteration`’s methods are parameterized with `case`, a dict whose keys are
parameters for the benchmark and whose values are scalar arguments. Cases are managed
with an instance of `CaseList`, a class that takes a `params` dict. A `params` dict is
like a case dict, except that each value is a list of valid arguments rather than a
scalar. `CaseList` will populate a `case_list` attribute which contains the grid of
specified cases to be run:
from benchrun import CaseList
case_list = CaseList(params={"x": [1, 2], "y": ["a", "b", "c"]})
case_list.case_list
#> [{'x': 1, 'y': 'a'},
#> {'x': 1, 'y': 'b'},
#> {'x': 1, 'y': 'c'},
#> {'x': 2, 'y': 'a'},
#> {'x': 2, 'y': 'b'},
#> {'x': 2, 'y': 'c'}]
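A grid like the one above can be produced with `itertools.product`. This is a minimal sketch of the idea, not necessarily how `CaseList` is implemented; `expand_grid()` is a hypothetical helper name.

```python
import itertools
from typing import Any, Dict, List


def expand_grid(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one case dict per combination of the listed parameter values."""
    keys = list(params)
    return [
        dict(zip(keys, combination))
        for combination in itertools.product(*params.values())
    ]


grid = expand_grid({"x": [1, 2], "y": ["a", "b", "c"]})
print(len(grid))  # 2 values of x times 3 values of y
```

The first parameter varies slowest, so the cases come out in the same order as the listing above.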
`CaseList` contains an overridable `filter_cases()` method that can be used to remove
invalid combinations of parameters, e.g. if an `x` of `2` with a `y` of `"b"` is not
viable:
from typing import Any, Dict, List


class MyCaseList(CaseList):
    def filter_cases(self, case_list: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # keep every case except the invalid x == 2, y == "b" combination
        filtered_case_list = []
        for case in case_list:
            if not (case["x"] == 2 and case["y"] == "b"):
                filtered_case_list.append(case)
        return filtered_case_list


my_case_list = MyCaseList(params={"x": [1, 2], "y": ["a", "b", "c"]})
my_case_list.case_list
#> [{'x': 1, 'y': 'a'},
#> {'x': 1, 'y': 'b'},
#> {'x': 1, 'y': 'c'},
#> {'x': 2, 'y': 'a'},
#> {'x': 2, 'y': 'c'}]
If there are so many restrictions that it is simpler to specify which cases are
viable than which are not, the `case_list` parameter of `filter_cases()` can be
completely ignored and a manually-generated list can be returned.
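As a sketch of that manual approach, the method below ignores the grid it is given and returns a hand-written list. `ViableCasesOnly` is a hypothetical stand-in so the snippet runs on its own; in {benchrun} the method would live on a `CaseList` subclass instead.

```python
from typing import Any, Dict, List


class ViableCasesOnly:
    """Hypothetical stand-in for a CaseList subclass."""

    def filter_cases(self, case_list: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # The generated grid in `case_list` is deliberately ignored.
        return [
            {"x": 1, "y": "a"},
            {"x": 2, "y": "c"},
        ]


full_grid = [{"x": x, "y": y} for x in (1, 2) for y in ("a", "b", "c")]
viable = ViableCasesOnly().filter_cases(full_grid)
print(viable)
```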
Benchmark
A `Benchmark` in {benchrun} consists of an `Iteration` instance, a `CaseList`
instance, and potentially a bit more metadata about how to run it, such as whether to
drop disk caches beforehand.
from benchrun import Benchmark
my_benchmark = Benchmark(iteration=MyIteration(), case_list=my_case_list)
This class has a `run()` method to run all cases, or `run_case()` to run a single
case.
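The relationship between running one case and running all of them can be sketched with plain timing code. The helpers `time_case()` and `time_all_cases()` are hypothetical names, and the use of `time.perf_counter()` is an assumption made for illustration; {benchrun}'s own `run()` and `run_case()` may measure and return results differently.

```python
import time
from typing import Callable, List


def time_case(
    run: Callable[[dict], None], case: dict, iterations: int = 3
) -> List[float]:
    """Time `run(case)` repeatedly and return one duration in seconds per iteration."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        run(case)
        durations.append(time.perf_counter() - start)
    return durations


def time_all_cases(
    run: Callable[[dict], None], cases: List[dict], iterations: int = 3
) -> List[List[float]]:
    """Running every case is a loop over the single-case helper."""
    return [time_case(run, case, iterations) for case in cases]


def sleep_briefly(case: dict) -> None:
    # stand-in for the code to time
    time.sleep(case["sleep_seconds"])


timings = time_all_cases(
    sleep_briefly, [{"sleep_seconds": 0.0}, {"sleep_seconds": 0.001}], iterations=2
)
print(timings)
```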
BenchmarkList
A `BenchmarkList` is a lightweight class to tie together all the instances of
`Benchmark` that should be run together (e.g. all the benchmarks for a package).
from benchrun import BenchmarkList
my_benchmark_list = BenchmarkList(benchmarks=[my_benchmark])
The class has a `__call__()` method that will run all benchmarks in its list,
taking care that they all use the same `run_id` so they will all appear together
on conbench.
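The idea of a shared `run_id` can be sketched as follows: generate one identifier up front and pass it to every benchmark, so that all results carry the same value. The names `run_together()` and `fake_benchmark()`, the result dict shape, and the use of `uuid` are assumptions for illustration, not the package's implementation.

```python
import uuid
from typing import Callable, List


def run_together(benchmarks: List[Callable[[str], List[dict]]]) -> List[dict]:
    """Run every benchmark with a single shared run_id and collect the results."""
    run_id = uuid.uuid4().hex  # generated once, reused for every benchmark
    results: List[dict] = []
    for benchmark in benchmarks:
        results.extend(benchmark(run_id))
    return results


def fake_benchmark(name: str) -> Callable[[str], List[dict]]:
    """Hypothetical benchmark that reports one result tagged with the run_id."""

    def _run(run_id: str) -> List[dict]:
        return [{"name": name, "run_id": run_id}]

    return _run


results = run_together([fake_benchmark("first"), fake_benchmark("second")])
print({result["run_id"] for result in results})
```

Because the identifier is created outside the loop, the printed set contains exactly one value regardless of how many benchmarks are in the list.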
Running benchmarks and sending results to conbench
`BenchmarkList` is designed to work seamlessly with {benchadapt}’s `CallableAdapter`
class:
from benchadapt.adapters import CallableAdapter
my_adapter = CallableAdapter(callable=my_benchmark_list)
Like all adapters, it then has a `run()` method to run all the benchmarks it
contains (handling generic metadata appropriately for you), a `post_results()`
method that will send the results to a conbench server, and a `__call__()` method
that will do both. These are the methods that should be called in whatever CI or
automated build system will be used for running benchmarks.
Setting more metadata
{benchrun} and {benchadapt} make an effort to handle as much metadata for you as
possible (e.g. machine info), but you will still need to specify some metadata
yourself, e.g. build flags used in compilation or `run_reason` (often something like
`commit` or `merge`). To see what actually gets sent to conbench, see the
documentation for `benchadapt.BenchmarkResult`.