benchrun Overview

Python {benchrun}

{benchrun} is a Python package for running macrobenchmarks, designed to work well with the larger conbench ecosystem.

Installation

{benchrun} is not [yet] on a package archive like PyPI; you can install it from GitHub with

pip install benchrun@git+https://github.com/conbench/conbench.git@main#subdirectory=benchrun/python

Writing benchmarks

Iteration

The code to run for a benchmark is contained in a class inheriting from the abstract Iteration class. At a minimum, users must override the name attribute and the run() method (the code to time), but may also override the setup(), before_each(), after_each(), and teardown() methods; the *_each() methods run before/after each iteration, while setup() and teardown() run once before/after all iterations. A simple implementation might look like

import time

from benchrun import Iteration

class MyIteration(Iteration):
    name = "my-iteration"

    def before_each(self, case: dict) -> None:
        # use the `env` dict attribute to pass data between stages
        self.env = {"success": False}

    def run(self, case: dict) -> None:
        # code to time goes here
        time.sleep(case["sleep_seconds"])
        self.env["success"] = True

    def after_each(self, case: dict) -> None:
        assert self.env["success"]
        self.env = {}

CaseList

An Iteration’s methods are parameterized with case, a dict whose keys are the benchmark’s parameters and whose values are scalar arguments. Cases are managed with an instance of CaseList, a class that takes a params dict, which is like a case dict except that its values are lists of valid arguments instead of scalars. CaseList populates a case_list attribute containing the grid of cases to be run:

from benchrun import CaseList

case_list = CaseList(params={"x": [1, 2], "y": ["a", "b", "c"]})
case_list.case_list
#> [{'x': 1, 'y': 'a'},
#>  {'x': 1, 'y': 'b'},
#>  {'x': 1, 'y': 'c'},
#>  {'x': 2, 'y': 'a'},
#>  {'x': 2, 'y': 'b'},
#>  {'x': 2, 'y': 'c'}]

CaseList contains an overridable filter_cases() method that can be used to remove invalid combinations of parameters, e.g. if an x of 2 with a y of "b" is not viable:

from typing import Any, Dict, List

class MyCaseList(CaseList):
    def filter_cases(self, case_list: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # drop the invalid combination of x == 2 with y == "b"
        filtered_case_list = []
        for case in case_list:
            if not (case["x"] == 2 and case["y"] == "b"):
                filtered_case_list.append(case)

        return filtered_case_list

my_case_list = MyCaseList(params={"x": [1, 2], "y": ["a", "b", "c"]})
my_case_list.case_list
#> [{'x': 1, 'y': 'a'},
#>  {'x': 1, 'y': 'b'},
#>  {'x': 1, 'y': 'c'},
#>  {'x': 2, 'y': 'a'},
#>  {'x': 2, 'y': 'c'}]

If there are so many restrictions that it is simpler to specify which cases are viable than which are not, the case_list parameter of filter_cases() can be completely ignored and a manually-generated list can be returned.
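
For example, a filter might ignore the generated grid entirely and return a hand-written list of viable cases. In this sketch, ManualCaseList and the particular cases it keeps are purely illustrative:

class ManualCaseList(CaseList):
    def filter_cases(self, case_list: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # ignore the generated grid and return only the combinations
        # known to be viable
        return [
            {"x": 1, "y": "a"},
            {"x": 2, "y": "c"},
        ]

manual_case_list = ManualCaseList(params={"x": [1, 2], "y": ["a", "b", "c"]})
manual_case_list.case_list
#> [{'x': 1, 'y': 'a'}, {'x': 2, 'y': 'c'}]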

Benchmark

A Benchmark in {benchrun} consists of an Iteration instance, a CaseList instance, and optionally some additional metadata about how to run it, such as whether to drop disk caches beforehand.

from benchrun import Benchmark

my_iteration = MyIteration()
my_benchmark = Benchmark(iteration=my_iteration, case_list=my_case_list)

This class has a run() method to run all cases and a run_case() method to run a single case.
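
A minimal usage sketch; the argument name passed to run_case() is an assumption, so check the package documentation for the exact signatures:

# run every case in the CaseList and collect the results
all_results = my_benchmark.run()

# run only a single case from the grid
one_result = my_benchmark.run_case(case={"x": 1, "y": "a"})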

BenchmarkList

A BenchmarkList is a lightweight class that ties together all the Benchmark instances that should be run as a group (e.g. all the benchmarks for a package).

from benchrun import BenchmarkList

my_benchmark_list = BenchmarkList(benchmarks=[my_benchmark])

The class has a __call__() method that will run all benchmarks in its list, taking care that they all use the same run_id so they will all appear together on conbench.
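
A minimal sketch of calling the list directly, assuming no arguments are required and that the call returns the collected results:

# run every Benchmark in the list under a shared run_id and collect
# the resulting benchadapt.BenchmarkResult instances
results = my_benchmark_list()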

Running benchmarks and sending results to conbench

BenchmarkList is designed to work seamlessly with {benchadapt}’s CallableAdapter class:

from benchadapt.adapters import CallableAdapter

my_adapter = CallableAdapter(callable=my_benchmark_list)

Like all adapters, it has a run() method to run all the benchmarks it contains (handling generic metadata for you), a post_results() method to send the results to a conbench server, and a __call__() method that does both. These are the methods to call from whatever CI or automated build system runs the benchmarks.
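
For example, a CI entry point might look like this sketch; the environment variables mentioned in the comments are assumptions about how the conbench client is configured, so check the benchadapt documentation for the exact names:

# run all benchmarks wrapped by the adapter
my_adapter.run()

# post the collected results to the conbench server (connection details
# are typically supplied via environment variables such as CONBENCH_URL,
# CONBENCH_EMAIL, and CONBENCH_PASSWORD)
my_adapter.post_results()

# or run and post in a single call
my_adapter()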

Setting more metadata

{benchrun} and {benchadapt} make an effort to handle as much metadata as possible for you (e.g. machine info), but you will still need to specify some metadata yourself, e.g. build flags used in compilation or run_reason (often something like "commit" or "merge"). To see what actually gets sent to conbench, see the documentation for benchadapt.BenchmarkResult.
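
For instance, a value like run_reason can be attached to every result when constructing the adapter. The sketch below assumes benchadapt's result_fields_override parameter; verify the parameter name against the benchadapt documentation:

from benchadapt.adapters import CallableAdapter

my_adapter = CallableAdapter(
    callable=my_benchmark_list,
    # override fields on each produced BenchmarkResult; here the runs
    # are marked as being triggered by a merge
    result_fields_override={"run_reason": "merge"},
)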