To test that benchexec is installed correctly and run tools:
python3 -m benchexec.test_tool_info dummy --read-only-dir / --overlay-dir /home
To test the apalache tool info (from within the same directory as the
apalache.py
module):
python3 -m benchexec.test_tool_info .apalache --read-only-dir / --overlay-dir /home
To run the scratch configuration (from within the ./scratch
dir)
benchexec scratch/config.xml --read-only-dir / --overlay-dir /home
- Use task definition files: https://gitlab.com/sosy-lab/benchmarking/task-definition-format
Ends up in the results
dir in the working directory
- Get tool info working
- Make PR with tool info
- Generate task defs from sbt, with little EDSL as SBT plugin
- Review Igor's PR on benchmark organization
- Use SBT for running the benchmarks
- Make sbt config into a plugin
- Fetch apalache executable
- Add way to specify which apalache version
- Publish the BenchExec reports on GH pages
- Set up sbt-site
- Construct index for all benchmarks
- Transfer over the existing benchmarks
- Fix any missing results in output data
- Compare across different version wc
- Enable running individual suites
- Enable running individual benchmarks in a suite?
- Open source the benchexec sbt rules
- Set up incremental builds for benchmarks? (With option to override?)
- Document how to run and add benchmarks
- Support adding benchmarks for files in remote repos:
- Create git-submodule for each remote project?
- Use filter & sparse checkout to only pull the relevant dirs?
- See https://stackoverflow.com/a/52269934/1187277
If the target directory for the output files (specified with --outputpath) is a git repository without uncommitted changes and the option --commit is specified, benchexec will add and commit all created files to the git repository. One can use this to create a reliable archive of experimental results.
- Why picked benchexec
- considered:
- bench press (only smt, not as mature)
- python thing
- bespoke
- robust
- actively maintained
- widely known
- publication quality
- containment
- considered:
- Why picked sbt
- considered:
- Dhall -- another novelty in our stack. Not known by most
- XML -- can't generate experiments programmatically, verbose, not statically checked
- we need a build system
- we already use it and must know scala
- considered: