This repository contains the evaluation materials for the PLDI 2020 paper "Predictable Accelerator Design with Time-Sensitive Affine Types" using the Dahlia programming language.
If you use our data or the Dahlia language, please cite us:
```bibtex
@inproceedings{10.1145/3385412.3385974,
  author = {Nigam, Rachit and Atapattu, Sachille and Thomas, Samuel and Li, Zhijing and Bauer, Theodore and Ye, Yuwei and Koti, Apurva and Sampson, Adrian and Zhang, Zhiru},
  title = {Predictable Accelerator Design with Time-Sensitive Affine Types},
  year = {2020},
  isbn = {9781450376136},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3385412.3385974},
  doi = {10.1145/3385412.3385974},
  booktitle = {Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation},
  pages = {393–407},
  numpages = {15},
  keywords = {Affine Type Systems, High-Level Synthesis},
  location = {London, UK},
  series = {PLDI 2020}
}
```
There are three components to the evaluation:
- Benchmarks (this repository).
- The Dahlia Compiler: A compiler from Dahlia to Vivado HLS C.
- Polyphemus Server: A client–server system for orchestrating large-scale FPGA experiments.
If you're using the virtual machine image (see below), you just need the hypervisor. Otherwise, to set up the evaluation outside of the VM, start by cloning this repository. You will need these prerequisites:
- Get Python 3 if you don't already have it.
- Install GNU `parallel`.
- Install Jupyter with `pip3 install jupyter`.
- Install other Python dependencies with `pip3 install -r requirements.txt` (in this repository).
- Install the local benchmarking helpers with `cd benchmarking-helpers && pip3 install -e .`.
- Run the sanity-checking script `./_scripts/sanity-check.sh` to make sure all the tools are configured correctly.
- Download the VM Appliance. The username and password are `dahlia`.
- (Optional, but recommended) Enable multiple physical cores for the virtual machine. In VirtualBox, select the appliance and, under Settings > System > Processor, enable all physical cores.
- Boot the image in your favorite hypervisor (we tested the image using VirtualBox).
- Open a terminal and type `cd Desktop/dahlia-evaluation`.
- Get the latest version of this repository: `git pull`.
- Run `./_scripts/sanity-check.sh`. The script should report no errors.
- Run `ESTIMATE=100 ./_scripts/run-dahlia-accepts.sh`. The script runs the Dahlia compiler on 100 configurations for each benchmark and reports a time estimate for running on all configurations.
- Run `jupyter nbconvert --execute main-figures.ipynb` and then type `ls all-figures/ | wc -l`. The reported number should be 13.
- Open http://cerberus.cs.cornell.edu:5000. The web page should display the Polyphemus deployment available to PLDI AEC reviewers.
For artifact evaluation, we would like reviewers to go through the following steps (each of which is described in detail in a section below):
- Configurations accepted by Dahlia: Measure how many points in a large design space are well-typed according to Dahlia's type system.
  - Exhaustive DSE: % of configurations accepted by Dahlia.
  - Qualitative Studies: % of configurations accepted by Dahlia.
- Experimental data and graph generation: See how the raw data for our experiments turns into the charts you see in the paper.
  - Regenerate all graphs in the paper using the `main-figures.ipynb` notebook.
  - (Optional) Open the Jupyter notebook and read the explanation for all the experiments.
  - (Optional) Visually inspect the collected data in the repository.
- Data collection example: Actually run the HLS experiments in the paper to collect the aforementioned raw data.
  - Try out the scaled-down data collection example with Polyphemus, our server for running FPGA compilation jobs.
  - (Optional) Read the documentation on setting up a new experiment with Polyphemus ("Designing New Experiments" in this file).
- (Optional) Using the Dahlia compiler: Compile our example programs and write your own programs, observing the output HLS C code and the error messages.
  - (Optional) Rebuild the compiler.
  - (Optional) Run the examples and check the error messages generated by the compiler.
  - (Optional) Check out the documentation on the language.
We recommend reviewers use as many physical cores as they have available to speed up this section. The script uses GNU `parallel` to parallelize execution, so actual runtime will depend on the number of cores available.
In this section, we will reproduce the following claims:
Section 5.2
Dahlia accepts 354 configurations, or about 1.1% of the unrestricted design space.
Section 5.3 (stencil2d)
The resulting design space has 2,916 points. Dahlia accepts 18 of these points (0.6%).
Section 5.3 (md-knn)
The full space has 16,384 points, of which Dahlia accepts 525 (3%).
Section 5.3 (md-grid)
The full space has 21,952 points, of which Dahlia accepts 81 (0.4%).
Each claim has two parts: (1) the number of configurations in the design space, and (2) the number of configurations accepted by Dahlia (i.e., they are well-typed according to Dahlia's type checker).
Run the following command:
./_scripts/run-dahlia-accepts.sh
For each benchmark, our script generates *k* directories, where *k* is the number of configurations, and runs the Dahlia compiler on each configuration. The script reports the number of configurations accepted for each benchmark. It also generates files named `*-accepted` in the repository root, which contain the paths to the configurations accepted by Dahlia. Do not delete these files: they are used for the data collection experiment.
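After the script finishes, you can cross-check the reported counts yourself. Below is a minimal Python sketch (not one of the provided scripts) that counts the entries in each `*-accepted` file, assuming each file lists one accepted configuration path per line.

```python
# A quick sanity check, not one of the provided scripts: count the entries in
# each *-accepted file in the repository root. Assumes one configuration path
# per line; the counts should match the claims above (e.g., 81 for md-grid).
from pathlib import Path

for accepted in sorted(Path(".").glob("*-accepted")):
    count = sum(1 for line in accepted.read_text().splitlines() if line.strip())
    print(f"{accepted.name}: {count} accepted configurations")
```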
In this section, we reproduce all the graphs in the paper from data already committed to the repository. Because actually running the experiments and collecting the data requires access to proprietary compilers and/or hardware, we address data collection in the next section.
- In the `dahlia-evaluation/` directory, run `jupyter notebook`. Your browser should open.
- Click on `main-figures.ipynb`.
- Click on the "Restart the kernel and re-run the whole notebook" button (⏩️).
- All the graphs will be generated within the notebook under the corresponding section.
Note: The colors and background of the graphs might look different, but the points and labels are correct.
Information on saved data [click to expand]
We invite reviewers to optionally look at our collected data. This section describes where all the saved data lives.
Sensitivity analysis (`sensitivity-analysis/`)
The sensitivity analysis consists of three experiments:
- Fig. 4a: Unrolling the innermost loop without any partitioning (`sensitivity-analysis/no-partition-unoll/summary.csv`).
- Fig. 4b: Unrolling with a constant partitioning (`sensitivity-analysis/const-partition-unroll/summary.csv`).
- Fig. 4c: Unrolling and partitioning in lockstep (`sensitivity-analysis/lockstep-partition-and-unroll/summary.csv`).
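To peek at these summaries outside the notebook, the following minimal sketch (not part of the repository's scripts, and assuming pandas is installed) loads the three files listed above and prints their sizes and columns.

```python
# A minimal sketch for inspecting the saved sensitivity-analysis data outside
# the notebook; not part of the repository's scripts. Assumes pandas is
# installed. The paths are the ones listed above.
import pandas as pd

SUMMARIES = {
    "Fig. 4a": "sensitivity-analysis/no-partition-unoll/summary.csv",
    "Fig. 4b": "sensitivity-analysis/const-partition-unroll/summary.csv",
    "Fig. 4c": "sensitivity-analysis/lockstep-partition-and-unroll/summary.csv",
}

for figure, path in SUMMARIES.items():
    df = pd.read_csv(path)
    # Print the number of recorded configurations and the available columns.
    print(figure, len(df), list(df.columns))
```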
Exhaustive DSE (`exhaustive-dse/data/`)
The exhaustive design space exploration study uses a single experiment with 32,000 distinct configurations to generate the three subgraphs in Figure 7.
Qualitative study (`qualitative-study/data/`)
The qualitative study consists of three benchmarks:
- stencil2d (`qualitative-study/stencil2d`).
- md-knn (`qualitative-study/md-knn`).
- md-grid (`qualitative-study/md-grid`).
Spatial (`spatial-sweep/data/`)
The Spatial study consists of one experiment with several configurations to generate Figure 9 (main paper) and Figure 2 (supplementary text).
This section describes how to actually run the experiments to generate the raw data that goes into the plots demonstrated above. This step is the trickiest because it requires access to proprietary Xilinx toolchains and, in some cases, actual FPGA hardware. We have attempted to make this as painless as possible by using AWS EC2, which provides a license-free way to use the Xilinx toolchain and "F1" instances that come equipped with Xilinx FPGAs, and Polyphemus, a server we developed to manage large numbers of FPGA compilation and execution jobs.
Each figure in the paper requires data from different sources:
- Exhaustive DSE (fig. 7) & Qualitative Studies (fig. 8): Require the Vivado HLS estimation tools.
- Sensitivity analysis (fig. 4): Requires the full hardware synthesis toolchain and an FPGA to run the designs.
- Spatial Comparison (fig. 9): Requires a functional Spatial toolchain that can target a ZedBoard.
Instructions for Artifact Evaluation: These directions will not reproduce the full set of data reported in the paper, which is generally not practical within the evaluation time (fig. 7, for example, took us 2,666 CPU-hours to produce). We instead provide smaller versions of each experiment that are practical to run within a reasonable amount of time. The idea is to demonstrate that our distributed FPGA experimentation framework is functional and thereby give evidence that our reported data is correct. We also provide instructions to reproduce our original results.
The experiments require access to a deployment of our AWS-based experimentation server. For the PLDI AEC, we've asked the PC chairs for permission to provide the reviewers with access to our deployment. Since it is expensive to keep the servers up, we ask the reviewers to coordinate with us to set up two-day windows for evaluating our data collection scripts.
For ease of evaluation, we've automated the experiments that generate the data for the qualitative studies. The `Makefile` at the root of the repository provides rules to automatically submit the jobs, monitor them, and download the results to generate graphs.
All three qualitative studies can be run. We recommend that reviewers start with the `md-grid` study, since it has 81 configurations and takes ~2 hours to run on the cluster.
- Make sure `machsuite-md-grid-accepted` is present in the repository root. This file is generated in the "Configurations accepted by Dahlia" step.
- Run the following command: `make start-job BENCH=qualitative-study/machsuite-md-grid`
- The command will generate all the configurations and upload them to cerberus.cs.cornell.edu:5000.
- The script will also start the following command to monitor the status of the jobs: `watch -n5 ./_scripts/status.py machsuite-md-grid-data/`
- After uploading, most jobs should be in the `make` stage and some of them in the `makeing` stage. If there are no jobs in the `makeing` stage, please message us.
- Wait for all jobs to enter the `done` stage. Once this happens, exit the watch script. If a job is in the `failed` state, see the instructions below.
- Run the following command to generate the resource summary file `machsuite-md-grid-data/summary.csv`: `make summarize-data BENCH=qualitative-study/machsuite-md-grid`
- Run the following command to generate the graph PDF `data-collect-machsuite-md-grid-middle-unroll.pdf`: `./qualitative-study/server-scripts/plot.py machsuite-md-grid`. Compare this PDF to the one generated under `all-figures/` for the same benchmark.
To run other benchmarks, replace `qualitative-study/machsuite-md-grid` with `qualitative-study/machsuite-md-knn` (525 configurations, ~10 hours) or `qualitative-study/machsuite-stencil-stencil2d-inner` (18 configurations, ~20 minutes).
Note on intermittent failures: During the monitoring phase, some jobs might be reported as failing. The most likely cause is a data race between nodes in the cluster: several execution nodes attempted to execute the same configuration and ended up in an erroneous state.
To re-run a failed job:
- Copy the reported job ID and open the Polyphemus deployment.
- Ctrl-F search for the job ID and click on the link.
- On the job page, click on the "state" dropdown, select "Start Make", and click "set".
- The job will then be restarted.
- If this doesn't solve the problem and jobs are still reported as `failed`, please message us.
Note on hung jobs: Depending on the job, the backend compiler (Vivado HLS) might consume a lot of memory and cause the underlying process to never terminate. Unfortunately, there is no way to distinguish such runaway processes from long-running estimation jobs. If a job is stuck in the `makeing` stage for more than two hours, please message us.
We provide two ways of interacting with and evaluating the Dahlia compiler.
- Follow the examples on the Dahlia demo webpage. The compiler is packaged and served using Scala.js and does not require a connection to a server.
- Follow the instructions and rebuild the Dahlia compiler from source. The compiler supports a software backend and has extensive testing to ensure correctness of the various program analyses.
- We additionally provide language documentation for the various parts of the compiler.
Polyphemus experiments go through the following flow:
Set up a Polyphemus deployment with multiple estimation machines and at least one FPGA machine. Note that the Polyphemus deployment for PLDI AEC reviewers does not support FPGA machines.
There are three experiment folders under `sensitivity-analysis`. For each of the folders, run the following commands. Set the `BUILDBOT` environment variable to point to your Polyphemus deployment.
- Set variable for specific experiment (we show one example):
export EXPERIMENT=sensitivity-analysis/const-partition-unroll
cd $EXPERIMENT
- Generate all configurations:
../../_scripts/gen-dse.py $EXPERIMENT/gemm
- Upload the configurations:
../../_scripts/batch.py -p $(basename $EXPERIMENT) -m hw $EXPERIMENT/gemm-*
- Wait for all jobs to complete. Monitor them by running:
watch -n5 ../../_scripts/status.py ./
- Download the data, summarize it, and generate the graphs:
make graphs
We ran our evaluation on 20 AWS machines, each with 4 workers, over the course of a week. This experiment requires babysitting the server fleet and manually restarting some jobs and machines.
Because of the amount of direct interaction required, we assume that the reader has read the documentation for Polyphemus and understands the basics of instances and the jobs folder.
Due to the sheer size of the experiment, we recommend monitoring job status and extracting the data on one of the servers instead of downloading it locally. We provide scripts to monitor and collect the data on the server.
- Set a unique prefix to track jobs associated with this experimentation run:
export PREFIX=exhaustive
- Generate all the configurations and upload them. Depending on the number of threads for the upload server, this step can take up to two days. However, Polyphemus starts executing jobs as soon as they are uploaded. Keep this script running in a different shell and move on to the next step.
cd exhaustive-dse/ && ../_scripts/gen-dse.py gemm && ../_scripts/batch.py -p $PREFIX -m estimate gemm-*
- Log on to a Polyphemus server and enter the `instance` directory that contains the `jobs/` folder.
- Copy the scripts under `exhaustive-dse/scripts/` into this folder.
- To monitor the jobs, first run `./get-prefix-jobs.sh $PREFIX`, which generates a file named `$PREFIX-jobs`.
- Run `./status.sh $PREFIX-jobs` to get the state of all the jobs.
- When all jobs are in a `done` state, run `cat $PREFIX-jobs | parallel --progress --bar './extract.py jobs/{}'`. This generates all the resource summaries under `raw/`.
- Finally, run the following to generate a summary CSV:
ls raw/*.json | parallel --progress --bar './to-csv.py raw/{}'
- The downloaded CSV can be analyzed using the main.ipynb notebook in the repository root.
At the time of the paper's submission, Spatial was still being actively developed and changed. To reproduce our experimental results, please follow the instructions in our fork of spatial-quickstart.
To design a new large-scale experiment with Polyphemus, design and parameterize it for use with `gen-dse.py`. `gen-dse.py` is a search-and-replace script that generates a folder for each possible configuration. When invoked on a folder, it looks for a `template.json` file that maps parameters in files to possible values. For example, consider the following files in a folder named `bench`:
| bench.cpp | template.json |
| --- | --- |
| `int x = ::CONST1::; int y = ::CONST2::; x + y;` | `{ "bench.cpp": { "CONST1": [1, 2, 3], "CONST2": [1, 2, 3] } }` |
`gen-dse.py` will generate 9 configurations in total by iterating over the possible values of `CONST1` and `CONST2`.
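For intuition, here is a rough Python sketch of the search-and-replace idea; it is not the actual `gen-dse.py`, and the output directory naming is illustrative only. It enumerates the Cartesian product of the parameter values in `template.json` and writes one substituted copy of the folder per configuration.

```python
# A rough sketch of the search-and-replace idea; NOT the actual gen-dse.py.
# Output directory names are illustrative only.
import itertools
import json
import shutil
from pathlib import Path

def generate(bench_dir: str) -> None:
    bench = Path(bench_dir)
    # template.json maps file names to {parameter: [possible values]}.
    template = json.loads((bench / "template.json").read_text())

    entries = [(fname, param, values)
               for fname, params in template.items()
               for param, values in params.items()]

    for i, choice in enumerate(itertools.product(*(v for _, _, v in entries))):
        out = bench.parent / f"{bench.name}-{i}"
        shutil.copytree(bench, out)
        for (fname, param, _), value in zip(entries, choice):
            target = out / fname
            # Substitute ::PARAM:: tokens with the chosen value.
            target.write_text(target.read_text().replace(f"::{param}::", str(value)))

generate("bench")  # yields 9 directories for the example above
```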
Follow the workflow from the "Sensitivity Analysis" study above to upload and run the jobs.
The infrastructure for running benchmarks is under the `_scripts` directory. For these scripts, you can set a `BUILDBOT` environment variable to point to the URL of the running Buildbot instance.
batch.py [click to expand]
Submit a batch of benchmark jobs to the Buildbot. Each argument to the script should be the path to a specific benchmark version in this repository, like `baseline/machsuite-gemm-ncubed`.
Use it like this:
./_scripts/batch.py <benchpath1> <benchpath2> ...
The script creates a new directory for the batch under `_results/`, named with a timestamp. It puts a list of job IDs in a file called `jobs.txt` there. It prints the name of the batch directory (i.e., the timestamp) to stdout.
This script has command-line options:
- `-E`: Submit jobs for full synthesis. (The default is to just do estimation.)
- `-p`: Pretend to submit jobs, but don't actually submit anything. (For debugging.)
extract.py [click to expand]
Download results for a previously-submitted batch of benchmark jobs. On the command line, give the path to the batch directory, like this:
./_scripts/extract.py _results/2019-07-13-17-13-09
The script downloads information about the jobs listed in `jobs.txt` in that directory. It saves lots of extracted result values for the batch in a file called `results.json` there.
summarize.py [click to expand]
Given some extracted data for a batch, summarize the results in a human-friendly CSV. Give the script the path to a `results.json`, like this:
./_scripts/summarize.py _results/2019-07-13-17-13-09/results.json
The script produces a file in the same directory called `summary.csv` with particularly relevant information pulled out.
status.py [click to expand]
Get the current status of a batch while you impatiently wait for jobs to complete. Print out the number of jobs in each state. Give the script the path to a batch directory:
./_scripts/status.py _results/2019-07-13-17-13-09
Use the `watch` command to repeatedly run the command every 5 seconds:
watch -n5 ./_scripts/status.py _results/2019-07-13-17-13-09
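For reference, here is a hedged sketch of how these helpers chain together for a single batch. It assumes `BUILDBOT` is already set, uses the example benchmark path from above, and reads the batch directory name from `batch.py`'s stdout as documented; it is not one of the provided scripts.

```python
# A sketch of chaining the helper scripts for one batch; not a provided script.
# Assumes BUILDBOT is set and that batch.py prints the batch directory name
# (i.e., the timestamp) to stdout, as documented above.
import subprocess

def run(*cmd: str) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

batch = run("./_scripts/batch.py", "baseline/machsuite-gemm-ncubed")
batch_dir = f"_results/{batch}"

# Check on the jobs (or use `watch -n5 ...` as shown above) until they finish.
print(run("./_scripts/status.py", batch_dir))

# Once all jobs are done, download and summarize the results.
run("./_scripts/extract.py", batch_dir)
run("./_scripts/summarize.py", f"{batch_dir}/results.json")
print(f"Wrote {batch_dir}/summary.csv")
```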
Please open an issue or email Rachit Nigam.