Home
JoshRosen edited this page Nov 21, 2014 · 8 revisions
Welcome to the spark-perf wiki. This page lists several useful scripts, helper functions, and analysis tools for running spark-perf tests.
In config.py:
import os
SPARK_COMMIT_ID = os.environ["SPARK_COMMIT_ID"]
To run against multiple commits, use a shell script to repeatedly call bin/run with different environment variables:
#!/usr/bin/env bash
# Note: array elements are separated by spaces, not commas.
versions=( "origin/tag/v1.1.0" "origin/tag/v1.1.1-rc2" "origin/tag/v1.2.0-snapshot1" "origin/branch-1.2" )
# Quote the expansion so that each element stays a single word.
for version in "${versions[@]}"
do
  export SPARK_COMMIT_ID="$version"
  ./bin/run
done
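The loop above keeps going even if one run fails. A minimal sketch of a variant that stops at the first failing commit is shown below. The function name `run_for_commits` and its argument order are hypothetical and not part of spark-perf. It assumes only that the runner reads `SPARK_COMMIT_ID` from its environment, as config.py does above.

```shell
# Hypothetical helper (not part of spark-perf): run a command once per
# commit ID and stop at the first failure.
# Usage: run_for_commits <runner> <commit>...
run_for_commits() {
  local runner="$1"
  local version
  shift
  for version in "$@"; do
    echo "Running against $version"
    # Set SPARK_COMMIT_ID only for this one invocation of the runner.
    if ! SPARK_COMMIT_ID="$version" "$runner"; then
      echo "Run failed for $version" >&2
      return 1
    fi
  done
}

# Example (commented out so that sourcing this file has no side effects):
# run_for_commits ./bin/run "origin/tag/v1.1.0" "origin/branch-1.2"
```

Passing the commits as arguments rather than hard-coding an array makes it easy to feed in a list generated by another command.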
To print the SHA of every Nth commit between two git revisions (here N = 50, plus the most recent commit; useful for bisecting):
git log --oneline origin/branch-1.2...v1.1.0 | awk 'NR == 1 || NR % 50 == 0' | cut -d ' ' -f1
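The sampling interval in the one-liner above is fixed at 50. As a rough sketch, the filter can be wrapped in a function that takes the interval as an argument. The name `sample_shas` is hypothetical. It reads `git log --oneline` output on standard input and, like the original, always keeps the first (most recent) line.

```shell
# Hypothetical helper: keep the first line and every Nth line of
# `git log --oneline` output, then print only the abbreviated SHA.
# Usage: git log --oneline <range> | sample_shas [N]   (N defaults to 50)
sample_shas() {
  local n="${1:-50}"
  awk -v n="$n" 'NR == 1 || NR % n == 0' | cut -d ' ' -f1
}

# Example (commented out; requires a git checkout with these refs):
# git log --oneline origin/branch-1.2...v1.1.0 | sample_shas 25
```

The sampled SHAs can then be used as the values of `SPARK_COMMIT_ID` for successive runs.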
Upgrade to a newer version of the aws tool and configure AWS credentials:
sudo easy_install --upgrade awscli
aws configure
# AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
# AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Default region name [None]:
# Default output format [None]:
Sync the results folder to an S3 bucket:
aws s3 cp --recursive /local/path/to/results/directory s3://bucket-name/resultsdir/
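A mistyped local path is an easy mistake to make here, so it can help to check that the results directory exists before uploading. The wrapper below is only a sketch under that assumption. The function name `upload_results` is hypothetical, and the paths and bucket in the example are the same placeholders used above.

```shell
# Hypothetical wrapper: refuse to upload if the local results directory
# does not exist, otherwise copy it recursively to the given S3 location.
# Usage: upload_results <local-results-dir> <s3-uri>
upload_results() {
  local src="$1"
  local dest="$2"
  if [ ! -d "$src" ]; then
    echo "Results directory not found: $src" >&2
    return 1
  fi
  aws s3 cp --recursive "$src" "$dest"
}

# Example (commented out; requires configured AWS credentials):
# upload_results /local/path/to/results/directory s3://bucket-name/resultsdir/
```

To preview what would be copied without transferring anything, `aws s3 cp` also accepts the `--dryrun` flag.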