# PIVOT Scheduling Simulator

The PIVOT scheduling simulator is developed for in-depth evaluation of PIVOT scheduling algorithms in cross-cloud, geo-distributed environments. It simulates the cross-cloud infrastructure on top of VMs and networks provisioned by AWS and GCP, and runs data-intensive applications in the form of containers atop the infrastructure. The applications are workflows composed of data-processing tasks with data dependencies among them. Batch jobs sampled from the Alibaba 2018 cluster trace are used as the default workload, but any other workload in the same format can also be used by the simulator.

## Quickstart

The simulator is dockerized and can be run as a Docker container:

```
$ docker run -ti --rm \
    -v <local-job-dir>:/jobs \
    -v <local-output-dir>:/output \
    dchampion24/pivot-scheduling:alibaba \
    --num-hosts 100 overall --num-apps 100
```

The `<local-job-dir>` is the directory containing the YAML-formatted job files; sample job files collected from the Alibaba cluster trace are provided here. The `<local-output-dir>` is the directory for storing the experimental data and plots generated by the simulation. Both directories must be given as absolute paths to comply with the `docker run` bind-mount rules.
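
For example, assuming the sample jobs sit in a `jobs/` directory under the current working directory (the local directory names here are illustrative), `$(pwd)` yields the required absolute paths:

```
# The quickstart command with concrete mounts; jobs/ and output/ are
# example directory names under the current working directory.
$ mkdir -p output
$ docker run -ti --rm \
    -v "$(pwd)/jobs":/jobs \
    -v "$(pwd)/output":/output \
    dchampion24/pivot-scheduling:alibaba \
    --num-hosts 100 overall --num-apps 100
```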

The simulator accepts a number of parameters for tweaking the simulation:

```
$ docker run -ti dchampion24/pivot-scheduling:alibaba -h
usage: sim.py [-h] [--num-hosts N_HOSTS] [--cpus CPUS] [--mem MEM]
              [--disk DISK] [--gpus GPUS] [--job-dir JOB_DIR]
              [--output-dir OUTPUT_DIR]
              [--task-output-scale-factor OUTPUT_SCALE_FACTOR]
              {overall,num-apps} ...

Run simulation on Alibaba cluster trace

positional arguments:
  {overall,num-apps}    Experiment type
    overall             Run the overall experiment
    num-apps            Run the experiment with varying number of applications

optional arguments:
  -h, --help            show this help message and exit
  --num-hosts N_HOSTS   Number of hosts
  --cpus CPUS           Number of CPUs per host
  --mem MEM             RAM in MBs per host
  --disk DISK           Disk space in GBs per host
  --gpus GPUS           Number of GPU units per host
  --job-dir JOB_DIR     Batch job directory
  --output-dir OUTPUT_DIR
                        Output directory for results
  --task-output-scale-factor OUTPUT_SCALE_FACTOR
                        Scale factor of the output data size of tasks
                        (proportional to the memory demand)
```
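
As an illustration, the run below simulates 200 larger hosts and doubles the task output size. The parameter values are arbitrary picks for this example; as in the quickstart, `--num-apps` is an option of the chosen experiment subcommand:

```
# Illustrative values: 200 hosts, each with 32 CPUs, 64 GB (65536 MB)
# of RAM and 500 GB of disk; task output sizes scaled by a factor of 2.
$ docker run -ti --rm \
    -v "$(pwd)/jobs":/jobs \
    -v "$(pwd)/output":/output \
    dchampion24/pivot-scheduling:alibaba \
    --num-hosts 200 --cpus 32 --mem 65536 --disk 500 \
    --task-output-scale-factor 2 \
    overall --num-apps 50
```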

The output data is available under `<local-output-dir>/<n_app|overall>/<timestamp>`, where the middle path component is named after the experiment type. It contains two sub-directories: the raw experimental data is stored in `data/` and the plots in `plot/`.
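
For instance, after a single overall run the layout looks roughly like this (the `<timestamp>` directory name depends on when the run was started):

```
$ ls <local-output-dir>/overall/<timestamp>
data/  plot/
```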

## Sample job data from Alibaba cluster trace

New batch jobs can be sampled from the Alibaba cluster trace dataset with the `sample.py` script:

```
$ python3 sample.py -h
usage: sample.py [-h] --num-jobs N_JOBS [--min-runtime MIN_RUNTIME]
                 [--max-runtime MAX_RUNTIME] --start START --interval INTERVAL
                 [--min-deps MIN_DEPS] [--max-parallel MAX_PARALLEL]
                 --output-dir OUTPUT_DIR

Script for sampling batch jobs from Alibaba cluster trace dataset

optional arguments:
  -h, --help            show this help message and exit
  --num-jobs N_JOBS, -n N_JOBS
                        Number of sampled jobs
  --min-runtime MIN_RUNTIME, -l MIN_RUNTIME
                        Minimum runtime
  --max-runtime MAX_RUNTIME, -u MAX_RUNTIME
                        Maximum runtime
  --start START, -s START
                        Start timestamp of the sampling
  --interval INTERVAL, -i INTERVAL
                        Interval of the sampling
  --min-deps MIN_DEPS, -d MIN_DEPS
                        Minimum number of tasks with dependencies in a job
  --max-parallel MAX_PARALLEL, -p MAX_PARALLEL
                        Maximum level of parallelism of tasks
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Output directory of the sample data
```
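
For instance, the call below samples 100 jobs into `./jobs`. All flag values are illustrative, and the runtime, start, and interval parameters are assumed to be in the time unit of the trace (seconds):

```
# Illustrative parameters: 100 jobs with runtimes between 60 and 3600,
# drawn from a window of 3600 starting at timestamp 86400, each with at
# least 2 dependent tasks and at most 10 tasks running in parallel.
$ python3 sample.py \
    --num-jobs 100 \
    --min-runtime 60 --max-runtime 3600 \
    --start 86400 --interval 3600 \
    --min-deps 2 --max-parallel 10 \
    --output-dir ./jobs
```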