This repository was archived by the owner on Nov 23, 2017. It is now read-only.

Specify type of EBS root volume #45

Open · wants to merge 2 commits into base: branch-1.6
16 changes: 9 additions & 7 deletions README.md
@@ -6,8 +6,8 @@ to launch, manage and shut down
on Amazon EC2. It automatically sets up Apache Spark and
[HDFS](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html)
on the cluster for you. This guide describes
how to use `spark-ec2` to launch clusters, how to run jobs on them, and how
to shut them down. It assumes you've already signed up for an EC2 account
on the [Amazon Web Services site](http://aws.amazon.com/).

`spark-ec2` is designed to manage multiple named clusters. You can
@@ -69,13 +69,15 @@ types, and the default type is `m3.large` (which has 2 cores and 7.5 GB
RAM). Refer to the Amazon pages about [EC2 instance
types](http://aws.amazon.com/ec2/instance-types) and [EC2
pricing](http://aws.amazon.com/ec2/#pricing) for information about other
instance types.
- `--region=<ec2-region>` specifies an EC2 region in which to launch
instances. The default region is `us-east-1`.
- `--zone=<ec2-zone>` can be used to specify an EC2 availability zone
to launch instances in. Sometimes you will get an error because there
is not enough capacity in one zone; if that happens, try launching in
another.
- `--ebs-root-vol-type=<ebs-type>` can be used to specify the type of the
root EBS volume (e.g. `gp2`, `io1`, `st1`, `sc1`, `standard`). The default
is `gp2`; see the example after this list.
- `--ebs-vol-size=<GB>` will attach an EBS volume with a given amount
of space to each node so that you can have a persistent HDFS cluster
on your nodes across cluster restarts (see below).
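
For example, a launch command that overrides the root volume type and attaches a 100 GB EBS volume to each node might look like the following (the key pair name, identity file, and cluster name are placeholders):

```
./spark-ec2 --key-pair=my-key-pair --identity-file=my-key.pem \
  --region=us-east-1 --zone=us-east-1a \
  --ebs-root-vol-type=standard --ebs-vol-size=100 \
  launch my-cluster
```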
@@ -145,7 +147,7 @@ export AWS_ACCESS_KEY_ID=ABCDEFG1234567890123

You can edit `/root/spark/conf/spark-env.sh` on each machine to set Spark configuration options, such
as JVM options. This file needs to be copied to **every machine** to reflect the change. The easiest way to
do this is to use a script we provide called `copy-dir`. First edit your `spark-env.sh` file on the master,
then run `~/spark-ec2/copy-dir /root/spark/conf` to RSYNC it to all the workers.
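
For example (the `SPARK_WORKER_CORES` setting here is just an illustration):

```
# On the master: add a setting to spark-env.sh, then sync the conf directory
echo 'export SPARK_WORKER_CORES=4' >> /root/spark/conf/spark-env.sh
~/spark-ec2/copy-dir /root/spark/conf
```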

The [configuration guide](configuration.html) describes the available configuration options.
@@ -195,20 +197,20 @@ In addition to using a single input file, you can also use a directory of files
This repository contains the set of scripts used to set up a Spark cluster on
EC2. These scripts are intended to be used with the default Spark AMI and are *not*
expected to work on other AMIs. If you wish to start a cluster using Spark,
please refer to http://spark-project.org/docs/latest/ec2-scripts.html

## spark-ec2 Internals

The Spark cluster setup is guided by the values set in `ec2-variables.sh`. `setup.sh`
first performs basic operations like enabling ssh across machines and mounting ephemeral
drives, and also creates the files `/root/spark-ec2/masters` and `/root/spark-ec2/slaves`.
Following that, every module listed in `MODULES` is initialized.
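
As a rough sketch of that last step (illustrative only; this is not the literal contents of `setup.sh`):

```
# Hypothetical sketch: run each module's init script, if present
for module in $MODULES; do
  if [[ -e "$module/init.sh" ]]; then
    source "$module/init.sh"   # set up prerequisites before templates are filled in
  fi
done
```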

To add a new module, you will need to do the following:

1. Create a directory with the module's name.

2. Optionally add a file named `init.sh`. This is called before templates are configured
and can be used to install any prerequisites.

3. Add any files that need to be configured based on the cluster setup to `templates/`.
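
For instance, a minimal module might consist of just an `init.sh` (the module name and package below are hypothetical, and this assumes the AMI's package manager is yum):

```
# mymodule/init.sh: called before templates are configured;
# use it to install the module's prerequisites (hypothetical example)
yum install -y some-package
```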
18 changes: 14 additions & 4 deletions spark_ec2.py
@@ -249,12 +249,15 @@ def parse_args():
"--resume", action="store_true", default=False,
help="Resume installation on a previously launched cluster " +
"(for debugging)")
parser.add_option(
"--ebs-root-vol-type", default="gp2",
help="Root EBS volume type (e.g. 'gp2', 'io1', 'st1', 'sc1', 'standard') (default: 'gp2')")
parser.add_option(
"--ebs-vol-size", metavar="SIZE", type="int", default=0,
help="Size (in GB) of each EBS volume.")
parser.add_option(
"--ebs-vol-type", default="standard",
help="EBS volume type (e.g. 'gp2', 'standard').")
"--ebs-vol-type", default="gp2",
help="EBS volume type (e.g. 'gp2', 'io1', 'st1', 'sc1', 'standard') (default: 'gp2')")
parser.add_option(
"--ebs-vol-num", type="int", default=1,
help="Number of EBS volumes to attach to each node as /vol[x]. " +
@@ -588,9 +591,16 @@ def launch_cluster(conn, opts, cluster_name):
print("Could not find AMI " + opts.ami, file=stderr)
sys.exit(1)

# Create block device mapping so that we can add EBS volumes if asked to.
# The first drive is attached as /dev/sds, 2nd as /dev/sdt, ... /dev/sdz
# Create block device mapping so that we can configure and add EBS volumes if asked to.
block_map = BlockDeviceMapping()
# add root ebs volume type
root_device = EBSBlockDeviceType()
root_device.volume_type = opts.ebs_root_vol_type
root_device.delete_on_termination = True
block_map['/dev/sda1'] = root_device

# add additional EBS volumes if asked to
# The first drive is attached as /dev/sds, 2nd as /dev/sdt, ... /dev/sdz
if opts.ebs_vol_size > 0:
for i in range(opts.ebs_vol_num):
device = EBSBlockDeviceType()