This repository was archived by the owner on Nov 23, 2017. It is now read-only.

Temporary - changes for branch-2.2.0 #116

Open · wants to merge 38 commits into base: branch-1.6

Changes from all commits · 38 commits
0020b7a
Add support for 2.0.0-preview
shivaram Jun 14, 2016
38b0095
Check if hadoop version is YARN for Spark 2.0
shivaram Jun 15, 2016
11a2975
Address code review comments
shivaram Jun 15, 2016
79736f2
Remove debug print statement
shivaram Jun 15, 2016
d89a22e
Merge pull request #35 from shivaram/2.0-preview
shivaram Jun 15, 2016
8aff6d1
Now that it's been released, enable launching with spark 2.0.0
tomerk Aug 29, 2016
59045a1
Updated default spark version and hadoop version to 2.0.0 and yarn
tomerk Aug 29, 2016
472d067
Merge pull request #46 from tomerk/branch-2.0
shivaram Aug 29, 2016
783a075
Apply --additional-tags to EBS volumes
ajohnson-inst Sep 7, 2016
06f5d2b
Merge pull request #48 from aaronj1331/tag-volumes
shivaram Sep 7, 2016
bd25efa
Add Spark 2.0.1 to valid spark versions.
lagerspetz Oct 9, 2016
78280cb
Added also Spark 1.6.2
lagerspetz Oct 10, 2016
81a5aeb
Merge pull request #62 from lagerspetz/branch-2.0
shivaram Oct 10, 2016
bafa07c
Add missing 1.6.1 and new 1.6.3 and 2.0.2
lagerspetz Nov 15, 2016
a3e1d7b
Merge pull request #73 from lagerspetz/202
shivaram Nov 15, 2016
5188c78
Fix missing close quote
shivaram Nov 16, 2016
fcbe85f
Get rid of useless mount flag
Jan 10, 2017
9314296
Merge pull request #79 from dud225/branch-2.0
shivaram Jan 11, 2017
045507a
add missing spark_version 2.1.0
shiyuangu Feb 20, 2017
7af4f6d
Merge pull request #87 from shiyuangu/branch-2.0
shivaram Feb 22, 2017
697e802
Added Spark 2.1.1.
lagerspetz Jun 21, 2017
e6c4e09
Merge pull request #104 from lagerspetz/add-210
shivaram Jun 21, 2017
d9c9326
Updates for spark 2.2.0 - use scala 2.11.x
dazza-codes Aug 16, 2017
f21fb92
Updates for spark 2.2.0 - update versions and use this spark-ec2 repo…
dazza-codes Aug 16, 2017
435d06b
Updates for spark 2.2.0 - update spark-hadhoop download dependency to…
dazza-codes Aug 16, 2017
c84a77b
Update to java-1.8.0 and use {{java_home}} consistently in templates
dazza-codes Aug 16, 2017
535d070
Update hadoop to 2.7.4
dazza-codes Aug 16, 2017
9f613cc
fu Update to java-1.8.0
dazza-codes Aug 16, 2017
0f8a427
HADOOP/MAPREDUCE/TACHYON - removed entirely
dazza-codes Aug 16, 2017
d81a90e
Set the default java to java-1.8.0
dazza-codes Aug 16, 2017
6914816
spark-standalone/setup.sh - remove old spark versions code
dazza-codes Aug 16, 2017
2a26325
Modify setup scripts to ensure the java changes occur etc.
dazza-codes Aug 17, 2017
e5b2c2e
Update scala to 2.11.11, download from lightbend
dazza-codes Aug 17, 2017
eaf78b9
spark_ec2.py - remove tachyon
dazza-codes Aug 17, 2017
df433ad
spark_ec2.py - remove some things not specific to spark
dazza-codes Aug 17, 2017
b401059
deploy_templates.py - remove hadoop, mapreduce, and tachyon
dazza-codes Aug 17, 2017
ea22f15
setup-tools.sh - quietly and efficiently
dazza-codes Aug 17, 2017
25bfa18
setup.sh - use a full path to setup-slave script
dazza-codes Aug 17, 2017
Files changed
33 changes: 13 additions & 20 deletions create_image.sh
@@ -11,12 +11,19 @@
if [ "$(id -u)" != "0" ]; then
fi

# Dev tools
-sudo yum install -y java-1.7.0-openjdk-devel gcc gcc-c++ ant git
+sudo yum install -y gcc gcc-c++ ant git

+# Install java-8 for Spark 2.2.x
+sudo yum install -y java-1.8.0 java-1.8.0-devel
+sudo /usr/sbin/alternatives --set java /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java
+sudo /usr/sbin/alternatives --set javac /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/javac
+#sudo yum remove java-1.7

# Perf tools
sudo yum install -y dstat iotop strace sysstat htop perf
sudo debuginfo-install -q -y glibc
sudo debuginfo-install -q -y kernel
-sudo yum --enablerepo='*-debug*' install -q -y java-1.7.0-openjdk-debuginfo.x86_64
+sudo yum --enablerepo='*-debug*' install -y java-1.8.0-openjdk-debuginfo

# PySpark and MLlib deps
sudo yum install -y python-matplotlib python-tornado scipy libgfortran
@@ -38,42 +45,28 @@
sudo sed -i 's/.*ephemeral.*//g' /etc/cloud/cloud.cfg
sudo sed -i 's/.*swap.*//g' /etc/cloud/cloud.cfg

echo "mounts:" >> /etc/cloud/cloud.cfg
echo " - [ ephemeral0, /mnt, auto, \"defaults,noatime,nodiratime\", "\
echo " - [ ephemeral0, /mnt, auto, \"defaults,noatime\", "\
"\"0\", \"0\" ]" >> /etc/cloud.cloud.cfg

for x in {1..23}; do
echo " - [ ephemeral$x, /mnt$((x + 1)), auto, "\
"\"defaults,noatime,nodiratime\", \"0\", \"0\" ]" >> /etc/cloud/cloud.cfg
"\"defaults,noatime\", \"0\", \"0\" ]" >> /etc/cloud/cloud.cfg
done

-# Install Maven (for Hadoop)
+# Install Maven
cd /tmp
wget "http://archive.apache.org/dist/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz"
tar xvzf apache-maven-3.2.3-bin.tar.gz
mv apache-maven-3.2.3 /opt/

# Edit bash profile
echo "export PS1=\"\\u@\\h \\W]\\$ \"" >> ~/.bash_profile
echo "export JAVA_HOME=/usr/lib/jvm/java-1.7.0" >> ~/.bash_profile
echo "export JAVA_HOME=/usr/lib/jvm/java-1.8.0" >> ~/.bash_profile
echo "export M2_HOME=/opt/apache-maven-3.2.3" >> ~/.bash_profile
echo "export PATH=\$PATH:\$M2_HOME/bin" >> ~/.bash_profile

source ~/.bash_profile

-# Build Hadoop to install native libs
-sudo mkdir /root/hadoop-native
-cd /tmp
-sudo yum install -y protobuf-compiler cmake openssl-devel
-wget "http://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz"
-tar xvzf hadoop-2.4.1-src.tar.gz
-cd hadoop-2.4.1-src
-mvn package -Pdist,native -DskipTests -Dtar
-sudo mv hadoop-dist/target/hadoop-2.4.1/lib/native/* /root/hadoop-native
-
-# Install Snappy lib (for Hadoop)
-yum install -y snappy
-ln -sf /usr/lib64/libsnappy.so.1 /root/hadoop-native/.

# Create /usr/bin/realpath which is used by R to find Java installations
# NOTE: /usr/bin/realpath is missing in CentOS AMIs. See
# http://superuser.com/questions/771104/usr-bin-realpath-not-found-in-centos-6-5
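The diff above switches the AMI's default JDK via alternatives. A quick post-install check in the same spirit (a minimal sketch, not part of the PR; the version-string greps assume OpenJDK's usual output) could be run before baking the image:

# Sanity check (hypothetical addition): confirm java/javac resolve to 1.8
java -version 2>&1 | grep '1\.8\.0' || echo "WARNING: java is not 1.8"
javac -version 2>&1 | grep '1\.8\.0' || echo "WARNING: javac is not 1.8"
# Show which alternative is currently selected
sudo /usr/sbin/alternatives --display java | head -n 3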
8 changes: 4 additions & 4 deletions deploy.generic/root/spark-ec2/ec2-variables.sh
@@ -20,13 +20,13 @@
# These variables are automatically filled in by the spark-ec2 script.
export MASTERS="{{master_list}}"
export SLAVES="{{slave_list}}"
-export HDFS_DATA_DIRS="{{hdfs_data_dirs}}"
-export MAPRED_LOCAL_DIRS="{{mapred_local_dirs}}"
+#export HDFS_DATA_DIRS="{{hdfs_data_dirs}}"
+#export MAPRED_LOCAL_DIRS="{{mapred_local_dirs}}"
export SPARK_LOCAL_DIRS="{{spark_local_dirs}}"
export MODULES="{{modules}}"
export SPARK_VERSION="{{spark_version}}"
-export TACHYON_VERSION="{{tachyon_version}}"
-export HADOOP_MAJOR_VERSION="{{hadoop_major_version}}"
+#export TACHYON_VERSION="{{tachyon_version}}"
+#export HADOOP_MAJOR_VERSION="{{hadoop_major_version}}"
export SWAP_MB="{{swap}}"
export SPARK_WORKER_INSTANCES="{{spark_worker_instances}}"
export SPARK_MASTER_OPTS="{{spark_master_opts}}"
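For orientation, once the spark-ec2 driver fills the {{...}} placeholders, the surviving exports take a shape like the following (all values below are invented for illustration; the real ones come from the launched cluster):

# Hypothetical post-substitution values, for illustration only
export MASTERS="ec2-198-51-100-10.compute-1.amazonaws.com"
export SLAVES="ec2-198-51-100-11.compute-1.amazonaws.com"
export SPARK_LOCAL_DIRS="/mnt/spark"
export SPARK_VERSION="2.2.0"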
8 changes: 0 additions & 8 deletions deploy_templates.py
@@ -47,9 +47,6 @@
else:
slave_ram_mb = max(512, slave_ram_mb - 1300) # Leave 1.3 GB RAM

-# Make tachyon_mb as slave_ram_mb for now.
-tachyon_mb = slave_ram_mb
-
worker_instances_str = ""
worker_cores = slave_cpus

@@ -63,18 +60,13 @@
"master_list": os.getenv("MASTERS"),
"active_master": os.getenv("MASTERS").split("\n")[0],
"slave_list": os.getenv("SLAVES"),
"hdfs_data_dirs": os.getenv("HDFS_DATA_DIRS"),
"mapred_local_dirs": os.getenv("MAPRED_LOCAL_DIRS"),
"spark_local_dirs": os.getenv("SPARK_LOCAL_DIRS"),
"spark_worker_mem": "%dm" % slave_ram_mb,
"spark_worker_instances": worker_instances_str,
"spark_worker_cores": "%d" % worker_cores,
"spark_master_opts": os.getenv("SPARK_MASTER_OPTS", ""),
"spark_version": os.getenv("SPARK_VERSION"),
"tachyon_version": os.getenv("TACHYON_VERSION"),
"hadoop_major_version": os.getenv("HADOOP_MAJOR_VERSION"),
"java_home": os.getenv("JAVA_HOME"),
"default_tachyon_mem": "%dMB" % tachyon_mb,
"system_ram_mb": "%d" % system_ram_mb,
"aws_access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
"aws_secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
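deploy_templates.py uses the table above to rewrite every {{name}} placeholder in the template tree, ec2-variables.sh being one such template. A minimal shell sketch of that substitution idea, with hypothetical paths and a shortened variable list (an assumption for illustration, not the script's actual code; multi-line values such as slave_list would need extra escaping):

# Fill {{name}} placeholders in one template from same-named env vars
fill_template() {
  local template="$1" out="$2"
  cp "$template" "$out"
  for name in spark_version spark_local_dirs java_home system_ram_mb; do
    local env_name="${name^^}"          # e.g. spark_version -> SPARK_VERSION
    sed -i "s|{{${name}}}|${!env_name}|g" "$out"   # GNU sed in-place replace
  done
}
fill_template ec2-variables.sh /root/spark-ec2/ec2-variables.sh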
50 changes: 0 additions & 50 deletions ephemeral-hdfs/init.sh

This file was deleted.

26 changes: 0 additions & 26 deletions ephemeral-hdfs/setup-slave.sh

This file was deleted.

49 changes: 0 additions & 49 deletions ephemeral-hdfs/setup.sh

This file was deleted.

23 changes: 0 additions & 23 deletions mapreduce/init.sh

This file was deleted.

11 changes: 0 additions & 11 deletions mapreduce/setup.sh

This file was deleted.

49 changes: 0 additions & 49 deletions persistent-hdfs/init.sh

This file was deleted.

8 changes: 0 additions & 8 deletions persistent-hdfs/setup-slave.sh

This file was deleted.

22 changes: 0 additions & 22 deletions persistent-hdfs/setup.sh

This file was deleted.

29 changes: 0 additions & 29 deletions rstudio/init.sh

This file was deleted.
