Spark cluster with:
- Dynamic worker provisioning by scaling the worker service with `docker compose up --scale`
- A history server, so Spark logs and statistics can be analyzed after a job finishes
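A minimal `docker-compose.yml` sketch of that layout, assuming the Bitnami Spark image; the service names, image tag, and volume paths are illustrative and may differ from this repository's actual file:

```yaml
# Sketch only: image tag, env vars and paths are assumptions, not this repo's compose file.
services:
  spark-master:
    image: bitnami/spark:3.3.4            # assumed image/tag
    environment:
      - SPARK_MODE=master
    ports:
      - "7077:7077"                       # master RPC endpoint used by spark-submit
      - "8080:8080"                       # master web UI
    networks:
      - spark_dev_net

  spark-worker:
    image: bitnami/spark:3.3.4
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
    networks:
      - spark_dev_net

  spark-history:
    image: bitnami/spark:3.3.4
    # Run the history server in the foreground so the container stays up.
    command: /opt/bitnami/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer
    environment:
      - SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory=file:///tmp/spark-events
    volumes:
      - ./spark-events:/tmp/spark-events  # same volume the application writes event logs to
    ports:
      - "18080:18080"                     # history server UI
    networks:
      - spark_dev_net

networks:
  spark_dev_net:
    external: true                        # created with `docker network create spark_dev_net`
```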
docker network create spark_dev_net
docker compose build
The application must write its event logs to the volume that is mapped into the history server container; that way you can still see all Spark statistics after the job finishes.
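For that to work, the job's event logging has to point at the shared directory. A minimal sketch of the relevant `conf/spark-defaults.conf` entries; the `file:///tmp/spark-events` path is an assumption and must match the directory mounted into the history server container:

```properties
# Driver writes Spark event logs here (the volume mapped into the history server).
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
# History server reads the same location.
spark.history.fs.logDirectory    file:///tmp/spark-events
```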
docker compose up
docker compose up --scale spark-worker=2
spark-submit main.py
spark-submit --properties-file conf/spark-defaults.conf --packages org.apache.spark:spark-hadoop-cloud_2.12:3.3.4,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 --master spark://192.168.122.1:7077 main.py
spark-submit --master spark://192.168.15.6:7077 --executor-memory 10G --driver-memory 10G main.py
spark-submit --packages org.apache.spark:spark-hadoop-cloud_2.12:3.3.4,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 main.py
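The commands above all submit `main.py`; a minimal PySpark sketch of what such a job could look like (the S3 bucket, column name, and event-log path are placeholders, not this repository's actual application):

```python
from pyspark.sql import SparkSession

# Minimal illustrative job; the real main.py in this repo may differ.
spark = (
    SparkSession.builder
    .appName("spark-dev-cluster-demo")
    # Event-log settings can also come from --properties-file conf/spark-defaults.conf.
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir", "file:///tmp/spark-events")  # assumed shared volume path
    .getOrCreate()
)

# The hadoop-aws / aws-java-sdk-bundle packages passed via --packages enable the s3a:// scheme.
df = spark.read.csv("s3a://my-bucket/path/to/data.csv", header=True, inferSchema=True)  # placeholder path

df.groupBy("some_column").count().show()  # placeholder column

spark.stop()
```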