
## Reference work

https://medium.com/@SaphE/testing-apache-spark-locally-docker-compose-and-kubernetes-deployment-94d35a54f222

## Run Spark cluster

A Spark cluster with:

- Dynamic worker provisioning by scaling the `spark-worker` service with `docker compose` (see the compose sketch below)
- A history server, so Spark statistics can still be analyzed after a job finishes
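
For orientation, a compose file that supports these two features might look like the sketch below. Only the `spark-worker` service name comes from the commands in this README; the other service names, ports, and volume paths are illustrative assumptions, and the sketch writes to a scratch file so the repo's real `docker-compose.yml` stays untouched.

```sh
# Illustrative compose skeleton -- everything except the spark-worker
# service name is an assumption, not copied from this repo.
cat > docker-compose.sketch.yml <<'EOF'
services:
  spark-master:
    build: .
    ports:
      - "7077:7077"    # cluster manager (the spark:// URLs below)
      - "8080:8080"    # master web UI
    networks: [spark_dev_net]
  spark-worker:        # the service scaled via --scale spark-worker=N
    build: .
    depends_on: [spark-master]
    volumes:
      - ./spark-events:/tmp/spark-events   # shared event-log volume
    networks: [spark_dev_net]
  spark-history:       # lets you inspect logs after jobs finish
    build: .
    ports:
      - "18080:18080"  # history server web UI
    volumes:
      - ./spark-events:/tmp/spark-events
    networks: [spark_dev_net]
networks:
  spark_dev_net:
    external: true     # created by hand in the next step
EOF
```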

### Create the network

```sh
docker network create spark_dev_net
```

### Build the local image

```sh
docker compose build
```

### Set up a path for application logs

The application must write its event logs into the volume that is mapped into the history server; that way you can see all Spark statistics even after the job finishes.
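
The property names involved are standard Spark configuration keys, sketched below; the directory itself is an assumption and must match the volume mapping in the compose file. The sketch writes to a scratch file rather than over the repo's `conf/spark-defaults.conf`.

```sh
# Standard Spark event-log keys; the directory is illustrative and must be
# the same volume the history server container reads from.
cat > spark-defaults.conf.sketch <<'EOF'
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
spark.history.fs.logDirectory    file:///tmp/spark-events
EOF
```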

### Start up the cluster

```sh
docker compose up
```

### Scale up the workers

```sh
docker compose up --scale spark-worker=2
```
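
To confirm the scale-out, list the worker containers; both should also appear as ALIVE workers on the master web UI (port 8080 is the Spark default, assuming the compose file maps it).

```sh
# Shows one container per scaled replica of the spark-worker service.
docker compose ps spark-worker
```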

## Execute app simple1

### Local threaded execution

```sh
spark-submit main.py
```
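
With no `--master` flag (and assuming no `spark.master` is set in a defaults file), `spark-submit` falls back to local mode, running tasks on threads inside a single JVM. The explicit equivalent:

```sh
# local[*] uses as many worker threads as there are cores.
spark-submit --master 'local[*]' main.py
```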

### Execution on the local cluster

```sh
spark-submit \
  --properties-file conf/spark-defaults.conf \
  --packages org.apache.spark:spark-hadoop-cloud_2.12:3.3.4,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 \
  --master spark://192.168.122.1:7077 \
  main.py
```

### Execution on the local cluster with too few resources (stays waiting forever)

```sh
spark-submit --master spark://192.168.15.6:7077 --executor-memory 10G --driver-memory 10G main.py
```
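
The 10G executor and driver requests exceed what the workers offer, so the master keeps the application in WAITING state indefinitely; you can watch it sit there on the master web UI. Lowering the request lets it schedule. The 1G values below are illustrative; size them to your actual workers.

```sh
spark-submit --master spark://192.168.15.6:7077 \
  --executor-memory 1G --driver-memory 1G \
  main.py
```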

## Execute app simple1_minio

### Local execution

```sh
spark-submit \
  --packages org.apache.spark:spark-hadoop-cloud_2.12:3.3.4,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 \
  main.py
```
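
The S3A connection settings for MinIO have to come from somewhere, presumably this repo's conf files or the application code. As a sketch, they can also be passed on the command line; the `fs.s3a.*` keys are standard hadoop-aws properties, while the endpoint and credentials below are placeholder assumptions.

```sh
# Endpoint and credential values are placeholders, not values from this repo.
spark-submit \
  --packages org.apache.spark:spark-hadoop-cloud_2.12:3.3.4,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 \
  --conf spark.hadoop.fs.s3a.endpoint=http://localhost:9000 \
  --conf spark.hadoop.fs.s3a.access.key=minioadmin \
  --conf spark.hadoop.fs.s3a.secret.key=minioadmin \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  main.py
```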

### Execution on the local cluster

```sh
spark-submit \
  --properties-file conf/spark-defaults.conf \
  --packages org.apache.spark:spark-hadoop-cloud_2.12:3.3.4,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 \
  --master spark://192.168.122.1:7077 \
  main.py
```
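
After any of the cluster runs above finishes, its event log should be browsable in the history server, assuming the default port mapping.

```sh
# 18080 is the Spark history server's default web UI port.
curl -s http://localhost:18080 | head
```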