[ZEPPELIN-6067] Add docker-compose file for running with Spark #4820

Open · wants to merge 4 commits into master
3 changes: 3 additions & 0 deletions .gitignore
@@ -143,3 +143,6 @@ tramp

# pyenv file
.python-version

# Spark binary files
scripts/docker/zeppelin-quick-start/spark-*
82 changes: 80 additions & 2 deletions scripts/docker/zeppelin-quick-start/README.md
@@ -6,12 +6,12 @@
Please visit the [Docker website](https://www.docker.com/) to install Docker.

### Zeppelin Only
#### Run docker-compose
#### Run docker compose
```bash
docker compose -f docker-compose-zeppelin-only.yml up
```

#### Stop docker-compose
#### Stop docker compose
```bash
docker compose -f docker-compose-zeppelin-only.yml stop
```
@@ -27,3 +27,81 @@ docker compose -f docker-compose-zeppelin-only.yml stop
ZEPPELIN_JMX_PORT: 9996 # Port number which JMX uses
ZEPPELIN_MEM: -Xmx1024m -XX:MaxMetaspaceSize=512m # JVM mem options
```
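These settings are plain environment variables, so they go under the service's `environment:` key. As a sketch (assuming the service is named `zeppelin`, as in the compose files in this directory), an override in `docker-compose-zeppelin-only.yml` might look like:

```yaml
# Sketch: overriding Zeppelin settings per service.
# Assumes a service named "zeppelin"; ZEPPELIN_JMX_ENABLE shown here is an
# assumption about the image's supported variables, not part of this PR.
services:
  zeppelin:
    environment:
      ZEPPELIN_PORT: 8080
      ZEPPELIN_JMX_ENABLE: "true"                       # expose JMX
      ZEPPELIN_JMX_PORT: 9996                           # port JMX listens on
      ZEPPELIN_MEM: -Xmx2048m -XX:MaxMetaspaceSize=512m # raise the heap cap
```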

### Zeppelin With Apache Spark
#### Install Spark Binary File
```bash
cd scripts/docker/zeppelin-quick-start
wget https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz
tar -xvf spark-3.5.2-bin-hadoop3.tgz
```

> **Member:** why do we need to install Spark outside of the container?
>
> **Member:** Oh, I see, you mount the Spark binary into the container later.
>
> **Author:** @pan3793 I used the mounting method because I wanted to let users run it with the Spark version they choose.
>
> **Author:** @pan3793 ping :)
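The Spark version is pinned in both the download URL and the volume mount of the compose file. A small sketch (variable names here are illustrative, not part of the PR) derives the archive URL from a single version variable, which makes swapping versions less error-prone:

```shell
# Sketch: parameterize the Spark download by version.
# SPARK_VERSION / HADOOP_VERSION / SPARK_DIST are illustrative names.
SPARK_VERSION="3.5.2"
HADOOP_VERSION="3"
SPARK_DIST="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_DIST}.tgz"
echo "${SPARK_URL}"
# wget "${SPARK_URL}" && tar -xvf "${SPARK_DIST}.tgz"   # uncomment to fetch and unpack
```

Remember to keep the volume mount in `docker-compose-with-spark.yml` in sync with the extracted directory name.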

#### Run docker compose
```bash
docker compose -f docker-compose-with-spark.yml up
```

#### Stop docker compose
```bash
docker compose -f docker-compose-with-spark.yml stop
```

#### Example
```
%spark.conf

SPARK_HOME /opt/spark
spark.master spark://spark-master:7077
```
```
%spark

val sdf = spark.createDataFrame(Seq((0, "park", 13, 70, "Korea"), (1, "xing", 14, 80, "China"), (2, "john", 15, 90, "USA"))).toDF("id", "name", "age", "score", "country")
sdf.printSchema
sdf.show()
```
```
root
|-- id: integer (nullable = false)
|-- name: string (nullable = true)
|-- age: integer (nullable = false)
|-- score: integer (nullable = false)
|-- country: string (nullable = true)

+---+----+---+-----+-------+
| id|name|age|score|country|
+---+----+---+-----+-------+
| 0|park| 13| 70| Korea|
| 1|xing| 14| 80| China|
| 2|john| 15| 90| USA|
+---+----+---+-----+-------+
```

### Apache Spark Environment Variables
Please see the [Bitnami Spark README](https://github.com/bitnami/containers/blob/main/bitnami/spark/README.md) for more details.

```yaml
spark-master:
environment:
SPARK_MODE: master # Spark cluster mode to run (can be master or worker)
SPARK_RPC_AUTHENTICATION_ENABLED: no # Enable RPC authentication
SPARK_RPC_ENCRYPTION_ENABLED: no # Enable RPC encryption
SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED: no # Enable local storage encryption
SPARK_SSL_ENABLED: no # Enable SSL configuration
SPARK_USER: spark # Spark user
SPARK_MASTER_PORT: 7077
SPARK_MASTER_WEBUI_PORT: 8080

spark-worker:
environment:
SPARK_MODE: worker # Spark cluster mode to run (can be master or worker)
SPARK_MASTER_URL: spark://spark-master:7077 # URL where the worker can find the master; only needed when SPARK_MODE is worker
SPARK_WORKER_MEMORY: 2G
SPARK_WORKER_CORES: 2
SPARK_RPC_AUTHENTICATION_ENABLED: no # Enable RPC authentication
SPARK_RPC_ENCRYPTION_ENABLED: no # Enable RPC encryption
SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED: no # Enable local storage encryption
SPARK_SSL_ENABLED: no # Enable SSL configuration
SPARK_USER: spark # Spark user
SPARK_WORKER_WEBUI_PORT: 8081
```
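Since each service pins a fixed `container_name`, `docker compose up --scale spark-worker=2` would conflict with those names. One way to add capacity instead (a sketch, not part of this PR; the service name `spark-worker-2` and host port `18082` are hypothetical) is a second worker service:

```yaml
# Sketch: a second Spark worker alongside the one defined in
# docker-compose-with-spark.yml. container_name values must stay unique.
spark-worker-2:
  hostname: spark-worker-2
  container_name: spark-worker-2
  image: docker.io/bitnami/spark:3.5.2
  ports:
    - "18082:8081"   # host port must differ from spark-worker's 18081
  environment:
    SPARK_MODE: worker
    SPARK_MASTER_URL: spark://spark-master:7077
    SPARK_WORKER_MEMORY: 2G
    SPARK_WORKER_CORES: 2
    SPARK_WORKER_WEBUI_PORT: 8081
  depends_on:
    - spark-master
```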
66 changes: 66 additions & 0 deletions scripts/docker/zeppelin-quick-start/docker-compose-with-spark.yml
@@ -0,0 +1,66 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
services:
zeppelin:
hostname: zeppelin
container_name: zeppelin
image: docker.io/apache/zeppelin:0.11.2
ports:
- "8080:8080"
environment:
ZEPPELIN_PORT: 8080
ZEPPELIN_MEM: -Xmx1024m -XX:MaxMetaspaceSize=512m
SPARK_HOME: /opt/spark
volumes:
- ./spark-3.5.2-bin-hadoop3:/opt/spark

spark-master:
hostname: spark-master
container_name: spark-master
image: docker.io/bitnami/spark:3.5.2
ports:
- "18080:8080"
- "7077:7077"
environment:
SPARK_MODE: master
SPARK_RPC_AUTHENTICATION_ENABLED: no
SPARK_RPC_ENCRYPTION_ENABLED: no
SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED: no
SPARK_SSL_ENABLED: no
SPARK_USER: spark
SPARK_MASTER_PORT: 7077
SPARK_MASTER_WEBUI_PORT: 8080

spark-worker:
hostname: spark-worker
container_name: spark-worker
image: docker.io/bitnami/spark:3.5.2
ports:
- "18081:8081"
environment:
SPARK_MODE: worker
SPARK_MASTER_URL: spark://spark-master:7077
SPARK_WORKER_MEMORY: 2G
SPARK_WORKER_CORES: 2
SPARK_RPC_AUTHENTICATION_ENABLED: no
SPARK_RPC_ENCRYPTION_ENABLED: no
SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED: no
SPARK_SSL_ENABLED: no
SPARK_USER: spark
SPARK_WORKER_WEBUI_PORT: 8081
depends_on:
- spark-master