Support for Spark DL notebooks with PyTriton on Databricks/Dataproc (#483)
### Support for running DL Inference notebooks on CSP environments.
- Refactored Triton sections to use PyTriton, a Python API for the Triton Inference Server that avoids the need for Docker. Once this PR is merged, the Triton sections no longer need to be skipped in the CI pipeline @YanxuanLiu.
- Updated notebooks with instructions to run on Databricks/Dataproc.
- Updated Torch notebooks with best practices for ahead-of-time TensorRT compilation.
- Cleaned up the README, removing instructions to start Jupyter with PySpark (we need a cell to attach to the standalone cluster for CI/CD anyway, so this should reduce confusion for users).
Notebook outputs are saved from running locally, but all notebooks were
tested on Databricks/Dataproc.
---------
Signed-off-by: Rishi Chandra <[email protected]>
Example notebooks demonstrating **distributed deep learning inference** using the [predict_batch_udf](https://developer.nvidia.com/blog/distributed-deep-learning-made-easy-with-spark-3-4/) introduced in Spark 3.4.0.
These notebooks also demonstrate integration with [Triton Inference Server](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html), an open-source, GPU-accelerated serving solution for DL.
## Contents:
- [Overview](#overview)
- [Running Locally](#running-locally)
- [Running on Cloud](#running-on-cloud-environments)
- [Integration with Triton Inference Server](#inference-with-triton)
## Overview
These notebooks demonstrate how models from external frameworks (Torch, Huggingface, Tensorflow) trained on single-worker machines can be used for large-scale distributed inference on Spark clusters.
For example, a basic model trained in TensorFlow and saved on disk as "mnist_model" can be used in Spark as follows:
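A minimal sketch of that usage, following the `predict_batch_udf` API (the DataFrame source, column name, and input shape are illustrative, and `spark` is assumed to be an active SparkSession):

```python
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.types import ArrayType, FloatType

def predict_batch_fn():
    import tensorflow as tf
    # Load the saved model once per Python worker process.
    model = tf.keras.models.load_model("mnist_model")
    def predict(inputs):
        # inputs is a numpy array of shape (batch_size, 784)
        return model.predict(inputs)
    return predict

mnist = predict_batch_udf(
    predict_batch_fn,
    return_type=ArrayType(FloatType()),
    batch_size=1024,
    input_tensor_shapes=[[784]],
)

# Illustrative usage: "data" is a column of flattened 28x28 images.
df = spark.read.parquet("mnist_data")
preds = df.withColumn("preds", mnist("data"))
```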
In this simple case, the `predict_batch_fn` will use TensorFlow APIs to load the model and return a simple `predict` function. The `predict_batch_udf` will handle the data conversion from Spark DataFrame columns into batched numpy inputs.
#### Notebook List
Below is a full list of the notebooks with links to the examples they are based on. All notebooks have been saved with sample outputs for quick browsing.
| | Framework | Notebook Name | Description | Link
| --- | --- | --- | --- | ---
| 1 | PyTorch | Image Classification | Training a model to predict clothing categories in FashionMNIST, including accelerated inference with Torch-TensorRT. | [Link](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html)
| 2 | PyTorch | Housing Regression | Training a model to predict housing prices in the California Housing Dataset, including accelerated inference with Torch-TensorRT. | [Link](https://github.com/christianversloot/machine-learning-articles/blob/main/how-to-create-a-neural-network-for-regression-with-pytorch.md)
| 3 | Tensorflow | Image Classification | Training a model to predict hand-written digits in MNIST. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/save_and_load.ipynb)
| 4 | Tensorflow | Keras Preprocessing | Training a model with preprocessing layers to predict likelihood of pet adoption in the PetFinder mini dataset. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/structured_data/preprocessing_layers.ipynb)
| 5 | Tensorflow | Keras Resnet50 | Training ResNet-50 to perform flower recognition from flower images. | [Link](https://docs.databricks.com/en/_extras/notebooks/source/deep-learning/keras-metadata.html)
| 6 | Tensorflow | Text Classification | Training a model to perform sentiment analysis on the IMDB dataset. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/text_classification.ipynb)
| 7+8 | HuggingFace | Conditional Generation | Sentence translation using the T5 text-to-text transformer for both Torch and Tensorflow. | [Link](https://huggingface.co/docs/transformers/model_doc/t5#t5)
| 9+10 | HuggingFace | Pipelines | Sentiment analysis using Huggingface pipelines for both Torch and Tensorflow. | [Link](https://huggingface.co/docs/transformers/quicktour#pipeline-usage)
| 11 | HuggingFace | Sentence Transformers | Sentence embeddings using SentenceTransformers in Torch. | [Link](https://huggingface.co/sentence-transformers)
## Running Locally
To run the notebooks locally, please follow these instructions:
#### Create environment
Each notebook has a suffix `_torch` or `_tf` specifying the environment used.
**For PyTorch:**
```
conda create -n spark-dl-torch python=3.11
conda activate spark-dl-torch
pip install -r torch_requirements.txt
```

**For Tensorflow:**
```
conda create -n spark-dl-tf python=3.11
conda activate spark-dl-tf
pip install -r tf_requirements.txt
```
#### Start Cluster
For demonstration, these instructions just use a local Standalone cluster with a single executor, but they can be run on any distributed Spark cluster. For cloud environments, see [below](#running-on-cloud-environments).
The notebooks are ready to run! Each notebook has a cell to connect to the standalone cluster and create a SparkSession.
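Such a cell might look like the following minimal sketch (the master URL assumes the standalone default on the local host, and the app name is illustrative):

```python
from pyspark.sql import SparkSession

# Assumes a standalone master running locally on the default port.
spark = (
    SparkSession.builder
    .master("spark://localhost:7077")
    .appName("spark-dl-inference")
    .getOrCreate()
)
```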
**Notes**:
- Please create separate environments for PyTorch and Tensorflow notebooks as specified above. This will avoid conflicts between the CUDA libraries bundled with their respective versions.
- `requirements.txt` installs pyspark>=3.4.0. Make sure the installed PySpark version is compatible with your system's Spark installation.
- The notebooks require a GPU environment for the executors.
- The PyTorch notebooks include model compilation and accelerated inference with TensorRT (sketched below). While not included in the notebooks, Tensorflow also supports [integration with TensorRT](https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html), though as of this writing it is not supported in TF==2.17.0.
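For reference, ahead-of-time compilation with Torch-TensorRT looks roughly like the sketch below; the model, input shapes, and `ir` setting are illustrative placeholders rather than the notebooks' actual code:

```python
import torch
import torch_tensorrt

# Placeholder model; the notebooks compile their own trained models.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval().cuda()

# Example inputs fix the shapes used for ahead-of-time compilation.
inputs = [torch.randn((32, 784), dtype=torch.float32).cuda()]

# Compile to a TensorRT-optimized module before inference time.
trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)

with torch.no_grad():
    out = trt_model(inputs[0])
```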
## Running on Cloud Environments

We also provide instructions to run the notebooks on CSP Spark environments.
See the instructions for [Databricks](databricks/README.md) and [GCP Dataproc](dataproc/README.md).
## Inference with Triton
The notebooks also demonstrate integration with the [Triton Inference Server](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html), an open-source serving platform for deep learning models, which includes many [features and performance optimizations](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html#triton-major-features) to streamline inference.
The notebooks use [PyTriton](https://github.com/triton-inference-server/pytriton), a Flask-like Python framework that handles communication with the Triton server.
The diagram above shows how Spark distributes inference tasks to run on the Triton Inference Server, with PyTriton handling request/response communication with the server.
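On each worker, the server side of this setup can be sketched with PyTriton roughly as follows (the model name, tensor shapes, and echo inference function are placeholders):

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def infer_fn(**inputs):
    # Placeholder: echo the input; a real worker runs model inference here.
    return {"output": inputs["input"]}

with Triton() as triton:
    # Bind the Python inference function to a named Triton model.
    triton.bind(
        model_name="example_model",
        infer_func=infer_fn,
        inputs=[Tensor(name="input", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=64),
    )
    triton.serve()  # blocks, serving requests until stopped
```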
The process looks like this:
- Distribute a PyTriton task across the Spark cluster, instructing each worker to launch a Triton server process.
- Use stage-level scheduling to ensure there is a 1:1 mapping between worker nodes and servers.
- Define a Triton inference function, which contains a client that binds to the local server on a given worker and sends inference requests.
- Wrap the Triton inference function in a `predict_batch_udf` to launch parallel inference requests using Spark.
- Finally, distribute a shutdown signal to terminate the Triton server processes on each worker.
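The inference-function and `predict_batch_udf` steps above can be sketched as follows (the server URL, model name, and output key are illustrative and must match the server binding):

```python
import numpy as np
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.types import ArrayType, FloatType

def triton_fn(url="localhost", model_name="example_model"):
    # Factory passed to predict_batch_udf; called once per Python worker.
    def predict_batch_fn():
        from pytriton.client import ModelClient
        # Bind a client to the Triton server running locally on this worker.
        client = ModelClient(url, model_name)
        def infer(inputs):
            result = client.infer_batch(np.asarray(inputs, dtype=np.float32))
            return result["output"]
        return infer
    return predict_batch_fn

predict = predict_batch_udf(
    triton_fn(),
    return_type=ArrayType(FloatType()),
    batch_size=1024,
)
```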
For more information on how PyTriton works, see the [PyTriton docs](https://triton-inference-server.github.io/pytriton/latest/high_level_design/).
databricks workspace import $UTILS_DEST --format AUTO --file $UTILS_SRC
databricks workspace import $INIT_DEST --format AUTO --file $INIT_SRC
```
6. Launch the cluster with the provided script (note that the script specifies **Azure instances** by default; change as needed):
```shell
cd setup
chmod +x start_cluster.sh
./start_cluster.sh
```
OR, start the cluster from the Databricks UI:
- Go to `Compute > Create compute` and set the desired cluster settings.
- Integration with Triton Inference Server uses stage-level scheduling (Spark>=3.4.0). Make sure to:
- use a cluster with GPU resources
- set a value for `spark.executor.cores`
- ensure that `spark.executor.resource.gpu.amount` = 1
- Under `Advanced Options > Init Scripts`, upload the init script from your workspace.
- Under environment variables, set `FRAMEWORK=torch` or `FRAMEWORK=tf` based on the notebook used.
- For Tensorflow notebooks, we recommend setting the environment variable `TF_GPU_ALLOCATOR=cuda_malloc_async` (especially for Huggingface LLM models), which enables the CUDA driver to implicitly release unused memory from the pool.
7. Navigate to the notebook in your workspace and attach it to the cluster. The default cluster name is `spark-dl-inference-$FRAMEWORK`.
Repeat this step for any notebooks you wish to run. All notebooks under `gs://${SPARK_DL_HOME}/notebooks/` will be copied to the master node during initialization.
5. Specify the framework to use (torch or tf), which will determine what libraries to install on the cluster. For example:
```shell
export FRAMEWORK=torch
```
6. Run the cluster startup script. The script will also retrieve and use the [spark-rapids initialization script](https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/spark-rapids/spark-rapids.sh) to set up GPU resources.
```shell
cd setup
chmod +x start_cluster.sh
./start_cluster.sh
```
By default, the script creates a 4-node GPU cluster named `${USER}-spark-dl-inference-${FRAMEWORK}`.
7. Browse to the Jupyter web UI:
- Go to `Dataproc` > `Clusters` > `(Cluster Name)` > `Web Interfaces` > `Jupyter/Lab`
Or, get the link by running this command (under httpPorts > Jupyter/Lab):