 # Deep Learning Inference on Spark

-Example notebooks demonstrating **distributed deep learning inference** using the [predict_batch_udf](https://developer.nvidia.com/blog/distributed-deep-learning-made-easy-with-spark-3-4/) introduced in Spark 3.4.0.
+Example notebooks demonstrating **distributed deep learning inference** using the [predict_batch_udf](https://developer.nvidia.com/blog/distributed-deep-learning-made-easy-with-spark-3-4/#distributed_inference) introduced in Spark 3.4.0.
 These notebooks also demonstrate integration with [Triton Inference Server](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html), an open-source, GPU-accelerated serving solution for DL.

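As a rough illustration of the pattern the notebooks build on, the sketch below wraps a stand-in "model" with `predict_batch_udf`. The column name `x`, the helper `build_times_two_udf`, and the doubling function are placeholders, not code from the notebooks. A real `make_predict_fn` would load a PyTorch or TensorFlow model and return a function that runs it on each batch.

```python
def make_predict_fn():
    """Called on the executors; a real version would load a model here."""

    # Hypothetical stand-in for a model: doubles every value in the batch.
    def predict(inputs):
        # For a single scalar input column, `inputs` arrives as a NumPy
        # array holding one batch of values.
        return inputs * 2.0

    return predict


def build_times_two_udf(batch_size=1024):
    """Wrap the predict function in a Spark UDF (requires pyspark>=3.4.0)."""
    # Imported lazily so this module can be imported without Spark installed.
    from pyspark.ml.functions import predict_batch_udf
    from pyspark.sql.types import FloatType

    return predict_batch_udf(
        make_predict_fn,
        return_type=FloatType(),
        batch_size=batch_size,
    )


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Small illustrative DataFrame with one float column named "x".
    df = spark.range(8).selectExpr("cast(id as float) as x")
    times_two = build_times_two_udf(batch_size=4)
    df.withColumn("pred", times_two("x")).show()
```

Keeping the model loading inside `make_predict_fn`, rather than at module level, means the driver never has to serialize the model itself; only the loader function is shipped to the executors.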
 ## Contents:
@@ -63,13 +63,13 @@ Each notebook has a suffix `_torch` or `_tf` specifying the environment used.

 **For PyTorch:**
 ```
-conda create -n spark-dl-torch python=3.11
+conda create -n spark-dl-torch -c conda-forge python=3.11
 conda activate spark-dl-torch
 pip install -r torch_requirements.txt
 ```
 **For TensorFlow:**
 ```
-conda create -n spark-dl-tf python=3.11
+conda create -n spark-dl-tf -c conda-forge python=3.11
 conda activate spark-dl-tf
 pip install -r tf_requirements.txt
 ```
@@ -99,12 +99,21 @@ The notebooks are ready to run! Each notebook has a cell to connect to the stand
 - `requirements.txt` installs pyspark>=3.4.0. Make sure the installed PySpark version is compatible with your system's Spark installation.
 - The notebooks require a GPU environment for the executors.
 - The PyTorch notebooks include model compilation and accelerated inference with TensorRT. While not included in the notebooks, TensorFlow also supports [integration with TensorRT](https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html), but as of this writing it is not supported in TF==2.17.0.
+- Note that some Hugging Face models may be gated and will require a login, e.g.:
+  ```python
+  from huggingface_hub import login
+  login()
+  ```

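One way to act on the PySpark version note is to compare version strings before launching a notebook. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that matching major and minor versions between the pip-installed PySpark and the system's Spark is the compatibility criterion, which the notes do not spell out.

```python
import re
from importlib import metadata


def parse_version(version):
    """Turn '3.5.1' (or '3.4') into a three-int tuple, padding with zeros."""
    nums = [int(part) for part in re.findall(r"\d+", version)][:3]
    nums += [0] * (3 - len(nums))
    return tuple(nums)


def meets_minimum(installed, minimum="3.4.0"):
    """True if the installed version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


def same_major_minor(pyspark_version, spark_version):
    """Assumed compatibility rule: major and minor versions must match."""
    return parse_version(pyspark_version)[:2] == parse_version(spark_version)[:2]


def installed_pyspark_version():
    """Return the pip-installed PySpark version, or None if it is missing."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_pyspark_version()
    print("pyspark installed:", installed)
    if installed is not None:
        print("meets >=3.4.0:", meets_minimum(installed))
```

The system's Spark version can be read from `spark-submit --version` and passed to `same_major_minor` alongside the installed PySpark version.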
 **Troubleshooting:**
 If you encounter issues starting the Triton server, you may need to link your libstdc++ file to the conda environment, e.g.:
 ```shell
 ln -sf /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ${CONDA_PREFIX}/lib/libstdc++.so.6
 ```
+If the issue persists with the message `libstdc++.so.6: version 'GLIBCXX_3.4.30' not found`, you may need to update libstdc++ in your conda environment:
+```shell
+conda install -c conda-forge libstdcxx-ng
+```

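To see whether a given `libstdc++.so.6` actually provides the symbol version named in the error, one option is to scan the library file for `GLIBCXX_*` strings, similar in spirit to `strings libstdc++.so.6 | grep GLIBCXX`. This is a minimal sketch rather than part of the notebooks; the function names are hypothetical and the default path simply assumes an activated conda environment.

```python
import os
import re
from pathlib import Path


def glibcxx_versions(data: bytes) -> list[str]:
    """Return the GLIBCXX version numbers embedded in a library's bytes."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    # Sort numerically so that 3.4.30 comes after 3.4.9.
    return sorted(
        (v.decode() for v in found),
        key=lambda v: tuple(int(p) for p in v.split(".")),
    )


def provides_glibcxx(lib_path, required="3.4.30") -> bool:
    """True if the library at lib_path advertises the required version."""
    return required in glibcxx_versions(Path(lib_path).read_bytes())


if __name__ == "__main__":
    # Assumes CONDA_PREFIX is set by an activated conda environment.
    prefix = os.environ.get("CONDA_PREFIX", "")
    lib = Path(prefix) / "lib" / "libstdc++.so.6"
    if lib.exists():
        print(lib, "provides GLIBCXX_3.4.30:", provides_glibcxx(lib))
    else:
        print("not found:", lib)
```

Running the check against both the system library and the one under `${CONDA_PREFIX}/lib` shows which of the two fixes above is likely to help.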
 ## Running on Cloud Environments
