Commit 4fd39d6

Merge branch 'branch-25.04' into micro-benchmark-log
2 parents 679ed24 + 1d76d6f commit 4fd39d6

18 files changed: +2177 -2602 lines changed

examples/ML+DL-Examples/Spark-DL/dl_inference/README.md

+12 -3
@@ -1,6 +1,6 @@
 # Deep Learning Inference on Spark

-Example notebooks demonstrating **distributed deep learning inference** using the [predict_batch_udf](https://developer.nvidia.com/blog/distributed-deep-learning-made-easy-with-spark-3-4/) introduced in Spark 3.4.0.
+Example notebooks demonstrating **distributed deep learning inference** using the [predict_batch_udf](https://developer.nvidia.com/blog/distributed-deep-learning-made-easy-with-spark-3-4/#distributed_inference) introduced in Spark 3.4.0.
 These notebooks also demonstrate integration with [Triton Inference Server](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html), an open-source, GPU-accelerated serving solution for DL.

 ## Contents:
@@ -63,13 +63,13 @@ Each notebook has a suffix `_torch` or `_tf` specifying the environment used.

 **For PyTorch:**
 ```
-conda create -n spark-dl-torch python=3.11
+conda create -n spark-dl-torch -c conda-forge python=3.11
 conda activate spark-dl-torch
 pip install -r torch_requirements.txt
 ```
 **For TensorFlow:**
 ```
-conda create -n spark-dl-tf python=3.11
+conda create -n spark-dl-tf -c conda-forge python=3.11
 conda activate spark-dl-tf
 pip install -r tf_requirements.txt
 ```
@@ -99,12 +99,21 @@ The notebooks are ready to run! Each notebook has a cell to connect to the stand
 - `requirements.txt` installs pyspark>=3.4.0. Make sure the installed PySpark version is compatible with your system's Spark installation.
 - The notebooks require a GPU environment for the executors.
 - The PyTorch notebooks include model compilation and accelerated inference with TensorRT. While not included in the notebooks, Tensorflow also supports [integration with TensorRT](https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html), but as of writing it is not supported in TF==2.17.0.
+- Note that some Huggingface models may be gated and will require a login, e.g.,:
+```python
+from huggingface_hub import login
+login()
+```

 **Troubleshooting:**
 If you encounter issues starting the Triton server, you may need to link your libstdc++ file to the conda environment, e.g.:
 ```shell
 ln -sf /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ${CONDA_PREFIX}/lib/libstdc++.so.6
 ```
+If the issue persists with the message `libstdc++.so.6: version 'GLIBCXX_3.4.30' not found`, you may need to update libstdc++ in your conda environment:
+```shell
+conda install -c conda-forge libstdcxx-ng
+```

 ## Running on Cloud Environments
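
The troubleshooting text added in this README hunk depends on whether the `libstdc++.so.6` inside the conda environment exposes the `GLIBCXX_3.4.30` version tag. A minimal sketch of how that could be checked before deciding between the symlink and the `conda install` step is shown below. It scans the library bytes for version strings, which is a heuristic comparable to `strings libstdc++.so.6 | grep GLIBCXX` and not an official tool. The helper names `glibcxx_versions` and `provides_glibcxx` are hypothetical and are not part of this commit.

```python
import os
import re
from pathlib import Path


def glibcxx_versions(lib_path):
    """Return the sorted GLIBCXX_x.y.z version tags embedded in a shared library.

    Heuristic: scans the raw file bytes for version strings instead of
    parsing the ELF symbol-version tables.
    """
    data = Path(lib_path).read_bytes()
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(tuple(int(part) for part in tag.decode().split(".")) for tag in tags)


def provides_glibcxx(lib_path, required=(3, 4, 30)):
    """Return True if the library advertises the required GLIBCXX version tag."""
    return required in glibcxx_versions(lib_path)


if __name__ == "__main__":
    # Assumption: the active conda environment keeps its copy under $CONDA_PREFIX/lib.
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        lib = Path(prefix) / "lib" / "libstdc++.so.6"
        if lib.exists():
            print(lib, "provides GLIBCXX_3.4.30:", provides_glibcxx(lib))
```

If the check reports `False` for the library that the environment resolves, the `conda install -c conda-forge libstdcxx-ng` step from the README is the likely next thing to try.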

examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh

+1 -1
@@ -12,7 +12,7 @@ datasets==3.*
 transformers
 urllib3<2
 nvidia-pytriton
-torch
+torch<=2.5.1
 torchvision --extra-index-url https://download.pytorch.org/whl/cu121
 torch-tensorrt
 tensorrt --extra-index-url https://download.pytorch.org/whl/cu121

examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/setup/init_spark_dl.sh

+1 -1
@@ -49,7 +49,7 @@ if [[ "${ROLE}" == 'Master' ]]; then
 if gsutil -q stat gs://${SPARK_DL_HOME}/notebooks/**; then
 mkdir spark-dl-notebooks
 gcloud storage cp -r gs://${SPARK_DL_HOME}/notebooks/* spark-dl-notebooks
-gcloud storage cp gs://${SPARK_DL_HOME}/pytriton_utils.py spark-dl-notebooks/
+gcloud storage cp gs://${SPARK_DL_HOME}/pytriton_utils.py .
 else
 echo "Failed to retrieve notebooks from gs://${SPARK_DL_HOME}/notebooks/"
 exit 1

examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/setup/start_cluster.sh

+1 -1
@@ -49,7 +49,7 @@ urllib3<2
 nvidia-pytriton"

 TORCH_REQUIREMENTS="${COMMON_REQUIREMENTS}
-torch
+torch<=2.5.1
 torchvision --extra-index-url https://download.pytorch.org/whl/cu121
 torch-tensorrt
 tensorrt --extra-index-url https://download.pytorch.org/whl/cu121

examples/ML+DL-Examples/Spark-DL/dl_inference/huggingface/conditional_generation_tf.ipynb

+133 -158
Large diffs are not rendered by default.
