I've tried several Docker base images (merlin, tensorflow, rapidsai, nvidia, ...), but I keep running into version and driver issues.
I'm new to Docker and Airflow, and I keep getting confused.
I would appreciate it if anyone could look at my requirements and Dockerfile and tell me what I'm doing wrong.
Details
I would like to build a Docker image that Airflow will use to train a model.
Here is the setup I would like to mimic. When I first tested on a Colab notebook, everything worked well with it:
Python Version: 3.10
Driver:
CUDA - 11.8
CUDNN - 8
Libraries:
RUN pip install tensorflow==2.12.0
RUN pip install --no-cache-dir \
--extra-index-url=https://pypi.nvidia.com \
cudf-cu11==23.12.* dask-cudf-cu11==23.12.*
RUN pip install -U git+https://github.com/NVIDIA-Merlin/models==23.04
RUN pip install -U git+https://github.com/NVIDIA-Merlin/[email protected]
RUN pip install -U git+https://github.com/NVIDIA-Merlin/[email protected]
RUN pip install -U git+https://github.com/NVIDIA-Merlin/[email protected]
RUN pip install merlin-systems==23.04
RUN pip install tf2onnx==1.15.1
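To compare the Colab environment with what actually ends up inside a container, a small script along these lines could be run on both sides. This is only a sketch: the distribution names in `PACKAGES` are assumed from the pip commands above, and `installed_versions` and `environment_report` are hypothetical helper names.

```python
import sys
from importlib import metadata

# Distribution names assumed from the pip commands above; adjust as needed.
PACKAGES = ["tensorflow", "cudf-cu11", "dask-cudf-cu11", "merlin-systems", "tf2onnx"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(names=PACKAGES):
    """Build a plain-text report of the Python version and package versions."""
    lines = [f"python {sys.version_info.major}.{sys.version_info.minor}"]
    for name, version in installed_versions(names).items():
        lines.append(f"{name} {version or 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Diffing the output from Colab against the output from the container should show which versions differ.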
Dockerfile that has worked best so far, but failed because of a cudf issue:
FROM --platform=linux/amd64 tensorflow/tensorflow:2.12.0-gpu as prod
WORKDIR /ads_content
COPY ./data-airflow .
COPY ./ads/images/requirements.txt .
RUN apt-get update && yes|apt-get upgrade
# Add sudo
RUN apt-get -y install sudo
# Adding wget and bzip2
RUN apt-get install -y wget bzip2
RUN apt-get install -y build-essential libssl-dev zlib1g-dev libbz2-dev libffi-dev gcc-x86-64-linux-gnu
WORKDIR /root
# Set requirements
RUN pip install --upgrade pip
RUN apt-get install -y git
#RAPIDs
RUN pip install --no-cache-dir \
--extra-index-url=https://pypi.nvidia.com \
cudf-cu11==23.4.* dask-cudf-cu11==23.4.*
RUN pip install -U git+https://github.com/NVIDIA-Merlin/[email protected]
RUN pip install -U git+https://github.com/NVIDIA-Merlin/[email protected]
RUN pip install -U git+https://github.com/NVIDIA-Merlin/[email protected]
RUN pip install -U git+https://github.com/NVIDIA-Merlin/[email protected]
RUN pip install merlin-systems==23.04
RUN pip install tf2onnx==1.15.1
RUN pip install -r /ads_content/requirements.txt
WORKDIR /ads_content
ENTRYPOINT ["python3"]
The problem with this image is that cudf is limited to 23.04, because the image uses Python 3.8, which cudf 23.12 does not support. This older cudf version prevented me from using the GPU for data processing.
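A guard like the following, run early in the image build or at the start of the Airflow task, would surface this kind of mismatch before any training starts. It is a sketch under an assumption: the minimum of 3.9 is a guess, since all that is established above is that 3.8 is too old for cudf 23.12. `MIN_PYTHON` and `python_is_supported` are hypothetical names.

```python
import sys

# Assumed minimum Python version for cudf 23.12; not read from the package itself.
MIN_PYTHON = (3, 9)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:2])
    status = "ok" if python_is_supported() else "too old"
    print(f"python {current}: {status}")
```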
With other base images, such as rapidsai or merlin, I always run into driver issues.
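To narrow down a driver issue, it may help to check what driver the container can actually see. The sketch below queries `nvidia-smi` and returns `None` when the tool is missing or fails, which usually means the GPU is not exposed to the container. `driver_version` is a hypothetical helper name.

```python
import shutil
import subprocess


def driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # nvidia-smi is not on PATH inside this environment
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    print(driver_version() or "no NVIDIA driver visible")
```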
I see that the recommended image for Merlin is nvcr.io/nvidia/merlin/merlin-tensorflow:23.06.
Can someone share an example Dockerfile showing how to use an existing container, from Merlin or elsewhere, to train a TensorFlow model successfully on Airflow?