[CI/Build] Modify Dockerfile build for ARM64 & GH200 #11302
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Closing, as we are testing it in #11351.
From PRs: #10499, #11212
Fixes issue: #2021
The Dockerfile build for ARM64 systems is treated as a build against a specific existing PyTorch version. Before the build starts, the use_existing_torch.py script is run, and torch and torchvision are installed from the nightly builds before any other requirements. This largely prevents torch from being overwritten by other packages listed in the requirements, and it keeps the build inside the Docker environment consistent with a user's manual installation from source.
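For illustration, here is a minimal sketch of that ordering as a manual (non-Docker) flow on an ARM64/GH200 host; the nightly index URL and the requirements file name are assumptions and may differ from what the Dockerfile in this PR actually uses:

```bash
# Sketch only: replicate the "nightly torch first, then everything else" ordering.
git clone https://github.com/vllm-project/vllm.git && cd vllm

# Drop the pinned torch entries from the requirements files.
python3 use_existing_torch.py

# Install torch/torchvision from the nightly index before any other requirement,
# so later installs do not pull in a different or pinned torch build.
pip install --pre torch torchvision \
    --index-url https://download.pytorch.org/whl/nightly/cu124

# Install the remaining requirements and build vLLM against the existing torch.
pip install -r requirements-build.txt
pip install -e . --no-build-isolation
```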
The following command was used for the build and has been verified:
```bash
python3 use_existing_torch.py && docker build . \
    --target vllm-openai \
    --platform "linux/arm64" \
    -t cenncenn/vllm-gh200-openai:v0.6.4.post1 \
    --build-arg max_jobs=66 \
    --build-arg nvcc_threads=2 \
    --build-arg torch_cuda_arch_list="9.0+PTX" \
    --build-arg vllm_fa_cmake_gpu_arches="90-real" \
    --build-arg RUN_WHEEL_CHECK='false'
```
The build from source followed the use-an-existing-pytorch-installation section of the documentation and has also been verified.
The changes have been tested on the NVIDIA GH200 platform with the models meta-llama/Llama-3.1-8B and Qwen/Qwen2.5-0.5B-Instruct.
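As a usage sketch, the resulting image can be run like any other vLLM OpenAI-compatible server image; the port, HF_TOKEN handling, and flags below are assumptions based on standard vLLM Docker usage, not part of this PR:

```bash
# Sketch: serve one of the verification models from the image built above on a GH200 host.
# HF_TOKEN is only needed for gated models such as meta-llama/Llama-3.1-8B.
docker run --runtime nvidia --gpus all \
    -p 8000:8000 \
    --ipc=host \
    -e HF_TOKEN="$HF_TOKEN" \
    cenncenn/vllm-gh200-openai:v0.6.4.post1 \
    --model meta-llama/Llama-3.1-8B
```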