
[Installation]: vllm on NVIDIA jetson AGX orin #5640

Open
phgcha opened this issue Jun 18, 2024 · 16 comments
Labels
installation Installation problems

Comments

@phgcha

phgcha commented Jun 18, 2024

Your current environment

root@jetson:/workspace# python collect_env.py
  File "collect_env.py", line 724
    print(msg, file=sys.stderr)
                   ^
SyntaxError: invalid syntax
root@jetson:/workspace# python3 collect_env.py
Collecting environment information...
PyTorch version: 2.0.0a0+ec3941ad.nv23.02
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (aarch64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.25.2
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 14 2022, 12:59:47)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.120-tegra-aarch64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.315
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

CPU:
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          12
On-line CPU(s) list:             0-7
Off-line CPU(s) list:            8-11
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       2
Vendor ID:                       ARM
Model:                           1
Model name:                      ARMv8 Processor rev 1 (v8l)
Stepping:                        r0p1
CPU max MHz:                     2201.6001
CPU min MHz:                     115.2000
BogoMIPS:                        62.50
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        2 MiB
L3 cache:                        4 MiB
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; CSV2, but not BHB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==2.0.0a0+ec3941ad.nv23.2
[pip3] torchaudio==0.13.1+b90d798
[pip3] torchvision==0.14.1a0+5e8e2f1
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

How you are installing vllm

pip install vllm

Hi,

I'm trying to install vLLM on my Jetson AGX Orin developer kit.

I'm using the following image: nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3, and I get this error when I run pip install vllm:

root@jetson:/workspace# pip install vllm
Collecting vllm
  Downloading vllm-0.5.0.post1.tar.gz (743 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 743.2/743.2 kB 12.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [16 lines of output]
      Traceback (most recent call last):
        File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 415, in <module>
        File "<string>", line 341, in get_vllm_version
      RuntimeError: Unknown runtime environment
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

Note the error message Unknown runtime environment.
I figured out that this is thrown here https://github.com/vllm-project/vllm/blob/main/setup.py#L347 because torch.version.cuda is None.
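
For reference, the check there is essentially a dispatch on how the torch visible at build time was compiled. A paraphrased, runnable sketch of that logic (not the exact setup.py source):

import torch

def detect_runtime_environment() -> str:
    # setup.py probes the installed torch build (CUDA, ROCm, Neuron, CPU, ...)
    # and raises if nothing matches; only the first two probes are shown here.
    if torch.version.cuda is not None:
        return "cuda"
    if getattr(torch.version, "hip", None) is not None:
        return "rocm"
    raise RuntimeError("Unknown runtime environment")

print(detect_runtime_environment())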

However, when I open a python3 prompt and check CUDA availability,

root@jetson:/workspace# python3
Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version.cuda)
11.4
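
One hypothesis: pip builds vLLM in an isolated build environment with its own freshly downloaded torch, and on aarch64 that wheel is CPU-only, so torch.version.cuda would be None during the build even though my system torch reports 11.4. If that is the cause, disabling build isolation (after installing the build dependencies up front) should let setup.py see the NVIDIA-built torch, roughly:

pip install -U pip setuptools wheel packaging ninja cmake
pip install vllm --no-build-isolation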

Any help would be appreciated. Thanks

@phgcha phgcha added the installation Installation problems label Jun 18, 2024
@youkaichao
Member

I suggest you contact your support team. You are using a custom-built PyTorch, about which we can answer only very limited questions.

@phgcha
Author

phgcha commented Jun 18, 2024

Thanks, will do. But does anyone have a rough idea of what might have caused this?

@Shatatel

Shatatel commented Jun 26, 2024

Same issue here. It appears right after a fresh flash of the Jetson AGX.

Here is some software info:

jetson_release 

Model: NVIDIA Jetson AGX Orin Developer Kit - Jetpack 6.0 [L4T 36.3.0]
NV Power Mode[0]: MAXN
Hardware:
 - Module: NVIDIA Jetson AGX Orin
Platform:
 - Distribution: Ubuntu 22.04 Jammy Jellyfish
 - Release: 5.15.136-tegra
jtop:
 - Version: 4.2.8
 - Service: Active
Libraries:
 - CUDA: 12.2.140
 - cuDNN: 8.9.4.25
 - TensorRT: 8.6.2.3
 - VPI: 3.1.5
 - Vulkan: 1.3.204
 - OpenCV: 4.10.0-dev - with CUDA: YES


Python 3.10.12 [GCC 11.4.0] on linux
>>> import torch
>>> print(torch.version.cuda)
12.2

@Irrerwirrer

Same problem here. Using print(torch.__version__), I found that the current vLLM tries to use PyTorch 2.3.0 in setup.py. However, my currently installed version is 2.1.0a0+41361538.nv23.06, which is the latest supported by my JetPack 5.1.2. According to the NVIDIA documentation, PyTorch 2.3.0 is not supported on my setup.

To work around this, I attempted to install vllm-0.2.4, which requires only PyTorch 2.1.0. However, I encountered an OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root. error, even though I have already added the correct CUDA_HOME to my .bashrc.
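
One thing worth double-checking for anyone who hits the CUDA_HOME error: ~/.bashrc is only sourced by interactive shells, so an export there may never reach the build process (e.g. under sudo or from a service). Setting it explicitly in the same shell rules that out; the path below is the JetPack default and may differ on your system:

export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
pip install vllm==0.2.4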

@Acc1143

Acc1143 commented Jul 1, 2024

How do I solve this?

@KungFuPandaPro

> Same problem here. Using print(torch.__version__), I found that the current vLLM tries to use PyTorch 2.3.0 in setup.py. However, my currently installed version is 2.1.0a0+41361538.nv23.06, which is the latest supported by my JetPack 5.1.2. According to the NVIDIA documentation, PyTorch 2.3.0 is not supported on my setup.
>
> To work around this, I attempted to install vllm-0.2.4, which requires only PyTorch 2.1.0. However, I encountered an OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root. error, even though I have already added the correct CUDA_HOME to my .bashrc.

How do I install vLLM on Jetson?

@walker-ai

> Same problem here. Using print(torch.__version__), I found that the current vLLM tries to use PyTorch 2.3.0 in setup.py. However, my currently installed version is 2.1.0a0+41361538.nv23.06, which is the latest supported by my JetPack 5.1.2. According to the NVIDIA documentation, PyTorch 2.3.0 is not supported on my setup.
> To work around this, I attempted to install vllm-0.2.4, which requires only PyTorch 2.1.0. However, I encountered an OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root. error, even though I have already added the correct CUDA_HOME to my .bashrc.
>
> How do I install vLLM on Jetson?

Did you resolve that?

@walker-ai

@youkaichao Hey, I've hit the same problem. Could you tell me whether it's feasible to run vLLM on a Jetson AGX Orin (aarch64) if I only call some classes from it (configuration classes and operators)? If so, how should I install it?

@Irrerwirrer

@KungFuPandaPro @walker-ai
I have managed to install it properly.
Take a look at this thread/issue
#6063

@walker-ai

> @KungFuPandaPro @walker-ai I have managed to install it properly. Take a look at this thread/issue: #6063

Thank you for your time. I'm trying to install it by the method you mentioned there, but I encountered some errors. Could you show the environment you were using (JetPack, PyTorch, vLLM, etc.)? I'm afraid it's a version inconsistency.

@Irrerwirrer

@walker-ai Could you provide further details of the error? My project is already completed and I no longer have access to the environment, but maybe your error is one of the ones I also encountered along the way.

@walker-ai

walker-ai commented Oct 25, 2024

> @walker-ai Could you provide further details of the error? My project is already completed and I no longer have access to the environment, but maybe your error is one of the ones I also encountered along the way.

When I type:

# cd /workspace/vllm/
python setup.py develop

it shows:

Error: could not find CMAKE_PROJECT_NAME in Cache
Traceback (most recent call last):
  File "setup.py", line 486, in <module>
    setup(
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/__init__.py", line 117, in setup
    return distutils.core.setup(**attrs)
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 183, in setup
    return run_commands(dist)
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 199, in run_commands
    dist.run_commands()
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 954, in run_commands
    self.run_command(cmd)
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/dist.py", line 950, in run_command
    super().run_command(command)
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
    cmd_obj.run()
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/command/develop.py", line 35, in run
    self.install_for_development()
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/command/develop.py", line 112, in install_for_development
    self.run_command('build_ext')
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/dist.py", line 950, in run_command
    super().run_command(command)
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
    cmd_obj.run()
  File "setup.py", line 243, in run
    super().run()
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 98, in run
    _build_ext.run(self)
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
    self.build_extensions()
  File "setup.py", line 217, in build_extensions
    subprocess.check_call(["cmake", *build_args], cwd=self.build_temp)
  File "/home/orin/tools/anaconda3/envs/tmp/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=8', '--target=_moe_C', '--target=vllm_flash_attn_c', '--target=_C']' returned non-zero exit status 1.

Then I try to dig into the details of the cmake error log:

cmake --build . -j8 --target=_moe_C --verbose

it shows (in part):

[1/5] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4b8.cu.o
FAILED: CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4b8.cu.o 
ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_moe_C -D_moe_C_EXPORTS -I/home/orin/tools/vllm/csrc -isystem /home/orin/tools/anaconda3/envs/tmp/include/python3.8 -isystem /home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/torch/include -isystem /home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 -Xcompiler=-fPIC -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -DENABLE_SCALED_MM_C2X=1 -D_GLIBCXX_USE_CXX11_ABI=1 -gencode arch=compute_86,code=sm_86 -MD -MT CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4b8.cu.o -MF CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4b8.cu.o.d -x cu -c /home/orin/tools/vllm/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4b8.cu -o CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4b8.cu.o
/home/orin/tools/vllm/csrc/core/scalar_type.hpp(215): error: qualified name is not allowed

/home/orin/tools/vllm/csrc/core/scalar_type.hpp(215): error: explicit type is missing ("int" assumed)

/home/orin/tools/vllm/csrc/core/scalar_type.hpp(215): error: expected a ";"

...


[2/5] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4.cu.o
FAILED: CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4.cu.o 
ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_moe_C -D_moe_C_EXPORTS -I/home/orin/tools/vllm/csrc -isystem /home/orin/tools/anaconda3/envs/tmp/include/python3.8 -isystem /home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/torch/include -isystem /home/orin/tools/anaconda3/envs/tmp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 -Xcompiler=-fPIC -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -DENABLE_SCALED_MM_C2X=1 -D_GLIBCXX_USE_CXX11_ABI=1 -gencode arch=compute_86,code=sm_86 -MD -MT CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4.cu.o -MF CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4.cu.o.d -x cu -c /home/orin/tools/vllm/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4.cu -o CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4.cu.o
/home/orin/tools/vllm/csrc/core/scalar_type.hpp(215): error: qualified name is not allowed

/home/orin/tools/vllm/csrc/core/scalar_type.hpp(215): error: explicit type is missing ("int" assumed)

/home/orin/tools/vllm/csrc/core/scalar_type.hpp(215): error: expected a ";"

@Irrerwirrer

Ok, I did not encounter this error.
If I remember correctly, this was more or less my way (roughly the sequence sketched after this list):

  • Take care of the problematic dependencies manually (remove the torch and torchvision pins from requirements-build and requirements-cuda, because I had already installed custom-built PyTorch and torchvision).
  • pip install -e .
    With the benefit of hindsight, my problem was just badly built dependencies. Once I got those fixed one after another (torch, torchvision, xformers, etc.), vLLM wasn't a problem anymore.
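
Concretely, the sequence looked something like this (from memory, so treat it as a sketch; the --no-build-isolation flag is my assumption for keeping the preinstalled torch visible during the build):

# inside a vLLM source checkout, with NVIDIA's torch/torchvision already installed
sed -i '/^torch/d' requirements-build.txt requirements-cuda.txt   # drop torch/torchvision pins
pip install -e . --no-build-isolation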

But your error seems more like a cmake problem. I guess you already checked the Install Guide, which has been updated a lot since I did my installation.

Hope you can fix it 👍

@walker-ai

> Ok, I did not encounter this error. If I remember correctly, this was more or less my way:
>
>   • Take care of the problematic dependencies manually (remove the torch and torchvision pins from requirements-build and requirements-cuda, because I had already installed custom-built PyTorch and torchvision).
>   • pip install -e .
>     With the benefit of hindsight, my problem was just badly built dependencies. Once I got those fixed one after another (torch, torchvision, xformers, etc.), vLLM wasn't a problem anymore.
>
> But your error seems more like a cmake problem. I guess you already checked the Install Guide, which has been updated a lot since I did my installation.
>
> Hope you can fix it 👍

Thank you for providing this information, I will try that :)

@conroy-cheers
Contributor

conroy-cheers commented Oct 26, 2024

I'm currently in the process of updating the nixpkgs build definition for vLLM from 0.5.3.post1 to 0.6.3.post1.

The build works great on an x86_64 machine with an NVIDIA GPU, and I have successfully built for the aarch64 Jetson AGX Orin with the correct dependency / CUDA library versions. Unfortunately, at runtime on the Jetson, vLLM fails to start the engine process:

  File "/nix/store/x5bz4k7prp7vdvm4i01y0z0lbi4awad7-python3.12-vllm-0.6.3.post1/lib/python3.12/site-packages/vllm/utils.py", line 799, in current_memory_usage
    return mem
           ^^^
UnboundLocalError: cannot access local variable 'mem' where it is not associated with a value

I have traced this back to the root cause: newer versions of vLLM use NVML to detect CUDA support.
NVML is not supported on Jetson, so unfortunately it isn't possible to proceed without making changes to vLLM here.
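
To illustrate the kind of fallback I have in mind (a hypothetical sketch, not vLLM's actual code; the UnboundLocalError above presumably comes from a branch that only assigns mem when the device type is recognized):

import pynvml
import torch

def total_gpu_memory(device_index: int = 0) -> int:
    # Prefer NVML, but fall back to the CUDA runtime on platforms
    # (like Jetson) where NVML is unavailable.
    try:
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return pynvml.nvmlDeviceGetMemoryInfo(handle).total
    except pynvml.NVMLError:
        return torch.cuda.get_device_properties(device_index).total_memory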

I'd be happy to open a PR to add a fallback mode on CUDA platforms where NVML isn't supported, if the maintainers would be open to it? Otherwise I will have to maintain a patch downstream.

@conroy-cheers
Contributor

#9735
