Local cuda container build fails with "unsupported instruction `vpdpbusd'" #471

Closed
nzwulfin opened this issue Nov 20, 2024 · 10 comments

Comments

@nzwulfin
Contributor

On my home system, ./container_build.sh cuda fails with the following error:

/tmp/ccnKypuJ.s: Assembler messages:
/tmp/ccnKypuJ.s:31871: Error: unsupported instruction `vpdpbusd'
/tmp/ccnKypuJ.s:31926: Error: unsupported instruction `vpdpbusd'
/tmp/ccnKypuJ.s:31995: Error: unsupported instruction `vpdpbusd'
/tmp/ccnKypuJ.s:32060: Error: unsupported instruction `vpdpbusd'
/tmp/ccnKypuJ.s:32113: Error: unsupported instruction `vpdpbusd'
gmake[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:132: ggml/src/CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:1591: ggml/src/CMakeFiles/ggml.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2

From what I can tell online, this is due to the binutils in RHEL 9 not being new enough to support the instruction.
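A quick way to see which compiler and assembler a build would pick up is to print their paths and version banners. This is a minimal sketch: the function name `report_toolchain` is made up for illustration, and it only reports what is first on `PATH`. It does not test whether the assembler accepts `vpdpbusd`.

```shell
# Hypothetical helper: show which gcc and as are first on PATH.
report_toolchain() {
  local tool path
  for tool in gcc as; do
    if path=$(command -v "$tool"); then
      # The first line of --version carries the version banner.
      printf '%s: %s (%s)\n' "$tool" "$path" \
        "$("$tool" --version </dev/null 2>/dev/null | head -n 1)"
    else
      printf '%s: not found\n' "$tool"
    fi
  done
}

report_toolchain
```

Running it before and after activating the toolset should show whether the paths switch to `/opt/rh/gcc-toolset-12/root/usr/bin`.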

I made some progress by adding the GCC Toolset 12 to the cuda portion of the dnf_install switch statement, but I'm not familiar enough with what needs to really get set to use the toolset correctly. I expect that scl enable is doing a lot more with the path than I exported.

    dnf install -y gcc-toolset-12
    export CC=/opt/rh/gcc-toolset-12/root/usr/bin/gcc
    export CXX=/opt/rh/gcc-toolset-12/root/usr/bin/g++

I've hit my limit for testing but thought I'd report the issue anyhow.

@nzwulfin
Contributor Author

I compared a UBI 9 container with the toolset enabled against the CUDA dev container and brute-forced a few more exports until the build completed. I don't think this is the right solution, but it might serve as a pointer to one.

  elif [ "$containerfile" = "cuda" ]; then
    dnf install -y "${rpm_list[@]}"
    dnf install -y gcc-toolset-12
    export CC=/opt/rh/gcc-toolset-12/root/usr/bin/gcc
    export CXX=/opt/rh/gcc-toolset-12/root/usr/bin/g++
    export PKG_CONFIG_PATH=/opt/rh/gcc-toolset-12/root/usr/lib64/pkgconfig
    export INFOPATH=/opt/rh/gcc-toolset-12/root/usr/share/info
    export LD_LIBRARY_PATH=/opt/rh/gcc-toolset-12/root/usr/lib64:/opt/rh/gcc-toolset-12/root/usr/lib:$LD_LIBRARY_PATH
    export PATH=/usr/share/Modules/bin:/opt/rh/gcc-toolset-12/root/usr/bin:/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH

@bmahabirbu
Collaborator

@nzwulfin good analysis. Did you also try running scl enable gcc-toolset-12 bash before doing the exports? It starts a new shell with GCC Toolset 12 enabled, which should avoid the error.

In general, I have personally tested the build process on Ubuntu 24.04, Ubuntu 22.04 in WSL2, and Fedora 40, but I'm new to RHEL 9!

@ericcurtin
Collaborator

Let's open a PR and get this change in, related issue:

ggerganov/llama.cpp#5316

@nzwulfin
Contributor Author

@bmahabirbu I did try the scl enable gcc-toolset-12 bash step, both in the switch and after the dnf_install call in the main body. It didn't change which GCC cmake picked up, but it also didn't throw any errors.

I didn't have any problems in a local version of the cuda:12.6.2-devel-ubi9 container:

[root@13fd7588eacd /]# scl enable gcc-toolset-12 bash

[root@30d20f630919 /]# gcc --version
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-7)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It might be the bash invocation inside a running script.

I found the enable file and it's mainly just a bunch of env exports. I'm going to test changing all my exports to

source /opt/rh/gcc-toolset-12/enable

I'll report in once I have a local build
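The difference between the two approaches can be reproduced without the toolset installed. The sketch below uses a stand-in enable file in a temporary directory (the file and the `DEMO_TOOLSET` variable are invented for the demonstration). Exports made in a child `bash` vanish when that shell exits, whereas `source` applies them to the running script. This is consistent with the guess about the `bash` invocation, though it does not show what `scl enable` does internally.

```shell
#!/usr/bin/env bash
# Stand-in for /opt/rh/gcc-toolset-12/enable; DEMO_TOOLSET is a made-up variable.
demo_dir=$(mktemp -d)
printf 'export DEMO_TOOLSET=enabled\n' > "$demo_dir/enable"

unset DEMO_TOOLSET

# Child shell: the export happens in the child and is gone when it exits.
bash -c "source '$demo_dir/enable'"
after_child="${DEMO_TOOLSET:-unset}"

# source: the export happens in the current shell and persists.
source "$demo_dir/enable"
after_source="${DEMO_TOOLSET:-unset}"

echo "after child shell: $after_child"    # after child shell: unset
echo "after source:      $after_source"   # after source:      enabled

rm -rf "$demo_dir"
```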

@nzwulfin
Contributor Author

@ericcurtin wrote: "Let's open a PR and get this change in, related issue: ggerganov/llama.cpp#5316"

Well, if I had read the issue Eric linked, I could have saved all my testing this morning ;)

Based on two comments in ggerganov/llama.cpp#5316, I should have the right combo in this attempt:

  elif [ "$containerfile" = "cuda" ]; then
    dnf install -y "${rpm_list[@]}"
    dnf install -y gcc-toolset-12 
    source /opt/rh/gcc-toolset-12/enable 
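Since the rest of the build depends on the toolset being active, it may be worth failing early if the enable script is missing. A minimal sketch, not the code from the eventual PR: `enable_toolset` is a hypothetical helper name, and the default path is the one used in the snippet above.

```shell
# Hypothetical helper: source a toolset enable script, or fail loudly.
enable_toolset() {
  local enable_file="${1:-/opt/rh/gcc-toolset-12/enable}"
  if [ -r "$enable_file" ]; then
    # Sourcing (rather than spawning a shell) keeps the exports
    # in the current script's environment.
    # shellcheck disable=SC1090
    source "$enable_file"
  else
    echo "toolset enable script not found: $enable_file" >&2
    return 1
  fi
}
```

In the cuda branch this could be called as `enable_toolset || exit 1` right after the `dnf install` lines, so a missing package surfaces before cmake runs.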

@nzwulfin
Contributor Author

The llama.cpp compile was a little noisy because of an enabled warning, but it worked, and I was able to get llama3.2 running using the notes in the discussion. I'll clean up my local repo and submit a PR so folks can look at it in context.

Here's the warning I was seeing, in case someone wants to think about silencing it:

/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/include/avx512fintrin.h:5946:10: warning: '__Y' may be used uninitialized [-Wmaybe-uninitialized]
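If the warning proves to be noise, one option is to pass GCC's `-Wno-maybe-uninitialized` through CMake's standard `CMAKE_C_FLAGS` and `CMAKE_CXX_FLAGS` variables. This is only a sketch: it assumes the build is configured with a plain `cmake -B build` call, which may not match what the container script does, and it silences the warning everywhere rather than just for `avx512fintrin.h`.

```shell
# Assumed configure step; the real script's cmake arguments may differ.
quiet_flags="-Wno-maybe-uninitialized"
cmake_args=(
  -B build
  "-DCMAKE_C_FLAGS=${quiet_flags}"
  "-DCMAKE_CXX_FLAGS=${quiet_flags}"
)

# Print the command instead of running it so the sketch has no side effects.
echo cmake "${cmake_args[@]}"
```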

@bmahabirbu
Collaborator

@ericcurtin good find on that issue! I'm surprised I didn't come across it during my search.

@nzwulfin my apologies, but thank you for testing my suggestion anyway! I guess scl enable doesn't properly give the script access to GCC Toolset 12. It's good to know that sourcing the enable file works.

@nzwulfin
Contributor Author

@bmahabirbu no worries, I wanted to make sure I didn't miss anything the first time I tried it!

@nzwulfin
Contributor Author

PR #473 submitted, thanks y'all!

@nzwulfin
Contributor Author

PR #473 was merged, and a local test confirmed the fix.
