Cannot find libcublas or libcuda #34

Closed
ewof opened this issue Aug 26, 2022 · 16 comments · May be fixed by #36

Comments

@ewof

ewof commented Aug 26, 2022

The machine runs kernel 5.18.12-artix1-1 (Artix) with an RTX 3060, the nvidia drivers, the cuda package installed, and python3.9.

output for ldconfig -v | grep libcublas

	libcublasLt.so.11 -> libcublasLt.so.11.10.3.66
	libcublas.so.11 -> libcublas.so.11.10.3.66

output for ldconfig -v | grep libcuda

	libcuda.so.1 -> libcuda.so.515.57
	libcuda.so.1 -> libcuda.so.515.57

output for normal sudo docker-compose up

[+] Running 2/0
 ⠿ Container sukima-database-1  Created                                                                                                       0.0s
 ⠿ Container sukima-app-1       Created                                                                                                       0.0s
Attaching to sukima-app-1, sukima-database-1
sukima-database-1  |
sukima-database-1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
sukima-database-1  |
sukima-database-1  | 2022-08-26 16:54:22.044 UTC [1] LOG:  starting PostgreSQL 14.5 (Debian 14.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
sukima-database-1  | 2022-08-26 16:54:22.044 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
sukima-database-1  | 2022-08-26 16:54:22.044 UTC [1] LOG:  listening on IPv6 address "::", port 5432
sukima-database-1  | 2022-08-26 16:54:22.045 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
sukima-database-1  | 2022-08-26 16:54:22.047 UTC [26] LOG:  database system was shut down at 2022-08-26 16:54:19 UTC
sukima-database-1  | 2022-08-26 16:54:22.049 UTC [1] LOG:  database system is ready to accept connections
sukima-app-1       | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
sukima-app-1       | INFO  [alembic.runtime.migration] Will assume transactional DDL.
sukima-app-1       | INFO:     Will watch for changes in these directories: ['/sukima']
sukima-app-1       | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
sukima-app-1       | INFO:     Started reloader process [1] using watchgod
sukima-app-1       | Process SpawnProcess-1:
sukima-app-1       | Traceback (most recent call last):
sukima-app-1       |   File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
sukima-app-1       |     self.run()
sukima-app-1       |   File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
sukima-app-1       |     self._target(*self._args, **self._kwargs)
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/subprocess.py", line 76, in subprocess_started
sukima-app-1       |     target(sockets=sockets)
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 60, in run
sukima-app-1       |     return asyncio.run(self.serve(sockets=sockets))
sukima-app-1       |   File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
sukima-app-1       |     return loop.run_until_complete(main)
sukima-app-1       |   File "uvloop/loop.pyx", line 1501, in uvloop.loop.Loop.run_until_complete
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 67, in serve
sukima-app-1       |     config.load()
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/config.py", line 458, in load
sukima-app-1       |     self.loaded_app = import_from_string(self.app)
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/importer.py", line 21, in import_from_string
sukima-app-1       |     module = importlib.import_module(module_str)
sukima-app-1       |   File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
sukima-app-1       |     return _bootstrap._gcd_import(name[level:], package, level)
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
sukima-app-1       |   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
sukima-app-1       |   File "/sukima/./app/main.py", line 5, in <module>
sukima-app-1       |     from app.api.v1.api import api_router
sukima-app-1       |   File "/sukima/./app/api/v1/api.py", line 1, in <module>
sukima-app-1       |     from app.api.v1.endpoints import models, soft_prompts, users
sukima-app-1       |   File "/sukima/./app/api/v1/endpoints/models.py", line 7, in <module>
sukima-app-1       |     from app.gpt.berthf import BERTHF
sukima-app-1       |   File "/sukima/./app/gpt/berthf.py", line 9, in <module>
sukima-app-1       |     from app.gpt.tensorize import tensorize, untensorize
sukima-app-1       |   File "/sukima/./app/gpt/tensorize.py", line 2, in <module>
sukima-app-1       |     from app.gpt.quantization import FrozenBNBEmbedding, FrozenBNBLinear
sukima-app-1       |   File "/sukima/./app/gpt/quantization.py", line 7, in <module>
sukima-app-1       |     from bitsandbytes.functional import quantize_blockwise, dequantize_blockwise
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/__init__.py", line 5, in <module>
sukima-app-1       |     from .optim import adam
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/optim/__init__.py", line 5, in <module>
sukima-app-1       |     from .adam import Adam, Adam8bit, Adam32bit
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/optim/adam.py", line 11, in <module>
sukima-app-1       |     from bitsandbytes.optim.optimizer import Optimizer2State
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/optim/optimizer.py", line 6, in <module>
sukima-app-1       |     import bitsandbytes.functional as F
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/functional.py", line 13, in <module>
sukima-app-1       |     lib = ct.cdll.LoadLibrary(os.path.dirname(__file__) + '/libbitsandbytes.so')
sukima-app-1       |   File "/usr/local/lib/python3.9/ctypes/__init__.py", line 452, in LoadLibrary
sukima-app-1       |     return self._dlltype(name)
sukima-app-1       |   File "/usr/local/lib/python3.9/ctypes/__init__.py", line 374, in __init__
sukima-app-1       |     self._handle = _dlopen(self._name, mode)
sukima-app-1       | OSError: libcuda.so.1: cannot open shared object file: No such file or directory

output for sudo docker-compose -f docker-compose_nvidia-gpu.yaml up

[+] Running 2/0
 ⠿ Container sukima-database-1  Recreated                                                                                                     0.0s
 ⠿ Container sukima-app-1       Recreated                                                                                                     0.0s
Attaching to sukima-app-1, sukima-database-1
sukima-database-1  |
sukima-database-1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
sukima-database-1  |
sukima-database-1  | 2022-08-26 16:55:23.865 UTC [1] LOG:  starting PostgreSQL 14.5 (Debian 14.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
sukima-database-1  | 2022-08-26 16:55:23.866 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
sukima-database-1  | 2022-08-26 16:55:23.866 UTC [1] LOG:  listening on IPv6 address "::", port 5432
sukima-database-1  | 2022-08-26 16:55:23.866 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
sukima-database-1  | 2022-08-26 16:55:23.868 UTC [27] LOG:  database system was shut down at 2022-08-26 16:55:20 UTC
sukima-database-1  | 2022-08-26 16:55:23.871 UTC [1] LOG:  database system is ready to accept connections
sukima-app-1       | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
sukima-app-1       | INFO  [alembic.runtime.migration] Will assume transactional DDL.
sukima-app-1       | INFO:     Will watch for changes in these directories: ['/sukima']
sukima-app-1       | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
sukima-app-1       | INFO:     Started reloader process [1] using watchgod
sukima-app-1       | Process SpawnProcess-1:
sukima-app-1       | Traceback (most recent call last):
sukima-app-1       |   File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
sukima-app-1       |     self.run()
sukima-app-1       |   File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
sukima-app-1       |     self._target(*self._args, **self._kwargs)
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/subprocess.py", line 76, in subprocess_started
sukima-app-1       |     target(sockets=sockets)
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 60, in run
sukima-app-1       |     return asyncio.run(self.serve(sockets=sockets))
sukima-app-1       |   File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
sukima-app-1       |     return loop.run_until_complete(main)
sukima-app-1       |   File "uvloop/loop.pyx", line 1501, in uvloop.loop.Loop.run_until_complete
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 67, in serve
sukima-app-1       |     config.load()
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/config.py", line 458, in load
sukima-app-1       |     self.loaded_app = import_from_string(self.app)
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/uvicorn/importer.py", line 21, in import_from_string
sukima-app-1       |     module = importlib.import_module(module_str)
sukima-app-1       |   File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
sukima-app-1       |     return _bootstrap._gcd_import(name[level:], package, level)
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
sukima-app-1       |   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
sukima-app-1       |   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
sukima-app-1       |   File "/sukima/./app/main.py", line 5, in <module>
sukima-app-1       |     from app.api.v1.api import api_router
sukima-app-1       |   File "/sukima/./app/api/v1/api.py", line 1, in <module>
sukima-app-1       |     from app.api.v1.endpoints import models, soft_prompts, users
sukima-app-1       |   File "/sukima/./app/api/v1/endpoints/models.py", line 7, in <module>
sukima-app-1       |     from app.gpt.berthf import BERTHF
sukima-app-1       |   File "/sukima/./app/gpt/berthf.py", line 9, in <module>
sukima-app-1       |     from app.gpt.tensorize import tensorize, untensorize
sukima-app-1       |   File "/sukima/./app/gpt/tensorize.py", line 2, in <module>
sukima-app-1       |     from app.gpt.quantization import FrozenBNBEmbedding, FrozenBNBLinear
sukima-app-1       |   File "/sukima/./app/gpt/quantization.py", line 7, in <module>
sukima-app-1       |     from bitsandbytes.functional import quantize_blockwise, dequantize_blockwise
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/__init__.py", line 5, in <module>
sukima-app-1       |     from .optim import adam
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/optim/__init__.py", line 5, in <module>
sukima-app-1       |     from .adam import Adam, Adam8bit, Adam32bit
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/optim/adam.py", line 11, in <module>
sukima-app-1       |     from bitsandbytes.optim.optimizer import Optimizer2State
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/optim/optimizer.py", line 6, in <module>
sukima-app-1       |     import bitsandbytes.functional as F
sukima-app-1       |   File "/usr/local/lib/python3.9/site-packages/bitsandbytes/functional.py", line 13, in <module>
sukima-app-1       |     lib = ct.cdll.LoadLibrary(os.path.dirname(__file__) + '/libbitsandbytes.so')
sukima-app-1       |   File "/usr/local/lib/python3.9/ctypes/__init__.py", line 452, in LoadLibrary
sukima-app-1       |     return self._dlltype(name)
sukima-app-1       |   File "/usr/local/lib/python3.9/ctypes/__init__.py", line 374, in __init__
sukima-app-1       |     self._handle = _dlopen(self._name, mode)
sukima-app-1       | OSError: libcublas.so.11: cannot open shared object file: No such file or directory
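
Both tracebacks end in the same ctypes call failing on a different library, so a quick check is to ask the dynamic loader for each soname directly. The following is a minimal sketch, not part of sukima or bitsandbytes; the helper names try_load and report are made up for illustration, and the sonames are the ones from the two OSError messages above.

```python
import ctypes

# Sonames taken from the two OSError messages in the logs above.
SONAMES = ["libcuda.so.1", "libcublas.so.11"]


def try_load(soname):
    """Return None if the dynamic loader can open `soname`, else the error text."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return str(exc)
    return None


def report(sonames=SONAMES):
    """Map each soname to None (loadable) or the loader's error message."""
    return {name: try_load(name) for name in sonames}


if __name__ == "__main__":
    for name, error in report().items():
        print(f"{name}: {'ok' if error is None else error}")
```

Running it once on the host and once inside the app container would show whether the libraries are visible in one place but not the other.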

@yoinked-h

cp /usr/local/cuda-11.0/lib64/libcublas.so.11 /usr/lib/x86_64-linux-gnu/libcublas.so.11 should put it in the right place.
cublas installs in the wrong place, so we have to copy it to where the driver libraries live.

@ewof
Author

ewof commented Sep 5, 2022

Even when I copy them to /usr/lib/x86_64-linux-gnu/ (in my case they were installed at /opt/cuda/lib64/libcublas.so.11), sukima still gives that error.

@yoinked-h

Huh. What I tried was installing the cublas package on its own alongside cuda, so I assume that bitsandbytes is not reading it correctly.

@ewof
Author

ewof commented Sep 13, 2022

I can import bitsandbytes in a separate file without error.

@gakada

gakada commented Nov 30, 2022

Same problem on vanilla Arch, though it is easy to just run Postgres in Docker and the app natively in a Conda environment (by installing what is in the Dockerfile, sourcing conf.env, and running the command from docker-compose.yaml).

@jsamson23

I'm also having the exact same issue with the same GPU, cuda drivers also installed, and python3.9 as well.

@yoinked-h

If I had to guess, it's either a Python issue or a package issue. My best bet would be to figure out where it's trying to load the library from and put it there.
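
One rough way to see where these libraries live versus where the loader is likely to look is to compare ctypes.util.find_library with a scan of a few directories. This is only a sketch: the directory list is an assumption built from paths mentioned in this thread, the real loader search order also involves the ld.so cache and rpaths, and search_dirs and find_candidates are hypothetical helper names.

```python
import ctypes.util
import os
from pathlib import Path

# Locations mentioned in this thread; an assumption, not the loader's real search order.
DEFAULT_DIRS = ("/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64", "/opt/cuda/lib64")


def search_dirs(extra=DEFAULT_DIRS):
    """LD_LIBRARY_PATH entries first, then the extra directories not already listed."""
    env = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in env.split(os.pathsep) if d]
    return dirs + [d for d in extra if d not in dirs]


def find_candidates(stem, dirs):
    """List files in `dirs` whose name starts with `stem`, e.g. 'libcublas.so'."""
    hits = []
    for d in dirs:
        path = Path(d)
        if path.is_dir():
            hits.extend(sorted(str(p) for p in path.iterdir() if p.name.startswith(stem)))
    return hits


if __name__ == "__main__":
    for short, stem in [("cuda", "libcuda.so"), ("cublas", "libcublas.so")]:
        # find_library returns a library name if the system tools can locate it, else None.
        print(short, "find_library ->", ctypes.util.find_library(short))
        print(short, "files on disk ->", find_candidates(stem, search_dirs()))
```

If the files exist on disk but find_library returns None, the library is probably installed somewhere the loader does not search.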

@yoinked-h

Digging into it, the issue is with the bitsandbytes .so file. Even the Python package says it's not maintained, so maybe try updating the package, since the cuda113 build isn't supported anymore.

@yoinked-h

Although, for details of this specific error:

  • it's an issue with bitsandbytes-cuda113 (specifically the functional part)
  • my biggest guess is the included .so file; the main error comes from loading some of the .so files included in it

@ewof
Author

ewof commented Dec 3, 2022

image
I can copy-paste the exact line it imports from, or even run the file itself, and it works fine with bitsandbytes, but whenever I run the docker container it doesn't work.

@ewof
Author

ewof commented Dec 3, 2022

OK, so I edited docker-compose_nvidia-gpu.yaml to have runtime: nvidia after line 10 and - /opt/cuda:/usr/local/cuda after the line that has - ./:/sukima/. I followed https://medium.com/@adityathiruvengadam/cuda-docker-%EF%B8%8F-for-deep-learning-cab7c2be67f9 and also edited the Dockerfile according to that article.

On Artix I have both nvidia-docker and nvidia-docker-compose.
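
To make the two edits concrete, here is a small Python sketch that applies them to a compose service represented as a dict. The starting dict is a hypothetical stand-in (the actual contents of docker-compose_nvidia-gpu.yaml are not shown in this thread), and add_nvidia_runtime is a made-up helper; only runtime: nvidia and the /opt/cuda:/usr/local/cuda mount come from the comment above.

```python
import copy
import json


def add_nvidia_runtime(service, host_cuda_dir="/opt/cuda"):
    """Return a copy of `service` with the NVIDIA runtime and a CUDA bind mount added."""
    patched = copy.deepcopy(service)
    patched["runtime"] = "nvidia"
    mount = f"{host_cuda_dir}:/usr/local/cuda"
    volumes = patched.setdefault("volumes", [])
    if mount not in volumes:  # avoid adding the same mount twice
        volumes.append(mount)
    return patched


if __name__ == "__main__":
    # Hypothetical minimal service entry; the real file has more keys.
    service = {"volumes": ["./:/sukima/"]}
    print(json.dumps(add_nvidia_runtime(service), indent=2))
```

The host path defaults to /opt/cuda because that is where the cuda package was installed in this case; other distributions may use a different location.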

@dinchu

dinchu commented Jan 8, 2023

Both the nvidia and non-nvidia versions fail on Windows.

@ewof
Author

ewof commented Jan 8, 2023

Found a fix for it about a month ago. I was going to make a PR but never got around to it; I'll do it soon.

@Kadah

Kadah commented Jan 30, 2023

Was running into this as well. Tried the workarounds here to no effect; same with some other fixes I tested, though I don't remember exactly which, as that was a month ago. If there is a fix, that'd be great.

@ewof
Author

ewof commented Jan 31, 2023

I made a PR with the fix that worked for me in #36.

@dinchu

dinchu commented Jan 31, 2023

thanks!

ewof closed this as completed Feb 6, 2023