
CUDA error: device-side assert triggered #5

Closed
Sang-Yeop-Yeo opened this issue Dec 18, 2024 · 5 comments
@Sang-Yeop-Yeo

Thank you for your excellent work.

I wanted to try using the shared model, but unfortunately, I encountered an error.

How can I resolve the error below?

../aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
Extracting title:   0%|                                                                                                                 | 0/1 [00:01<?, ?it/s]
2024-12-18 22:13:43.050 Uncaught app execution
Traceback (most recent call last):
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
  File "/data2/sosick377/annotateai/app/app.py", line 147, in <module>
    app.run()
  File "/data2/sosick377/annotateai/app/app.py", line 69, in run
    output = self.build(url)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 174, in __call__
    return self._cached_func(self._instance, *args, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 218, in __call__
    return self._get_or_create_cached_value(args, kwargs, spinner_message)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 260, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 318, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
  File "/data2/sosick377/annotateai/app/app.py", line 109, in build
    return _self.annotate(url)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/annotateai/annotate.py", line 67, in __call__
    title = self.title(pages[0], progress)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/annotateai/annotate.py", line 144, in title
    result = self.llm([{"role": "user", "content": x}], maxlength=2048)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/llm.py", line 63, in __call__
    return self.generator(text, maxlength, stream, stop, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/generation.py", line 54, in __call__
    results = self.execute(texts, maxlength, stream, stop, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/generation.py", line 86, in execute
    return list(self.stream(texts, maxlength, stream, stop, **kwargs))
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/huggingface.py", line 29, in stream
    yield from self.llm(texts, maxlength=maxlength, stream=stream, stop=stop, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/huggingface.py", line 79, in __call__
    results = self.pipeline(texts, stop_strings=stop, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 270, in __call__
    return super().__call__(chats, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1282, in __call__
    outputs = list(final_iterator)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1208, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 370, in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/generation/utils.py", line 2252, in generate
    result = self._sample(
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/generation/utils.py", line 3298, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-12-18 22:13:43.367 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_
Extracting page text: 19it [00:01, 15.62it/s]
Extracting title:   0%|                                                                                                                 | 0/1 [00:00<?, ?it/s]
2024-12-18 22:14:08.539 Uncaught app execution
Traceback (most recent call last):
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
  File "/data2/sosick377/annotateai/app/app.py", line 147, in <module>
    app.run()
  File "/data2/sosick377/annotateai/app/app.py", line 69, in run
    output = self.build(url)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 174, in __call__
    return self._cached_func(self._instance, *args, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 218, in __call__
    return self._get_or_create_cached_value(args, kwargs, spinner_message)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 260, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 318, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
  File "/data2/sosick377/annotateai/app/app.py", line 109, in build
    return _self.annotate(url)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/annotateai/annotate.py", line 67, in __call__
    title = self.title(pages[0], progress)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/annotateai/annotate.py", line 144, in title
    result = self.llm([{"role": "user", "content": x}], maxlength=2048)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/llm.py", line 63, in __call__
    return self.generator(text, maxlength, stream, stop, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/generation.py", line 54, in __call__
    results = self.execute(texts, maxlength, stream, stop, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/generation.py", line 86, in execute
    return list(self.stream(texts, maxlength, stream, stop, **kwargs))
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/huggingface.py", line 29, in stream
    yield from self.llm(texts, maxlength=maxlength, stream=stream, stop=stop, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/txtai/pipeline/llm/huggingface.py", line 79, in __call__
    results = self.pipeline(texts, stop_strings=stop, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 270, in __call__
    return super().__call__(chats, **kwargs)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1282, in __call__
    outputs = list(final_iterator)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1207, in forward
    model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1114, in _ensure_tensor_on_device
    return UserDict({name: self._ensure_tensor_on_device(tensor, device) for name, tensor in inputs.items()})
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1114, in <dictcomp>
    return UserDict({name: self._ensure_tensor_on_device(tensor, device) for name, tensor in inputs.items()})
  File "/opt/anaconda/sosick377/anaconda3/envs/annotateai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1120, in _ensure_tensor_on_device
    return inputs.to(device)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

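A note on reading these traces: CUDA kernels launch asynchronously, so the frame where the assert surfaces (first `torch.multinomial`, then an unrelated `inputs.to(device)` on the retry, because the CUDA context stays poisoned after the first device-side assert) is often not where the failure originated. Setting `CUDA_LAUNCH_BLOCKING=1` before PyTorch initializes CUDA makes launches synchronous so the trace points at the actual failing kernel. A minimal sketch:

```python
import os

# CUDA kernels launch asynchronously, so by default the stack trace points
# at whichever call happened to observe the failure, not the kernel that
# tripped the assert. Synchronous launches fix the attribution.
# This must run before torch initializes CUDA, i.e. before "import torch"
# (or export the variable in the shell that starts streamlit).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # imported only after the variable is in place
```

Expect a noticeable slowdown with this flag set; it is a debugging aid, not a production setting.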
@davidmezzetti (Member)

Hello. Can you share more about your setup? Do you have a GPU available? Is this Linux?

@Sang-Yeop-Yeo (Author) commented Dec 19, 2024

Thank you for your response. I checked that my GPU is working properly.

Here is my environment:

OS: Linux (Ubuntu 22.04.2)
GPU: A100
CUDA version: 11.8
NVIDIA driver version: 520.61.05

Virtual environment:

accelerate                1.2.1
aiohappyeyeballs          2.4.4
aiohttp                   3.11.10
aiosignal                 1.3.2
altair                    5.5.0
annotateai                0.2.0
asttokens                 3.0.0
async-timeout             5.0.1
attrs                     24.3.0
autoawq                   0.2.7.post3
autoawq_kernels           0.0.9
blessed                   1.20.0
blinker                   1.9.0
cachetools                5.5.0
certifi                   2024.12.14
cffi                      1.17.1
charset-normalizer        3.4.0
click                     8.1.7
comm                      0.2.2
cryptography              44.0.0
datasets                  3.2.0
debugpy                   1.6.7
decorator                 5.1.1
dill                      0.3.8
einops                    0.8.0
entrypoints               0.4
exceptiongroup            1.2.2
executing                 2.1.0
faiss-cpu                 1.9.0.post1
filelock                  3.16.1
flash-attn                2.7.2.post1
fonttools                 4.55.3
frozenlist                1.5.0
fsspec                    2024.9.0
gitdb                     4.0.11
GitPython                 3.1.43
gpustat                   1.1.1
huggingface-hub           0.27.0
idna                      3.10
ipykernel                 6.29.5
ipython                   8.30.0
jedi                      0.19.2
Jinja2                    3.1.4
joblib                    1.4.2
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
jupyter-client            7.3.4
jupyter_core              5.7.2
markdown-it-py            3.0.0
MarkupSafe                3.0.2
matplotlib-inline         0.1.7
mdurl                     0.1.2
mpmath                    1.3.0
msgpack                   1.1.0
multidict                 6.1.0
multiprocess              0.70.16
narwhals                  1.18.4
nest_asyncio              1.6.0
networkx                  3.4.2
nltk                      3.9.1
numpy                     2.2.0
nvidia-cublas-cu11        11.11.3.6
nvidia-cublas-cu12        12.4.5.8
nvidia-cuda-cupti-cu11    11.8.87
nvidia-cuda-cupti-cu12    12.4.127
nvidia-cuda-nvrtc-cu11    11.8.89
nvidia-cuda-nvrtc-cu12    12.4.127
nvidia-cuda-runtime-cu11  11.8.89
nvidia-cuda-runtime-cu12  12.4.127
nvidia-cudnn-cu11         9.1.0.70
nvidia-cudnn-cu12         9.1.0.70
nvidia-cufft-cu11         10.9.0.58
nvidia-cufft-cu12         11.2.1.3
nvidia-curand-cu11        10.3.0.86
nvidia-curand-cu12        10.3.5.147
nvidia-cusolver-cu11      11.4.1.48
nvidia-cusolver-cu12      11.6.1.9
nvidia-cusparse-cu11      11.7.5.86
nvidia-cusparse-cu12      12.3.1.170
nvidia-ml-py              12.560.30
nvidia-nccl-cu11          2.21.5
nvidia-nccl-cu12          2.21.5
nvidia-nvjitlink-cu12     12.4.127
nvidia-nvtx-cu11          11.8.86
nvidia-nvtx-cu12          12.4.127
packaging                 24.2
pandas                    2.2.3
parso                     0.8.4
pdf-annotate              0.12.0
pdfminer.six              20240706
pdfrw                     0.4
pexpect                   4.9.0
pickleshare               0.7.5
pillow                    11.0.0
pip                       24.2
platformdirs              4.3.6
prompt_toolkit            3.0.48
propcache                 0.2.1
protobuf                  5.29.1
psutil                    5.9.0
ptyprocess                0.7.0
pure_eval                 0.2.3
pyarrow                   18.1.0
pycparser                 2.22
pydeck                    0.9.1
Pygments                  2.18.0
python-dateutil           2.9.0.post0
pytz                      2024.2
PyYAML                    6.0.2
pyzmq                     25.1.2
referencing               0.35.1
regex                     2024.11.6
requests                  2.32.3
rich                      13.9.4
rpds-py                   0.22.3
safetensors               0.4.5
setuptools                75.1.0
six                       1.17.0
smmap                     5.0.1
stack_data                0.6.3
streamlit                 1.41.1
streamlit-pdf-viewer      0.0.19
sympy                     1.13.1
tenacity                  9.0.0
tokenizers                0.21.0
toml                      0.10.2
torch                     2.5.1+cu118
torchaudio                2.5.1+cu118
torchvision               0.20.1+cu118
tornado                   6.1
tqdm                      4.67.1
traitlets                 5.14.3
transformers              4.47.1
triton                    3.1.0
txtai                     8.1.0
txtmarker                 1.1.0
typing_extensions         4.12.2
tzdata                    2024.2
urllib3                   2.2.3
watchdog                  6.0.0
wcwidth                   0.2.13
wheel                     0.44.0
xxhash                    3.5.0
yarl                      1.18.3
zstandard                 0.23.0

@Sang-Yeop-Yeo (Author)

> Hello. Can you share more about your setup? Do you have a GPU available? Is this Linux?

I have shared my environment. Please let me know if you need any additional information.

@davidmezzetti (Member)

I see both cu11 and cu12 packages in that virtual environment. Perhaps create a fresh virtualenv?

This error leads me to believe there is an issue between torch 2.5.1 and the underlying CUDA library it's interfacing with.

Since you have an A100, you could also try the unquantized model to rule out an issue with AWQ: https://huggingface.co/OpenScholar/Llama-3.1_OpenScholar-8B
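The mixed cu11/cu12 wheels in the pip list above can be spotted mechanically. A stdlib-only sketch (the helper name and the sample `packages` list are illustrative; in practice you would feed it the installed distribution names, e.g. from `importlib.metadata.distributions()`):

```python
def cuda_toolkit_tags(package_names):
    """Return the set of CUDA toolkit suffixes (cu11/cu12) found in package names."""
    tags = set()
    for name in package_names:
        for tag in ("cu11", "cu12"):
            # NVIDIA runtime wheels look like "nvidia-cublas-cu11";
            # local torch builds carry the tag in the version (e.g. 2.5.1+cu118).
            if name.endswith(f"-{tag}") or f"+{tag}" in name:
                tags.add(tag)
    return tags

# Hypothetical excerpt of an environment like the one reported above.
packages = ["nvidia-cublas-cu11", "nvidia-cublas-cu12", "torch"]
mixed = cuda_toolkit_tags(packages)
# len(mixed) > 1 indicates conflicting CUDA toolkit wheels in one environment
```

If both tags appear, a fresh environment pinned to a single toolkit (for example, installing torch only from the cu118 wheel index) avoids loading mismatched CUDA libraries at runtime.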

@davidmezzetti (Member)

Closing this issue due to inactivity. Please re-open or open a new issue to continue the conversation.
