
RuntimeError: CUDA error: invalid device ordinal. When I run script.py, I get the error below. #27

Open
Tingberer opened this issue Jul 27, 2024 · 2 comments


@Tingberer

Fetching 267 files: 100% 267/267 [00:00<00:00, 7.56it/s]
[WARNING] FlashAttention is not available in the current environment. Using default attention.
Time to load prefetch op: 3.0545926094055176 seconds
Creating model from scratch ...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
No modifications detected for re-loaded extension module prefetch, skipping build step...
Loading extension module prefetch...
Loading checkpoint files: 0%| | 0/257 [00:00<?, ?it/s]

RuntimeError Traceback (most recent call last)
in <cell line: 17>()
15 }
16
---> 17 model = MoE(checkpoint, config)
18
19 input_text = "translate English to German: How old are you?"

1 frames
/usr/local/lib/python3.10/dist-packages/moe_infinity/runtime/model_offload.py in archer_from_pretrained(cls, *args, **kwargs)
405 # convert all tensors in state_dict to self.dtype
406 for k, v in state_dict.items():
--> 407 state_dict[k] = v.to(self.dtype).to("cpu")
408
409 self._offload_state_dict(state_dict, empty_state_dict)

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
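
For reference, "invalid device ordinal" generally means something addressed `cuda:N` for an `N` that does not exist on the machine (e.g. a config or distributed launch asking for more GPUs than are visible). A minimal check, assuming a standard PyTorch setup (nothing here is specific to MoE-Infinity):

```python
import os
import torch

# Compare the visible device count against what the script
# (or its device_map / distributed config) asks for.
print("Visible GPUs:", torch.cuda.device_count())
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))

# If a config references more GPUs than exist, restricting visibility before
# any CUDA initialization is a common workaround:
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```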

@drunkcoding
Contributor

Is it possible to provide the script to reproduce this? If it is one of the examples, please specify which one you ran. Providing your hardware settings might also be helpful.

@Tingberer
Author

I have fixed it. But when I run readme_example.py, I run into the problem below.
My hardware is 4× RTX 3090.

/home/admin/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/transformers/generation/utils.py:1249: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
/home/admin/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/transformers/generation/utils.py:1797: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cuda, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
  warnings.warn(
Model create:  20%|████████████████████▋                                                                                 | 930/4578 [00:16<00:01, 2992.65it/s]
translate English to German: You are Germany?

# Translate English to German


ArcherTaskPool destructor
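
Both warnings above also point at concrete fixes: pass `max_new_tokens` to bound the generation length, and keep `input_ids` on the same device as the model (which, per the second warning, is on cpu here). A minimal sketch, assuming the `tokenizer` and `model` objects from readme_example.py:

```python
input_text = "translate English to German: How old are you?"
inputs = tokenizer(input_text, return_tensors="pt")

# Per the warning, the model is on cpu, so keep input_ids on cpu as well.
input_ids = inputs.input_ids.to("cpu")

# Set max_new_tokens explicitly instead of relying on the default max_length=20.
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```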
