
RuntimeError: CUDA error: invalid device ordinal. When I run script.py, I get the error below. #27

Open
Tingberer opened this issue Jul 27, 2024 · 2 comments


@Tingberer

Fetching 267 files: 100% 267/267 [00:00<00:00, 7.56it/s]
[WARNING] FlashAttention is not available in the current environment. Using default attention.
Time to load prefetch op: 3.0545926094055176 seconds
Creating model from scratch ...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
No modifications detected for re-loaded extension module prefetch, skipping build step...
Loading extension module prefetch...
Loading checkpoint files: 0%| | 0/257 [00:00<?, ?it/s]

RuntimeError Traceback (most recent call last)
in <cell line: 17>()
15 }
16
---> 17 model = MoE(checkpoint, config)
18
19 input_text = "translate English to German: How old are you?"

1 frames
/usr/local/lib/python3.10/dist-packages/moe_infinity/runtime/model_offload.py in archer_from_pretrained(cls, *args, **kwargs)
405 # convert all tensors in state_dict to self.dtype
406 for k, v in state_dict.items():
--> 407 state_dict[k] = v.to(self.dtype).to("cpu")
408
409 self._offload_state_dict(state_dict, empty_state_dict)

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
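
For reference, "invalid device ordinal" generally means something addressed `cuda:N` for an `N` that does not exist on the machine (e.g. a config or distributed launch asking for more GPUs than are visible). A minimal check, assuming a standard PyTorch setup (nothing here is specific to MoE-Infinity):

```python
import os
import torch

# Compare the visible device count against what the script
# (or its device_map / distributed config) asks for.
print("Visible GPUs:", torch.cuda.device_count())
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))

# If a config references more GPUs than exist, restricting visibility before
# any CUDA initialization is a common workaround:
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```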

@drunkcoding
Contributor

Is it possible to provide the script to reproduce this? If it is one of the examples, please specify which one you ran. Providing your hardware settings might also be helpful.

@Tingberer
Author

I have fixed it. But when I run readme_example.py, I run into the problem below.
My hardware is 4× RTX 3090.

/home/admin/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/transformers/generation/utils.py:1249: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
/home/admin/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/transformers/generation/utils.py:1797: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cuda, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
  warnings.warn(
Model create:  20%|████████████████████▋                                                                                 | 930/4578 [00:16<00:01, 2992.65it/s]
translate English to German: You are Germany?

# Translate English to German


ArcherTaskPool destructor
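
Both warnings above also point at concrete fixes: pass `max_new_tokens` to bound the generation length, and keep `input_ids` on the same device as the model (which, per the second warning, is on cpu here). A minimal sketch, assuming the `tokenizer` and `model` objects from readme_example.py:

```python
input_text = "translate English to German: How old are you?"
inputs = tokenizer(input_text, return_tensors="pt")

# Per the warning, the model is on cpu, so keep input_ids on cpu as well.
input_ids = inputs.input_ids.to("cpu")

# Set max_new_tokens explicitly instead of relying on the default max_length=20.
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```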
