Fetching 267 files: 100%
267/267 [00:00<00:00, 7.56it/s]
[WARNING] FlashAttention is not available in the current environment. Using default attention.
Time to load prefetch op: 3.0545926094055176 seconds
Creating model from scratch ...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
No modifications detected for re-loaded extension module prefetch, skipping build step...
Loading extension module prefetch...
Loading checkpoint files: 0%| | 0/257 [00:00<?, ?it/s]
RuntimeError Traceback (most recent call last) in <cell line: 17>()
15 }
16
---> 17 model = MoE(checkpoint, config)
18
19 input_text = "translate English to German: How old are you?"
1 frames /usr/local/lib/python3.10/dist-packages/moe_infinity/runtime/model_offload.py in archer_from_pretrained(cls, *args, **kwargs)
405 # convert all tensors in state_dict to self.dtype
406 for k, v in state_dict.items():
--> 407 state_dict[k] = v.to(self.dtype).to("cpu")
408
409 self._offload_state_dict(state_dict, empty_state_dict)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Is it possible to provide a script to reproduce this? If this is one of the examples, please specify which one you ran. Providing your hardware settings might also be helpful.
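For context on the error above: `CUDA error: invalid device ordinal` is commonly raised when code addresses a GPU index that the current process cannot see, for example index 1 on a single-GPU machine, or any index hidden by `CUDA_VISIBLE_DEVICES`. Below is a minimal, stdlib-only sketch for checking this before loading a model. The helper names `visible_gpu_count` and `invalid_ordinals` are hypothetical and are not part of MoE-Infinity or PyTorch, and the handling of `CUDA_VISIBLE_DEVICES` is simplified: it only counts comma-separated entries. The physical GPU count is supplied by the caller as an assumption.

```python
import os


def visible_gpu_count(physical_gpu_count, env=None):
    """Estimate how many GPUs this process can address.

    physical_gpu_count: GPUs installed in the machine (supplied by the caller).
    env: mapping to read CUDA_VISIBLE_DEVICES from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Variable unset: all physical GPUs are visible.
        return physical_gpu_count
    # Variable set: only the listed GPUs are visible, renumbered from 0.
    # Simplification: entries are counted, not validated.
    entries = [part.strip() for part in raw.split(",") if part.strip()]
    return min(len(entries), physical_gpu_count)


def invalid_ordinals(requested, visible_count):
    """Return the requested device indices outside 0..visible_count-1."""
    return [d for d in requested if not 0 <= d < visible_count]


if __name__ == "__main__":
    # Hypothetical scenario: 4 GPUs installed, but only GPU 0 exposed.
    count = visible_gpu_count(4, {"CUDA_VISIBLE_DEVICES": "0"})
    print("visible GPUs:", count)
    print("invalid ordinals:", invalid_ordinals([0, 1, 2, 3], count))
```

In a live session, the visible count can be cross-checked against `torch.cuda.device_count()`. Any device index that the script or config requests beyond that count is a likely source of this error.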
I have fixed it. But when I run readme_example.py, I hit the problem below.
My hardware is 4 × RTX 3090.
/home/admin/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/transformers/generation/utils.py:1249: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
/home/admin/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/transformers/generation/utils.py:1797: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cuda, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
warnings.warn(
Model create: 20%|████████████████████▋ | 930/4578 [00:16<00:01, 2992.65it/s]
translate English to German: You are Germany?
# Translate English to German
ArcherTaskPool destructor