Training script
```yaml
### model
model_name_or_path: /model/base/qwen/Qwen2.5-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: true  # choices: [true, false]
train_mm_proj_only: false  # choices: [true, false]
deepspeed: examples/deepspeed/ds_z3_config.json  # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
flash_attn: fa2

### dataset
dataset: longwriter-v-10k
template: qwen2_vl
cutoff_len: 32768
# max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 8

### output
output_dir: /model/trained/qwen/qwen2.5_vl-7b=
logging_steps: 1
save_steps: 100
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-5
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
# val_size: 0.001
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 100
```
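For reference, the effective global batch size implied by the `train` section is per-device batch size × gradient accumulation steps × world size. A quick sanity check (the GPU count below is an assumption, since the report does not state the launch's world size):

```python
# Effective global batch size implied by the training config above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
world_size = 8  # assumption: actual number of GPUs used in the launch

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * world_size
)
print(effective_batch_size)  # 16 under the assumed 8-GPU world size
```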
Error message:
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/app/src/llamafactory/launcher.py", line 23, in <module>
[rank0]:     launch()
[rank0]:   File "/app/src/llamafactory/launcher.py", line 19, in launch
[rank0]:     run_exp()
[rank0]:   File "/app/src/llamafactory/train/tuner.py", line 92, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/app/src/llamafactory/train/tuner.py", line 66, in _training_function
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/app/src/llamafactory/train/sft/workflow.py", line 101, in run_sft
[rank0]:     train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2184, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2490, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3598, in training_step
[rank0]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3659, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]:     ret_val = func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1914, in forward
[rank0]:     loss = self.module(*inputs, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1739, in forward
[rank0]:     image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 496, in forward
[rank0]:     hidden_states = self._gradient_checkpointing_func(
[rank0]:   File "/app/src/llamafactory/model/model_utils/checkpointing.py", line 93, in custom_gradient_checkpointing_func
[rank0]:     return gradient_checkpointing_func(func, *args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 32, in inner
[rank0]:     return disable_fn(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 489, in checkpoint
[rank0]:     return CheckpointFunction.apply(function, preserve, *args)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 575, in apply
[rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 264, in forward
[rank0]:     outputs = run_function(*args)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 296, in forward
[rank0]:     hidden_states = hidden_states + self.attn(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 185, in forward
[rank0]:     q = apply_rotary_pos_emb_flashatt(q.unsqueeze(0), rotary_pos_emb).squeeze(0)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 166, in apply_rotary_pos_emb_flashatt
[rank0]:     output = apply_rotary_emb(tensor_, cos, sin).type_as(tensor)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/flash_attn/layers/rotary.py", line 122, in apply_rotary_emb
[rank0]:     return ApplyRotaryEmb.apply(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 575, in apply
[rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/flash_attn/layers/rotary.py", line 48, in forward
[rank0]:     out = apply_rotary(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/flash_attn/ops/triton/rotary.py", line 176, in apply_rotary
[rank0]:     x.dtype == cos.dtype
[rank0]: AssertionError: Input and cos/sin must have the same dtype, got torch.float32 and torch.bfloat16
```
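The assertion at the bottom of the trace comes from flash-attn's Triton rotary kernel, which requires the query/key activations and the rotary cos/sin tables to share one dtype; here the vision tower's query tensor is float32 while the tables are bfloat16 (plausibly because the frozen vision tower's weights stay in float32 under this setup, though the report does not confirm the cause). A minimal sketch of the failure mode and one common workaround, casting the tables to the activation dtype before the call (the helper below is illustrative, not the library's API):

```python
import torch

def apply_rotary_checked(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Mirrors the guard in flash_attn/ops/triton/rotary.py that raises above:
    # the kernel requires the input and the cos/sin tables to share a dtype.
    assert x.dtype == cos.dtype == sin.dtype, (
        f"Input and cos/sin must have the same dtype, got {x.dtype} and {cos.dtype}"
    )
    # Plain (non-fused) rotary application on the last dimension, for illustration.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)

# The failing combination from the traceback: fp32 activations meeting
# bf16 rotary tables would trip the assertion above.
q = torch.randn(4, 64, dtype=torch.float32)
cos = torch.randn(32, dtype=torch.bfloat16)
sin = torch.randn(32, dtype=torch.bfloat16)

# Workaround: cast the tables to the activation dtype before applying rotary.
out = apply_rotary_checked(q, cos.to(q.dtype), sin.to(q.dtype))
assert out.dtype == torch.float32
```

Workarounds reported for this class of error include upgrading transformers (later releases reportedly add the dtype cast upstream) or setting `flash_attn: sdpa` so this kernel is not used; both are assumptions to verify against the linked issue rather than confirmed fixes.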
Related issue: QwenLM/Qwen2.5-VL#706
System Info

llamafactory version: 0.9.2.dev0