I launch fine-tuning with the following command, and this is the full output:
cd /amax/yt26/VCM/LLaMA2-Accessory ; /amax/yt26/.conda/envs/accessory/bin/python /amax/yt26/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 58291 -- /amax/yt26/.conda/envs/accessory/bin/torchrun --master_port 1112 --nproc_per_node 2 /amax/yt26/VCM/LLaMA2-Accessory/accessory/main_finetune.py --output_dir output_dir/finetune/mm/alpacaLlava_llamaQformerv2_7B --epochs 3 --warmup_epochs 0.2 --batch_size 4 --accum_iter 2 --num_workers 16 --max_words 512 --lr 0.00003 --min_lr 0.000005 --clip_grad 2 --weight_decay 0.02 --data_parallel fsdp --model_parallel_size 2 --checkpointing --llama_type llama_qformerv2_peft --llama_config checkpoint/mm/alpacaLlava_llamaQformerv2/7B_params.json accessory/configs/model/finetune/sg/llamaPeft_normBiasLora.json --tokenizer_path checkpoint/mm/alpacaLlava_llamaQformerv2/tokenizer.model --pretrained_path checkpoint/mm/alpacaLlava_llamaQformerv2 --pretrained_type consolidated --data_config accessory/configs/data/finetune/mm/alpaca_llava_copy.yaml
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/amax/yt26/VCM/LLaMA2-Accessory/accessory/main_finetune.py:41: UserWarning: cannot import FusedAdam from apex, use torch AdamW instead
warnings.warn("cannot import FusedAdam from apex, use torch AdamW instead")
/amax/yt26/VCM/LLaMA2-Accessory/accessory/main_finetune.py:41: UserWarning: cannot import FusedAdam from apex, use torch AdamW instead
warnings.warn("cannot import FusedAdam from apex, use torch AdamW instead")
| distributed init (rank 0): env://, gpu 0
| distributed init (rank 1): env://, gpu 1
The program gets stuck at this point. When I run it under the debugger, it hangs at torch.distributed.barrier() on line 145 of misc.py. How can I deal with this?
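To narrow this down, here is a minimal standalone sketch (my own test script, not part of LLaMA2-Accessory; the filename barrier_test.py and the NCCL backend choice are assumptions) that checks whether torch.distributed.barrier() completes at all across the two GPUs when launched with torchrun:

```python
# barrier_test.py -- hypothetical standalone check, independent of LLaMA2-Accessory.
# Launch with: torchrun --nproc_per_node 2 barrier_test.py
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT,
    # which env:// initialization reads.
    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    print(f"rank {dist.get_rank()} / world size {dist.get_world_size()} initialized", flush=True)

    # If this barrier also hangs, the problem is in the distributed/NCCL setup
    # on this machine rather than in the fine-tuning code.
    dist.barrier()
    print(f"rank {dist.get_rank()} passed the barrier", flush=True)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Running it with NCCL_DEBUG=INFO set should show whether the hang comes from NCCL/network initialization itself or from the LLaMA2-Accessory code that runs after distributed init.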