I attempted to fine-tune a 6-billion-parameter model on 8 A100 GPUs, but the training run keeps stopping early. On the first attempt it stopped at 0.15 epochs; on the second attempt, where I set num_train_epochs to 2, the epoch counter oddly skipped ahead, jumping from 0.15 directly to 1, and training then stopped at 2.25. A quick sanity check of the step arithmetic is included after the config below. For more detail, see this WandB run: https://wandb.ai/neural-network-018/huggingface/runs/8xmy6gtd/
Configs -
Model arguments
model_name_or_path: 01-ai/Yi-6B
model_revision: main
torch_dtype: bfloat16
use_flash_attention_2: false
trust_remote_code: true
Data training arguments
dataset_mixer:
communityai/apt-chat-micro-dataset-llm-v2-714k: 0.4
dataset_splits:
- train
- test
preprocessing_num_workers: 12
SFT trainer config
bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: false
hub_model_id: apt-chat-yi-6B-sft-full
hub_strategy: every_save
learning_rate: 0.00002
log_level: info
logging_steps: 50
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 4096
max_steps: -1
num_train_epochs: 2
output_dir: data/apt-chat-yi-6B-sft-full
overwrite_output_dir: true
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
push_to_hub: true
remove_unused_columns: true
report_to:
- wandb
save_strategy: "no"
save_total_limit: null
seed: 42
tf32: true
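
As a sanity check, here is the back-of-the-envelope step/epoch arithmetic implied by this config and the dataset size reported in the logs below (285,436 train samples, 8 GPUs, per-device batch size 1, gradient accumulation 4). This is only a sketch to frame the problem, assuming the Trainer's usual ceil-rounding of steps per epoch; it is not part of the training script:

```python
import math

# Numbers taken from the config above and the logs below.
train_samples = 285_436            # "Training on the following datasets...: ['train : 285436', ...]"
world_size = 8                     # 8x A100
per_device_batch_size = 1          # per_device_train_batch_size
grad_accum_steps = 4               # gradient_accumulation_steps
num_train_epochs = 2

effective_batch_size = per_device_batch_size * world_size * grad_accum_steps  # 32, matches DeepSpeed's train_batch_size
steps_per_epoch = math.ceil(train_samples / effective_batch_size)             # 8,920
total_steps = steps_per_epoch * num_train_epochs                              # 17,840, matches "Total optimization steps"

# Where the epoch counter would normally be at the steps where the run evaluated/stopped:
print(f"epoch at step 1368: {1368 / steps_per_epoch:.2f}")   # ~0.15 -> matches the first eval
print(f"epoch at step 2736: {2736 / steps_per_epoch:.2f}")   # ~0.31 -> yet the log reports epoch 1.15 and training ends
```

The sample count itself looks roughly right (dataset_mixer takes 0.4 of an ~714k-example dataset, i.e. ~285.6k); the mismatch is in how the epoch counter advances and in training terminating at step 2,736 of 17,840.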
LOGS -
INFO:root:Using nproc_per_node=8.
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING]
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING] *****************************************
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING] *****************************************
[2023-11-14 02:09:45,328] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,584] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
[2023-11-14 02:09:45,607] [INFO] [comm.py:637:init_distributed] cdb=None
2023-11-14 02:09:45 - WARNING - main - Process rank: 7, device: cuda:7, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:45,646] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,793] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,832] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,834] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,835] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
[2023-11-14 02:09:45,864] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:45,908] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
2023-11-14 02:09:45 - WARNING - main - Process rank: 5, device: cuda:5, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:45,939] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:45,939] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-11-14 02:09:45 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-14 02:09:45 - INFO - main - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='01-ai/Yi-6B', model_revision='main', model_code_revision=None, torch_dtype='bfloat16', trust_remote_code=True, use_flash_attention_2=False, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)
2023-11-14 02:09:45 - INFO - main - Data parameters DataArguments(chat_template=None, dataset_mixer={'communityai/apt-chat-micro-dataset-llm-v2-714k': 0.4}, dataset_splits=['train', 'test'], max_train_samples=None, max_eval_samples=None, preprocessing_num_workers=12, truncation_side=None)
2023-11-14 02:09:45 - INFO - main - Training/evaluation parameters SFTConfig(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=apt-chat-yi-6B-sft-full,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=data/apt-chat-yi-6B-sft-full/runs/Nov14_02-09-45_6191edb408fa,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=50,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_seq_length=4096,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=2,
optim=adamw_torch,
optim_args=None,
output_dir=data/apt-chat-yi-6B-sft-full,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=data/apt-chat-yi-6B-sft-full,
save_on_each_node=False,
save_safetensors=True,
save_steps=500,
save_strategy=no,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
2023-11-14 02:09:48 - INFO - main - Training on the following datasets and their proportions: ['train : 285436', 'test : 500']
++++++++++++++++++++++++++++++++++++++
YiTokenizer(name_or_path='01-ai/Yi-6B', vocab_size=64000, model_max_length=4096, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|startoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '', 'pad_token': ''}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
1: AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
2: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
2023-11-14 02:09:53 - INFO - main - *** Load pretrained model ***
neftune_noise_alpha - 5.0
training_args -
2023-11-14 02:09:53 - INFO - main - *** Model loaded! ***
neftune_noise_alpha - 5.0
training_args - SFTConfig(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=apt-chat-yi-6B-sft-full,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=5,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=data/apt-chat-yi-6B-sft-full/runs/Nov14_02-09-45_6191edb408fa,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=50,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_seq_length=4096,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=2,
optim=adamw_torch,
optim_args=None,
output_dir=data/apt-chat-yi-6B-sft-full,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=data/apt-chat-yi-6B-sft-full,
save_on_each_node=False,
save_safetensors=True,
save_steps=500,
save_strategy=no,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:145: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you.
  warnings.warn(
[INFO|configuration_utils.py:717] 2023-11-14 02:09:53,295 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/config.json
[INFO|configuration_utils.py:717] 2023-11-14 02:09:53,384 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/config.json
[INFO|configuration_utils.py:777] 2023-11-14 02:09:53,386 >> Model config YiConfig {
"_name_or_path": "01-ai/Yi-6B",
"architectures": [
"YiForCausalLM"
],
"auto_map": {
"AutoConfig": "01-ai/Yi-6B--configuration_yi.YiConfig",
"AutoModel": "01-ai/Yi-6B--modeling_yi.YiModel",
"AutoModelForCausalLM": "01-ai/Yi-6B--modeling_yi.YiForCausalLM"
},
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"model_type": "Yi",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 4,
"pad_token_id": 0,
"rms_norm_eps": 1e-05,
"rope_theta": 5000000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.35.0",
"use_cache": true,
"vocab_size": 64000
}
[INFO|modeling_utils.py:3121] 2023-11-14 02:09:53,499 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/model.safetensors.index.json
[INFO|modeling_utils.py:1222] 2023-11-14 02:09:53,501 >> Instantiating YiForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:791] 2023-11-14 02:09:53,503 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0
}
[2023-11-14 02:10:02,797] [INFO] [config.py:972:print] DeepSpeedEngine configuration:
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] amp_enabled .................. False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] amp_params ................... False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] bfloat16_enabled ............. True
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_parallel_write_pipeline False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_tag_validation_enabled True
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_tag_validation_fail False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f3e8d053e50>
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] communication_data_type ...... None
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] curriculum_enabled_legacy .... False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] curriculum_params_legacy ..... False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] data_efficiency_enabled ...... False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] dataloader_drop_last ......... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] disable_allgather ............ False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] dump_state ................... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] dynamic_loss_scale_args ...... None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_enabled ........... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_gas_boundary_resolution 1
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_layer_num ......... 0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_max_iter .......... 100
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_stability ......... 1e-06
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_tol ............... 0.01
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_verbose ........... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] elasticity_enabled ........... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_auto_cast ............... None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_enabled ................. False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_master_weights_and_gradients False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] global_rank .................. 0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] grad_accum_dtype ............. None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_accumulation_steps .. 4
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_clipping ............ 0.0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_predivide_factor .... 1.0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] initial_dynamic_scale ........ 1
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] load_universal_checkpoint .... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] loss_scale ................... 1.0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] memory_breakdown ............. False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] mics_hierarchial_params_gather False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] mics_shard_size .............. -1
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_legacy_fusion ...... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_name ............... None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_params ............. None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pld_enabled .................. False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pld_params ................... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] prescale_gradients ........... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] scheduler_name ............... None
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] scheduler_params ............. None
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] seq_parallel_communication_data_type torch.float32
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] sparse_attention ............. None
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] sparse_gradients_enabled ..... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] steps_per_print .............. inf
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] train_batch_size ............. 32
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] train_micro_batch_size_per_gpu 1
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] use_node_local_storage ....... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] wall_clock_breakdown ......... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] weight_quantization_config ... None
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] world_size ................... 8
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_allow_untested_optimizer True
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=True stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_enabled ................. True
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_force_ds_cpu_optimizer .. True
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_optimization_stage ...... 3
[2023-11-14 02:10:02,799] [INFO] [config.py:962:print_user_config] json = {
"train_batch_size": 32,
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 4,
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "none",
"nvme_path": null
},
"offload_param": {
"device": "none",
"nvme_path": null
},
"stage3_gather_16bit_weights_on_model_save": true
},
"steps_per_print": inf,
"bf16": {
"enabled": true
},
"fp16": {
"enabled": false
},
"zero_allow_untested_optimizer": true
}
[INFO|trainer.py:1723] 2023-11-14 02:10:02,799 >> ***** Running training *****
[INFO|trainer.py:1724] 2023-11-14 02:10:02,799 >> Num examples = 285,436
[INFO|trainer.py:1725] 2023-11-14 02:10:02,799 >> Num Epochs = 2
[INFO|trainer.py:1726] 2023-11-14 02:10:02,799 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1729] 2023-11-14 02:10:02,799 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1730] 2023-11-14 02:10:02,799 >> Gradient Accumulation steps = 4
[INFO|trainer.py:1731] 2023-11-14 02:10:02,799 >> Total optimization steps = 17,840
[INFO|trainer.py:1732] 2023-11-14 02:10:02,801 >> Number of trainable parameters = 6,061,035,520
[INFO|integration_utils.py:718] 2023-11-14 02:10:02,802 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: developer-team018 (neural-network-018). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.0
wandb: Run data is saved locally in /workspace/alignment-handbook/wandb/run-20231114_021003-8xmy6gtd
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run robust-plasma-26
wandb: ⭐️ View project at https://wandb.ai/neural-network-018/huggingface
wandb: 🚀 View run at https://wandb.ai/neural-network-018/huggingface/runs/8xmy6gtd
0%| | 0/17840 [00:00<?, ?it/s][WARNING|tokenization_utils_base.py:3831] 2023-11-14 02:10:23,512 >> Token indices sequence length is longer than the specified maximum sequence length for this model (6114 > 4096). Running this sequence through the model will result in indexing errors
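
Side note (not from the original logs): the warning above means at least one example tokenizes to more than max_seq_length (6,114 > 4,096 tokens). A rough way to gauge how common that is would be something like the sketch below; the `text` column name is an assumption about the formatted dataset, since the raw chat dataset is rendered through a chat template first.

```python
# Hypothetical check (not part of the run above): how many examples exceed max_seq_length?
# Assumes the formatted prompt lives in a "text" column; adjust to the actual column name.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
ds = load_dataset("communityai/apt-chat-micro-dataset-llm-v2-714k", split="train")

max_seq_length = 4096
sample = ds.shuffle(seed=0).select(range(1000))  # small sample for speed
too_long = sum(
    len(tokenizer(ex["text"])["input_ids"]) > max_seq_length
    for ex in sample
)
print(f"{too_long} of {len(sample)} sampled examples exceed {max_seq_length} tokens")
```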
{'loss': 1.7024, 'learning_rate': 1.9999999844947046e-05, 'epoch': 0.0}
{'loss': 1.1507, 'learning_rate': 1.999961237011484e-05, 'epoch': 0.01}
{'loss': 1.0928, 'learning_rate': 1.9998449510510744e-05, 'epoch': 0.01}
{'loss': 1.0793, 'learning_rate': 1.999651151133954e-05, 'epoch': 0.02}
{'loss': 1.0867, 'learning_rate': 1.999379852284651e-05, 'epoch': 0.02}
{'loss': 1.0857, 'learning_rate': 1.999031075535873e-05, 'epoch': 0.03}
{'loss': 1.0721, 'learning_rate': 1.9986048479268788e-05, 'epoch': 0.03}
{'loss': 1.0923, 'learning_rate': 1.99810120250138e-05, 'epoch': 0.04}
{'loss': 1.0836, 'learning_rate': 1.9975201783049804e-05, 'epoch': 0.04}
{'loss': 1.0769, 'learning_rate': 1.9968618203821487e-05, 'epoch': 0.05}
{'loss': 1.0574, 'learning_rate': 1.9961261797727256e-05, 'epoch': 0.06}
{'loss': 1.042, 'learning_rate': 1.9953133135079686e-05, 'epoch': 0.06}
{'loss': 1.0554, 'learning_rate': 1.9944232846061284e-05, 'epoch': 0.07}
{'loss': 1.0735, 'learning_rate': 1.993456162067566e-05, 'epoch': 0.07}
{'loss': 1.0785, 'learning_rate': 1.992412020869401e-05, 'epoch': 0.08}
{'loss': 1.0654, 'learning_rate': 1.9912909419596993e-05, 'epoch': 0.08}
{'loss': 1.0606, 'learning_rate': 1.9900930122511993e-05, 'epoch': 0.09}
{'loss': 1.0664, 'learning_rate': 1.988818324614572e-05, 'epoch': 0.1}
{'loss': 1.0604, 'learning_rate': 1.9874669778712215e-05, 'epoch': 0.1}
{'loss': 1.0674, 'learning_rate': 1.9860390767856244e-05, 'epoch': 0.11}
{'loss': 1.042, 'learning_rate': 1.984534732057208e-05, 'epoch': 0.11}
{'loss': 1.0452, 'learning_rate': 1.9829540603117667e-05, 'epoch': 0.12}
{'loss': 1.0577, 'learning_rate': 1.9812971840924222e-05, 'epoch': 0.12}
{'loss': 1.0471, 'learning_rate': 1.979564231850122e-05, 'epoch': 0.13}
{'loss': 1.0704, 'learning_rate': 1.977755337933682e-05, 'epoch': 0.13}
{'loss': 1.0282, 'learning_rate': 1.9758706425793702e-05, 'epoch': 0.14}
{'loss': 1.0515, 'learning_rate': 1.973910291900036e-05, 'epoch': 0.15}
{'loss': 1.0548, 'learning_rate': 1.97187443787378e-05, 'epoch': 0.15}
8%|██▌ | 1368/17840 [1:50:57<19:16:41, 4.21s/it][INFO|trainer.py:3158] 2023-11-14 04:01:02,181 >> ***** Running Evaluation *****
[INFO|trainer.py:3160] 2023-11-14 04:01:02,182 >> Num examples = 500
[INFO|trainer.py:3163] 2023-11-14 04:01:02,182 >> Batch size = 1
0%| | 0/63 [00:00<?, ?it/s]
3%|█▍ | 2/63 [00:00<00:12, 5.02it/s]
5%|██ | 3/63 [00:00<00:12, 4.76it/s]
6%|██▊ | 4/63 [00:00<00:15, 3.89it/s]
8%|███▍ | 5/63 [00:01<00:16, 3.50it/s]
10%|████▏ | 6/63 [00:01<00:17, 3.30it/s]
11%|████▉ | 7/63 [00:01<00:17, 3.20it/s]
13%|█████▌ | 8/63 [00:02<00:17, 3.12it/s]
8%|██▌ | 1368/17840 [1:51:02<19:16:41, 4.21s/it]
14%|██████▎ | 9/63 [00:02<00:17, 3.14it/s]
{'loss': 0.9636, 'learning_rate': 1.9697632383321755e-05, 'epoch': 1.0}
{'loss': 0.9026, 'learning_rate': 1.96757685694803e-05, 'epoch': 1.01}
{'loss': 0.8808, 'learning_rate': 1.965315463222695e-05, 'epoch': 1.01}
{'loss': 0.8712, 'learning_rate': 1.9629792324729302e-05, 'epoch': 1.02}
{'loss': 0.8967, 'learning_rate': 1.960568345817306e-05, 'epoch': 1.03}
{'loss': 0.8676, 'learning_rate': 1.9580829901621666e-05, 'epoch': 1.03}
{'loss': 0.8723, 'learning_rate': 1.9555233581871366e-05, 'epoch': 1.04}
{'loss': 0.9122, 'learning_rate': 1.9528896483301866e-05, 'epoch': 1.04}
{'loss': 0.8687, 'learning_rate': 1.9501820647722458e-05, 'epoch': 1.05}
{'loss': 0.8726, 'learning_rate': 1.947400817421375e-05, 'epoch': 1.05}
{'loss': 0.8505, 'learning_rate': 1.944546121896493e-05, 'epoch': 1.06}
{'loss': 0.8458, 'learning_rate': 1.9416181995106585e-05, 'epoch': 1.07}
{'loss': 0.8721, 'learning_rate': 1.9386172772539162e-05, 'epoch': 1.07}
{'loss': 0.8676, 'learning_rate': 1.9355435877756957e-05, 'epoch': 1.08}
{'loss': 0.8826, 'learning_rate': 1.9323973693667762e-05, 'epoch': 1.08}
{'loss': 0.8607, 'learning_rate': 1.929178865940815e-05, 'epoch': 1.09}
{'loss': 0.8561, 'learning_rate': 1.925888327015434e-05, 'epoch': 1.09}
{'loss': 0.8687, 'learning_rate': 1.9225260076928783e-05, 'epoch': 1.1}
{'loss': 0.874, 'learning_rate': 1.919092168640239e-05, 'epoch': 1.1}
{'loss': 0.8563, 'learning_rate': 1.915587076069243e-05, 'epoch': 1.11}
{'loss': 0.8445, 'learning_rate': 1.9120110017156172e-05, 'epoch': 1.12}
{'loss': 0.8646, 'learning_rate': 1.908364222818019e-05, 'epoch': 1.12}
{'loss': 0.8479, 'learning_rate': 1.9046470220965457e-05, 'epoch': 1.13}
{'loss': 0.8788, 'learning_rate': 1.9008596877308157e-05, 'epoch': 1.13}
{'loss': 0.9, 'learning_rate': 1.8970025133376252e-05, 'epoch': 1.14}
{'loss': 0.8791, 'learning_rate': 1.893075797948188e-05, 'epoch': 1.14}
{'loss': 0.9254, 'learning_rate': 1.889079845984951e-05, 'epoch': 1.15}
15%|█████ | 2736/17840 [3:42:25<17:42:31, 4.22s/it][INFO|trainer.py:3158] 2023-11-14 05:52:30,316 >> ***** Running Evaluation *****
[INFO|trainer.py:3160] 2023-11-14 05:52:30,317 >> Num examples = 500
[INFO|trainer.py:3163] 2023-11-14 05:52:30,317 >> Batch size = 1
0%| | 0/63 [00:00<?, ?it/s]
3%|█▍ | 2/63 [00:00<00:10, 6.07it/s]
5%|██ | 3/63 [00:00<00:14, 4.20it/s]
6%|██▊ | 4/63 [00:01<00:16, 3.63it/s]
8%|███▍ | 5/63 [00:01<00:17, 3.37it/s]
10%|████▏ | 6/63 [00:01<00:17, 3.23it/s]
11%|████▉ | 7/63 [00:02<00:17, 3.16it/s]
13%|█████▌ | 8/63 [00:02<00:17, 3.06it/s]
{'eval_loss': 1.0676991939544678, 'eval_runtime': 4.5191, 'eval_samples_per_second': 110.641, 'eval_steps_per_second': 13.941, 'epoch': 1.15}
15%|█████ | 2736/17840 [3:42:30<17:42:31, 4.22s/it]
14%|██████▎ | 9/63 [00:02<00:17, 3.09it/s]
[INFO|trainer.py:1955] 2023-11-14 05:52:34,837 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 13352.0365, 'train_samples_per_second': 42.755, 'train_steps_per_second': 1.336, 'train_loss': 0.9719247023264567, 'epoch': 1.15}
15%|█████ | 2736/17840 [3:42:30<20:28:20, 4.88s/it]
***** train metrics *****
epoch = 1.15
train_loss = 0.9719
train_runtime = 3:42:32.03
train_samples = 285436
train_samples_per_second = 42.755
train_steps_per_second = 1.336
2023-11-14 05:52:34 - INFO - main - *** Evaluate ***
[INFO|trainer.py:3158] 2023-11-14 05:52:34,843 >> ***** Running Evaluation *****
[INFO|trainer.py:3160] 2023-11-14 05:52:34,843 >> Num examples = 500
[INFO|trainer.py:3163] 2023-11-14 05:52:34,844 >> Batch size = 1
14%|██████▎ | 9/63 [00:02<00:16, 3.23it/s]
***** eval metrics *****
epoch = 1.15
eval_loss = 1.0677
eval_runtime = 0:00:04.48
eval_samples = 500
eval_samples_per_second = 111.451
eval_steps_per_second = 14.043
2023-11-14 05:52:39 - INFO - main - *** Save model ***
[INFO|trainer.py:2881] 2023-11-14 05:52:43,590 >> Saving model checkpoint to data/apt-chat-yi-6B-sft-full
[INFO|configuration_utils.py:461] 2023-11-14 05:52:43,592 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json
[INFO|configuration_utils.py:564] 2023-11-14 05:52:43,592 >> Configuration saved in data/apt-chat-yi-6B-sft-full/generation_config.json
[INFO|modeling_utils.py:2201] 2023-11-14 05:52:51,334 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at data/apt-chat-yi-6B-sft-full/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2428] 2023-11-14 05:52:51,336 >> tokenizer config file saved in data/apt-chat-yi-6B-sft-full/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-11-14 05:52:51,337 >> Special tokens file saved in data/apt-chat-yi-6B-sft-full/special_tokens_map.json
[INFO|trainer.py:2881] 2023-11-14 05:52:55,599 >> Saving model checkpoint to data/apt-chat-yi-6B-sft-full
[INFO|configuration_utils.py:461] 2023-11-14 05:52:55,601 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json
[INFO|configuration_utils.py:564] 2023-11-14 05:52:55,601 >> Configuration saved in data/apt-chat-yi-6B-sft-full/generation_config.json
[INFO|modeling_utils.py:2201] 2023-11-14 05:53:06,302 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at data/apt-chat-yi-6B-sft-full/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2428] 2023-11-14 05:53:06,303 >> tokenizer config file saved in data/apt-chat-yi-6B-sft-full/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-11-14 05:53:06,304 >> Special tokens file saved in data/apt-chat-yi-6B-sft-full/special_tokens_map.json
2023-11-14 05:55:20 - INFO - main - Model saved to data/apt-chat-yi-6B-sft-full
[INFO|modelcard.py:452] 2023-11-14 05:55:21,054 >> Dropping the following result as it does not have all the necessary fields:
{'dataset': {'name': 'communityai/apt-chat-micro-dataset-llm-v2-714k', 'type': 'communityai/apt-chat-micro-dataset-llm-v2-714k'}}
[INFO|configuration_utils.py:461] 2023-11-14 05:55:21,057 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json
2023-11-14 05:55:21 - INFO - main - Pushing to hub...