I have generated pretraining data using the steps given in this repo.

I am doing this for the Hindi language with 22 GB of data. Generating the pretraining data alone took one month!

Each tf.record file has a meta_data file associated with it. Since run_pretraining.py requires a single meta_data file, I summed the train_data_size values from all the per-shard meta_data files into one final meta_data file (a sketch of how I merged them follows).
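For reference, a minimal sketch of the merge. This assumes each per-shard meta_data file is plain JSON with a train_data_size key; the glob pattern is hypothetical, so adjust paths and keys to match what create_pretraining_data actually wrote for you:

```python
import glob
import json

# Assumption: each shard's meta_data is plain JSON containing
# train_data_size; the path pattern below is illustrative only.
shard_paths = glob.glob("pretraining_output/*.meta_data")

with open(shard_paths[0]) as f:
    merged = json.load(f)  # keep the other fields from one shard

merged["train_data_size"] = 0
for path in shard_paths:
    with open(path) as f:
        merged["train_data_size"] += json.load(f)["train_data_size"]

with open("meta_data", "w") as f:
    json.dump(merged, f)

print(merged["train_data_size"])  # 596972848 for my 22 GB Hindi corpus
```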
In run_pretraining.py, the number of training steps is calculated as:

num_train_steps = int(total_train_examples / train_batch_size) * num_train_epochs

My total_train_examples is 596972848, so with a batch size of 64 and only 1 epoch I get num_train_steps = 9327700. But the README here uses num_train_steps=125000, and I don't understand what went wrong. With that many steps it will take forever to train ALBERT; even at a batch size of 512 with 1 epoch, it is still 1165962 steps!
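To make the scale concrete, here is the same arithmetic in plain Python. The batch size of 4096 used for comparison is the one reported in the ALBERT paper, not something taken from this repo's README:

```python
total_train_examples = 596_972_848  # summed train_data_size from above

def num_train_steps(train_batch_size, num_train_epochs=1):
    # Same formula run_pretraining.py applies.
    return int(total_train_examples / train_batch_size) * num_train_epochs

print(num_train_steps(64))   # 9327700
print(num_train_steps(512))  # 1165962

# The ALBERT paper trains for 125000 steps at batch size 4096,
# i.e. 125000 * 4096 = 512M examples seen. Against this corpus:
print(125_000 * 4096 / total_train_examples)  # ~0.86 "epochs"
```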
Since ALBERT was trained on a very large corpus, why are there only 125000 steps?

I would also like to know how many epochs the English ALBERT training corresponds to.

Can anyone suggest what went wrong and what I should do now?