
Getting a huge number of training steps #37

Open
008karan opened this issue Mar 5, 2020 · 0 comments
008karan commented Mar 5, 2020

I have generated pretraining data using the steps given in this repo.
I am doing this for the Hindi language with 22 GB of data. Generating the pretraining data itself took a month!
Each tf.record file has an associated meta_data file. I summed the train_data_size values from all the meta_data files into a single meta_data file, since run_pretraining.py requires one. My final meta_data file looks something like this:

{
    "task_type": "albert_pretraining",
    "train_data_size": 596972848,
    "max_seq_length": 512,
    "max_predictions_per_seq": 20
}
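
For reference, the merge I did is roughly the following (a minimal sketch; the glob pattern and output filename are placeholders for my local paths, not names from the repo):

import glob
import json

total = 0
merged = None
for path in glob.glob("pretraining_data/*_meta_data"):  # placeholder pattern
    with open(path) as f:
        meta = json.load(f)
    total += meta["train_data_size"]  # sum example counts across shards
    merged = meta                     # other fields are identical across shards

merged["train_data_size"] = total
with open("train_meta_data", "w") as f:  # placeholder output name
    json.dump(merged, f, indent=4)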

Here, the number of training steps is calculated as below:

num_train_steps = int(total_train_examples / train_batch_size) * num_train_epochs

Since total_train_examples is 596972848, num_train_steps comes out to 9327700 with a batch size of 64 and only 1 epoch. I saw that in the readme here num_train_steps=125000. I don't understand what went wrong.

With such a huge number of training steps, it will take forever to train ALBERT. Even if I increase the batch size to 512 with only 1 epoch, the training steps come to 1165962, which is still huge!
Since ALBERT was trained on a very large dataset, why are there only 125000 steps?
I would also like to know how many epochs were used in ALBERT's training for English.
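
The arithmetic itself matches the formula above; here is a quick sanity check (plain Python, nothing repo-specific):

total_train_examples = 596972848
num_train_epochs = 1
for train_batch_size in (64, 512):
    num_train_steps = int(total_train_examples / train_batch_size) * num_train_epochs
    print(train_batch_size, num_train_steps)
# batch size 64  -> 9327700 steps
# batch size 512 -> 1165962 steps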

Can anyone suggest what went wrong and what I should do now?
