Hi,
Thank you for publishing the pretraining and finetuning scripts! They are really helpful.
For a university project, we are trying to reproduce the results from the paper. However, when running the pretrain script, we observe very slow training speeds (~1 minute per epoch) on our hardware.
Running the PyTorch profiler for 16 training batches, we see the following (relevant lines of the FIT Profiler Report):
Apparently the data loader needs ~5 seconds for each batch, which is 84% of the total time of a training step.
After some further investigation, we found that the train data loader does the following (see the sketch after this list):
1. Apply the transformation to a full time series.
2. Sample a window from the transformed data (inside the InstanceSplitter).
3. Extract the window from the transformed data (InstanceSplitter).
4. Create the batches of data according to the batch size.
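To make the ordering concrete, here is a rough GluonTS-style sketch of how we understand the pipeline. The transforms, field names, and lengths are placeholders for illustration, not the exact ones used in the pretrain script:

```python
import numpy as np
import pandas as pd
from gluonts.dataset.field_names import FieldName
from gluonts.transform import (
    AddObservedValuesIndicator,
    Chain,
    ExpectedNumInstanceSampler,
    InstanceSplitter,
)

context_length, prediction_length = 32, 8  # placeholder values

# Toy stand-in for the pretraining corpus: a few long series.
dataset = [
    {
        "start": pd.Period("2020-01-01", freq="H"),
        "target": np.random.randn(50_000).astype(np.float32),
    }
    for _ in range(4)
]

# Step 1: a per-series transform that touches the *entire* time series.
per_series_transform = AddObservedValuesIndicator(
    target_field=FieldName.TARGET,
    output_field=FieldName.OBSERVED_VALUES,
)

# Steps 2-3: sample a window position and cut the window out of the
# transformed series.
splitter = InstanceSplitter(
    target_field=FieldName.TARGET,
    is_pad_field=FieldName.IS_PAD,
    start_field=FieldName.START,
    forecast_start_field=FieldName.FORECAST_START,
    instance_sampler=ExpectedNumInstanceSampler(
        num_instances=1.0, min_future=prediction_length
    ),
    past_length=context_length,
    future_length=prediction_length,
    time_series_fields=[FieldName.OBSERVED_VALUES],
)

# The chain is applied lazily (step 4, batching, happens downstream in the
# data loader), so the expensive per-series transform is re-run on the full
# series for every training window that gets sampled.
transformation = Chain([per_series_transform, splitter])
lazy_training_data = transformation.apply(dataset, is_train=True)
```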
This means a full time series gets transformed and then most of the transformed data is not used, and this is repeated for each item in a batch. We observed ~10 ms for transforming a full time series; with a batch size of 512, that is 512 × ~10 ms ≈ 5 s, which matches the >5 seconds reported by the profiler.
The order of execution is partly dictated by the gluonts package, so I am not aware of an obvious solution that does not involve changes there.
Now my question: did you face the same issue during your experiments? How can we solve the problem we observe?
Sure, thank you for having a look!
For now, I added a data = list(data) before the instance splitter is applied. This forces the per-series transformations to run once before training starts. It is obviously not the nicest solution, since it now takes a few minutes before training begins, but the total training time improves a lot.
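Reusing the illustrative per_series_transform, splitter, and dataset names from the sketch in the issue above (so again only an approximation of the actual pretrain script), the change amounts to something like:

```python
# Run the expensive per-series transforms once, up front, and cache the
# result in memory; only the cheap window sampling stays lazy during training.
data = per_series_transform.apply(dataset, is_train=True)
data = list(data)  # forces the per-series transforms to execute now

# The InstanceSplitter is then applied to the cached, already-transformed data.
training_instances = splitter.apply(data, is_train=True)
```

The obvious trade-off is that the fully transformed series are kept in memory for the whole run, which is acceptable for us but may not scale to larger corpora.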