
Callback based Trainer #83

Open
jeremyasapp opened this issue Sep 17, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@jeremyasapp
Contributor

jeremyasapp commented Sep 17, 2019

Is your feature request related to a problem? Please describe.
Currently, customizing a model sometimes requires overriding the Trainer. It would be best if the Trainer were an object that users didn't have to override. Furthermore, it would be good to be able to set more default values across the board. The Trainer is very generic, which is great, but relying more on the model would simplify the interfaces and improve the user experience. The Trainer would be in charge of executing training, fp16, model parallelization, some logging, etc.

Describe the solution you'd like
The solution is to create a Model class with a set of methods (some of which are optional) to be called by the Trainer. This has the added benefit that the new model objects can implement defaults for common parameters such as the loss function or the sampler.

Here is a set of potential methods for the model class (a rough sketch of the interface follows the list):

  • forward: takes as input the data and outputs predictions
  • batch_train: takes as input a batch, returns the loss (uses forward)
  • batch_eval: takes as input a batch, and returns the metrics (uses forward)
  • aggregate_metrics: takes the outputs of the metric computation and aggregates them
  • validation_metric: takes the output of aggregate_metrics and returns the main validation metric used for model selection, or maybe just a string?
  • optimize: called after backward to update the model
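
As a rough, non-authoritative sketch, the interface could look something like the following (assuming a PyTorch-style nn.Module; all names and signatures are illustrative, not a final API):

```python
# Rough sketch of the proposed Model interface; names and signatures are
# illustrative, not a final API. Assumes a PyTorch-style nn.Module.
from typing import Any, Dict, Iterable

import torch
import torch.nn as nn


class Model(nn.Module):
    """A model the Trainer can drive without being overridden."""

    def forward(self, *inputs: Any) -> Any:
        """Take the data as input and output predictions."""
        raise NotImplementedError

    def batch_train(self, batch: Any) -> torch.Tensor:
        """Take a batch and return the loss (uses forward)."""
        raise NotImplementedError

    def batch_eval(self, batch: Any) -> Dict[str, torch.Tensor]:
        """Take a batch and return metrics for that batch (uses forward)."""
        raise NotImplementedError

    def aggregate_metrics(
        self, metrics: Iterable[Dict[str, torch.Tensor]]
    ) -> Dict[str, float]:
        """Aggregate the per-batch metric outputs."""
        raise NotImplementedError

    def validation_metric(self, aggregated: Dict[str, float]) -> float:
        """Return the main validation metric used for model selection."""
        raise NotImplementedError

    def optimize(self) -> None:
        """Called after backward to update the model (optional override)."""
        raise NotImplementedError
```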

Some questions to answer:

  1. How far do we take this? One can imagine having the data sampling defined as part of the model. How about data processing? Could that be useful for inference?
  2. We could add an unbounded number of methods; how do we want to proceed in that regard? I propose an "add when needed" policy.
@jeremyasapp jeremyasapp added the enhancement New feature or request label Sep 17, 2019
@yukw777
Contributor

yukw777 commented Sep 17, 2019

Our goal is to make the Trainer into "an object that users never have to override." I'd like to understand exactly why one would override the Trainer when customizing a model, and add methods according to the specific use cases.

Not all of the methods you provided above seem to require overriding the Trainer.

  • forward: this is already part of our models. no need to override Trainer
  • batch_train: you can pass in a loss function to Trainer already. Do we have a specific use case where one needs more complex training logic?
  • batch_eval: same as batch_train.
  • aggregate_metrics: this doesn't seem like it should be part of the model... it seems more natural to be part of the trainer
  • validation_metric: same as aggregate_metrics.
  • optimize: we can already pass in an optimizer to Trainer. Do we have a specific use case where one needs more complex optimization logic?

I think it's natural for data preprocessing and data post processing to be part of the model. Data preprocessing is already part of the model using Field, but something similar should happen for postprocessing too. For example, in a text classification model, a post processing step that's part of the model could be to turn the argmax of the softmax into its corresponding class name. In a seq2seq model, it could be to generate sentences.
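
For example, a minimal sketch of what such a postprocessing step could look like for text classification (the `postprocess` function and the `idx_to_class` mapping are hypothetical names, not an existing API):

```python
import torch

# Hypothetical postprocessing step attached to a text classification model:
# turn the argmax of the softmax into the corresponding class name.
idx_to_class = {0: "negative", 1: "positive"}


def postprocess(logits: torch.Tensor) -> list:
    probs = torch.softmax(logits, dim=-1)
    indices = probs.argmax(dim=-1)
    return [idx_to_class[i] for i in indices.tolist()]
```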

@jeremyasapp
Contributor Author

@yukw777 thanks for your comments. I'll address a few of your points:

I'd like to understand exactly why one would override the Trainer when customizing a model and add methods according to the specific use cases.

This would be a good exercise.

batch_train: you can pass in a loss function to Trainer already. Do we have a specific use case where one needs more complex training logic?

A few things come to mind. First, I think if you are trying to migrate code that you already have, it would be easier to do so if you can modify more of the behavior. For example, you may be using a dictionary as the output of your forward method, but loss functions operate over tensors, so you would have to modify the loss function as well. The point here is that instead of modifying many different objects, you only ever modify one.
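
As a rough sketch of this point (assuming the Model interface sketched above; the class name and encoder attribute are made up), batch_train lets the model adapt its own dictionary-shaped forward output to the loss, so only one object ever needs to change:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyClassifier(Model):  # Model base class sketched earlier in this thread
    def __init__(self, vocab_size: int, num_classes: int):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, 128)
        self.head = nn.Linear(128, num_classes)

    def forward(self, tokens: torch.Tensor) -> dict:
        # Existing code that happens to return a dictionary.
        return {"logits": self.head(self.encoder(tokens))}

    def batch_train(self, batch) -> torch.Tensor:
        tokens, labels = batch
        outputs = self.forward(tokens)
        # The model unpacks its own output format; the Trainer and the
        # loss function never need to know about the dictionary.
        return F.cross_entropy(outputs["logits"], labels)
```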

aggregate_metrics: this doesn't seem like it should be part of the model... it seems more natural to be part of the trainer

I'm not sure I agree. The process can be very different across models depending on the metric.

optimize: we can already pass in an optimizer to Trainer. Do we have a specific use case where one needs more complex optimization logic?

Yes, the current Trainer only takes a single optimizer. 1) You may need more than one optimizer (e.g. a GAN). 2) You may have a custom update rule that does not use the optimizer class; I've seen PyTorch code that does that. Now, to be fair, you could argue that something that uses more than one optimizer could have its own Trainer.
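
Rough sketches of both cases (the attribute names are assumptions, and gradients are assumed to have already been computed by the Trainer's backward pass):

```python
import torch


class GANModel(Model):  # Model base class sketched earlier in this thread
    def optimize(self) -> None:
        # Case 1: more than one optimizer, stepped separately.
        self.gen_optimizer.step()
        self.disc_optimizer.step()
        self.gen_optimizer.zero_grad()
        self.disc_optimizer.zero_grad()


class CustomUpdateModel(Model):
    def optimize(self) -> None:
        # Case 2: a hand-rolled update rule that bypasses torch.optim.
        with torch.no_grad():
            for param in self.parameters():
                if param.grad is not None:
                    param -= 0.01 * param.grad
                    param.grad.zero_()
```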

Data preprocessing is already part of the model using Field

That's not true; data processing is part of the TabularDataset at the moment, not the model.

For example, in a text classification model, a post processing step that's part of the model could be to turn the argmax of the softmax into its corresponding class name

This is an example of the kind of customization that has analogies during training. But should it be part of the same model object that was used in training, or should there be a different model object for inference?

I forgot to mention one of the main goals of this proposal, which is to add more defaults to simplify the user experience. For example, for model X, the sampling, loss, and metrics are always the same, so why write them every time?
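
A hedged sketch of what baked-in defaults could look like (all names are illustrative; this is not an existing API):

```python
import torch
import torch.nn as nn
from torch.utils.data import RandomSampler


class TextClassifier(Model):  # Model base class sketched earlier in this thread
    # Defaults carried by the model itself, so users don't have to
    # re-specify the loss, sampler, or metrics for every experiment.
    default_loss = nn.CrossEntropyLoss()
    default_sampler_cls = RandomSampler

    def batch_eval(self, batch) -> dict:
        tokens, labels = batch
        preds = self.forward(tokens)["logits"].argmax(dim=-1)
        return {
            "correct": (preds == labels).sum(),
            "total": torch.tensor(labels.numel()),
        }

    def aggregate_metrics(self, metrics) -> dict:
        metrics = list(metrics)
        correct = sum(m["correct"].item() for m in metrics)
        total = sum(m["total"].item() for m in metrics)
        return {"accuracy": correct / total}
```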

Ultimately, I think the user experience would benefit from having to modify the smallest possible number of objects to customize things for your needs. I want to avoid someone having to customize the dataset, sampler, model, loss, metric, and optimizer, which implies understanding the interface of each of these objects.

@yukw777
Contributor

yukw777 commented Sep 18, 2019

@jeremyasapp thanks for the clarification. I now have a better understanding of the direction we need to take. I totally agree with your point about having sensible defaults to simplify the user experience.

I think I had some misunderstandings about the proposal, but now that I've taken another look with your clarification, I think all of those methods are a good starting point.

I'd actually add data (pre|post)processing to the mix. I forgot that data preprocessing is currently a part of the dataset itself. I think preprocessing should definitely be part of the model, and so should postprocessing. I think people would generally want to use the same model object for training and inference.

@nmatthews-asapp
Contributor

This is being addressed by the v0.5 (refactor branch) integrations with other training libraries, right @jeremyasapp?
