Incorrect use of optimizer_args method in TorchTrainingPlan #452

Open
srcansiz opened this issue Jan 9, 2023 · 2 comments
Labels
bug — this issue is about reporting and resolving a suspected bug
candidate — an individual developer submits a work request to the team (extension proposal, bug, other request)

Comments

srcansiz (Member) commented Jan 9, 2023

optimizer_args is a researcher/user-side API to retrieve the optimizer arguments from any method defined in the TrainingPlan (including init_optimizer). These optimizer arguments are defined on the researcher side in experiment.training_args. However, with commit d2beee4 the method optimizer_args() was extended to call the new update_optimizer_args method, which reads the learning rate from the optimizer. This change introduces a potential bug if the researcher calls optimizer_args() before the optimizer is defined.

def optimizer_args(self) -> Dict:
    """Retrieves optimizer arguments

    Returns:
        Optimizer arguments
    """
    self.update_optimizer_args()  # update `optimizer_args` (eg after training)
    return self._optimizer_args

optimizer_args calls self.update_optimizer_args, and update_optimizer_args calls get_learning_rate, which raises an error if the optimizer is not defined:

    def get_learning_rate(self) -> List[float]:
        if self._optimizer is None:
            raise FedbiomedTrainingPlanError(f"{ErrorNumbers.FB605.value}: Optimizer not found, please call "
                                             f"`init_optimizer` beforehand")
        learning_rates = []
        params = self._optimizer.param_groups
        for param in params:
            learning_rates.append(param['lr'])
        return learning_rates

In this case, if the researcher tries to access optimizer_args before instantiating an optimizer:

class MyTrainingPlan(TorchTrainingPlan):
    def init_model(self):
        opt = self.optimizer_args()  # raises FedbiomedTrainingPlanError
        return Baseline()

    def init_optimizer(self):
        optimizer_args = self.optimizer_args()  # raises FedbiomedTrainingPlanError
        return Optimizer(self.model().parameters(), lr=optimizer_args["lr"])

get_learning_rate must be used to retrieve the default learning rate when the researcher does not define it in the optimizer arguments. Even though TrainingArgs populates lr with a default value, if the researcher does not pass it to the optimizer in the init_optimizer method, the learning rate actually used will only be accessible from the optimizer itself. Since this information is critical for Scaffold, the optimizer arguments should be updated right after the optimizer is created, and again after training, considering that some optimizers can update the learning rate during training.
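To illustrate why refreshing from the optimizer matters, here is a hypothetical training plan (Baseline and the 0.05 value are made up for the example):

class MyTrainingPlan(TorchTrainingPlan):
    def init_model(self):
        return Baseline()

    def init_optimizer(self):
        # researcher ignores optimizer_args and hard-codes a learning rate
        return torch.optim.SGD(self.model().parameters(), lr=0.05)

# Here self._optimizer_args["lr"] still holds the default populated by TrainingArgs,
# not the 0.05 actually used for training, unless it is refreshed from
# self._optimizer.param_groups (which is what get_learning_rate reads).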

Here is one flow that could be implemented (a rough sketch follows the list):

1 - Don't call update_optimizer_args in optimizer_args, to avoid issues when the optimizer is not defined. optimizer_args() should remain a plain getter that only returns self._optimizer_args.
2 - Always populate lr in TrainingArgs with a default value (optional).
3 - Call update_optimizer_args right after the optimizer is created, so that self._optimizer_args["lr"] holds the correct learning rate. This ensures self.optimizer_args() can safely be used in training_step.
4 - Update the learning rate after every iteration (if it matters for the training_step method) or after the training routine, so that the lr sent back to the researcher is the correct one.
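A minimal, self-contained sketch of this flow (class, method, and argument names are placeholders for illustration, not the actual Fed-BioMed implementation):

from typing import Dict

import torch


class SketchTrainingPlan:
    """Sketch of the proposed flow -- not the real TorchTrainingPlan."""

    def __init__(self, optimizer_args: Dict):
        self._optimizer_args = dict(optimizer_args)  # step 2: lr already has a default
        self._optimizer = None

    def optimizer_args(self) -> Dict:
        # step 1: plain getter, no side effects, safe to call before init_optimizer
        return self._optimizer_args

    def update_optimizer_args(self) -> None:
        # copy the learning rate(s) actually used by the optimizer back into the stored args
        self._optimizer_args['lr'] = [g['lr'] for g in self._optimizer.param_groups]

    def configure(self, model: torch.nn.Module) -> None:
        self._optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
        self.update_optimizer_args()  # step 3: right after the optimizer is created

    def training_routine(self) -> None:
        # ... training loop would run here ...
        self.update_optimizer_args()  # step 4: after training, before sending results back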

Another (easy) solution
Do not let the researcher use self.optimizer_args or self.model_arguments, and instead force the researcher to receive the optimizer arguments through def init_optimizer(self, optimizer_args).
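Under this alternative, the researcher-side training plan would look something like this (Baseline and the choice of Adam are just for illustration):

class MyTrainingPlan(TorchTrainingPlan):
    def init_model(self):
        return Baseline()

    def init_optimizer(self, optimizer_args):
        # optimizer_args is passed in by the framework, so the researcher
        # never needs to call self.optimizer_args() before the optimizer exists
        return torch.optim.Adam(self.model().parameters(), lr=optimizer_args["lr"])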

@srcansiz (Member, Author) commented:

In GitLab by @sharkovsky on Apr 21, 2023, 10:55

A third option would be to catch the FedbiomedTrainingPlanError exception inside update_optimizer_args, which is raised by get_learning_rate when the optimizer is not defined. In that case, update_optimizer_args can simply be a no-op.
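A sketch of that third option, assuming update_optimizer_args keeps its current role of copying the learning rate from the optimizer back into self._optimizer_args:

    def update_optimizer_args(self) -> None:
        try:
            learning_rates = self.get_learning_rate()
        except FedbiomedTrainingPlanError:
            # optimizer not defined yet: leave the stored args untouched (no-op)
            return
        self._optimizer_args['lr'] = learning_rates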

Would this work too? IMO it seems simple, but it still allows the researcher to access self.optimizer_args which may be important to them.

In general, I fully agree that a getter such as optimizer_args should NOT have any side effects. However, in this case I feel they are really minor: we are only updating the learning rate, so to me the current implementation is acceptable, even if it may lead to weird situations where we set the optimizer args, and soon after we call the getter and we see a different learning rate.

@srcansiz (Member, Author) commented:

In GitLab by @ybouilla on May 22, 2023, 15:42

Hi all, things have changed since the merge of branch Model (!188): get_learning_rate is now part of Model and generic_optimizer rather than the training plans. This does not solve the issue though.

My suggestion would be to change the behaviour of the post_init methods of the TrainingPlans.
