Enable control over MLFlowLogger run_name str to match a pre-existing tag run_name in MLFlow and resume model training #3263
Labels
enhancement
🚀 Feature Request
We would like to resume model training with the MLFlowLogger by passing the run_name from the YAML config.
Motivation
Today MLFlowLogger receives the run_name string from the YAML config but has no control over it, because a random string is automatically appended to it at runtime, e.g. my-test becomes my-test-sgftKr.
In the MLFlowLogger docs:
but it always gets overridden by the random value after YAML parsing.
The MLFlowLogger builds its filter string from the run_name it receives, which already carries the random suffix, so the filter cannot match a pre-existing run:
As the {run_name} placeholder shows, the name used in the filter always has a random string appended to it for each new run.
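The mismatch can be illustrated with a small, self-contained sketch. It does not call MLFlow: the helper names, the `tags.run_name` filter format, and the six-letter suffix are assumptions made for illustration, and the tag search is simulated with an in-memory list.

```python
import random
import string
from typing import Dict, List


def with_random_suffix(run_name: str, length: int = 6) -> str:
    """Mimic the behaviour described above: append a random suffix (assumed format)."""
    suffix = "".join(random.choices(string.ascii_letters, k=length))
    return f"{run_name}-{suffix}"


def build_filter(run_name: str) -> str:
    """Build a tag filter string; the exact format is an assumption."""
    return f'tags.run_name = "{run_name}"'


def find_matching_runs(existing_tags: List[Dict[str, str]], run_name: str) -> List[Dict[str, str]]:
    """Stand-in for a tag search: exact match on the run_name tag."""
    return [tags for tags in existing_tags if tags.get("run_name") == run_name]


# The first launch stores the suffixed name as a tag.
first_name = with_random_suffix("my-test")
stored_runs = [{"run_name": first_name}]

# Searching with the exact stored name finds the run; the bare YAML name does not.
found_with_stored_name = find_matching_runs(stored_runs, first_name)
found_with_yaml_name = find_matching_runs(stored_runs, "my-test")
```

Because a resumed launch gets a fresh suffix, its name differs from the stored tag in the same way the bare `my-test` does here, so the search comes back empty and a new run is created.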
[Optional] Implementation
A possible solution is to disable the random string generation by defining another environment variable during YAML parsing, such as:
so that when the resume action is called, the run name is given to match the tag run name in MLFlow,
or directly
so that the str run_name is passed as is to MLFlowLogger to handle the resume.
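As a rough sketch of the first option, the suffix step could be made opt-out. Everything here is hypothetical: the variable name `EXAMPLE_DISABLE_RUN_NAME_SUFFIX` is a placeholder and not the name proposed in this issue, and `resolve_run_name` is an illustrative helper, not an existing function in the platform or the logger.

```python
import random
import string
from typing import Mapping

# Hypothetical variable name, used only for this sketch.
DISABLE_SUFFIX_ENV = "EXAMPLE_DISABLE_RUN_NAME_SUFFIX"


def resolve_run_name(run_name: str, environ: Mapping[str, str], suffix_length: int = 6) -> str:
    """Return run_name unchanged when the opt-out flag is set, else append a random suffix."""
    if environ.get(DISABLE_SUFFIX_ENV, "").lower() in {"1", "true"}:
        # Resume case: keep the name as given so it can match the existing tag.
        return run_name
    suffix = "".join(random.choices(string.ascii_letters, k=suffix_length))
    return f"{run_name}-{suffix}"


# With the flag set, the configured name is passed through as is.
resumed_name = resolve_run_name("my-test", {DISABLE_SUFFIX_ENV: "1"})
# Without the flag, the current behaviour (random suffix) is kept.
fresh_name = resolve_run_name("my-test", {})
```

In real use the mapping would be `os.environ`. Keeping the default behaviour unchanged means existing configs are unaffected unless the flag is set.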
Additional context
This is for a use case where we run training on the MosaicML platform and log to MLFlow on the Databricks platform.
Checkpointing works fine, but the loss logging is wrong and split across runs, because the mismatched random run_name forces MLFlow to create a new run ID for the resumed training.