
[Question] Interface for generic logger #114

Open
AwePhD opened this issue Jul 12, 2024 · 4 comments
Labels: documentation (Improvements or additions to documentation), question (Further information is requested)

Comments


AwePhD commented Jul 12, 2024

Hello,

I am starting to use NePS for HPO. I see that there is a class for logging to TensorBoard, which is great. Would you be interested in providing an interface for arbitrary loggers? Other logging tools such as MLflow (the one I use), W&B, and so on are common.

Thanks for NePS; the tool seems interesting, and the documentation is great, especially since I have little knowledge of HPO.

@eddiebergman (Contributor)

Hi @AwePhD,

Thanks for the kind words! We don't have any immediate plans to integrate other loggers, primarily because it introduces some maintenance overhead. I don't know about MLflow, but for W&B, since you control the run_pipeline function, you should be able to just stick it in there without a problem!

If you manage to get it to work, we'd be delighted if you could share a sample script that we could include in the documentation for others :)

Best,
Eddie

@eddiebergman added the question (Further information is requested) and documentation (Improvements or additions to documentation) labels on Jul 30, 2024

AwePhD commented Aug 20, 2024

Okay, thanks for the answer. Long story short: my (high-level) deep learning domain-specific framework implements the MLflow settings and boilerplate, but it seems to conflict with the NePS use case. I might investigate and set up the logger manually in run_pipeline.

If I manually implement the MLflow boilerplate in run_pipeline, I will share it here or in a separate issue specific to MLflow, depending on how you want to organize the issues. Can I close this issue?

@eddiebergman (Contributor)

Feel free to leave it open if you plan to share back any findings here; that would be super useful :) It would also be good to know how the two conflict, as I'm not familiar with MLflow or with why they would clash.


AwePhD commented Aug 21, 2024

mlflow is working correctly with NePS; as I suggested, it is my framework that makes a fuss. It does not stop/start mlflow runs correctly in the NePS use case: a manual mlflow.end_run() is required at the end of run_pipeline. I added a bit more detail below; it is mostly irrelevant if you do not use the same library, but it may be of interest as a deep learning use case.

mlflow works well with basic boilerplate. It logs the hyperparameters and metrics of each run, and in a multi-fidelity setting it is fairly easy to resume a previous (mlflow) run. Sadly, because I use my framework, I cannot offer a detailed template, but here is the idea.

def run_pipeline(pipeline_directory, previous_pipeline_directory, **config):
    # Instantiate model, optimizer and everything else
    # Insert mlflow setup: experiment and run names. Logs HP
    # Train + Validation for NePS
    # Close mlflow and other post things to do
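
The skeleton above can be fleshed out as a minimal, runnable sketch of that pattern. `FakeMlflow` below is a hypothetical stand-in mimicking the small slice of the mlflow API used here, so the example runs without the real library; with real MLflow you would call `mlflow.start_run()`, `mlflow.log_params()`, `mlflow.log_metric()`, and `mlflow.end_run()` in the same places. The `try`/`finally` guarantees the explicit `end_run` even if training raises.

```python
class FakeMlflow:
    """Hypothetical stand-in for the mlflow module (illustration only)."""
    def __init__(self):
        self.active = False
        self.params, self.metrics = {}, {}
    def start_run(self):
        self.active = True
    def log_params(self, params):
        self.params.update(params)
    def log_metric(self, key, value):
        self.metrics[key] = value
    def end_run(self):
        self.active = False

mlflow = FakeMlflow()

def run_pipeline(pipeline_directory, previous_pipeline_directory, **config):
    # Instantiate model, optimizer, and everything else from `config` here.
    mlflow.start_run()                          # mlflow setup: experiment/run names
    try:
        mlflow.log_params(config)               # log the hyperparameters
        validation_loss = sum(config.values())  # placeholder for train + validation
        mlflow.log_metric("loss", validation_loss)
        return {"loss": validation_loss}        # NePS reads the objective from here
    finally:
        mlflow.end_run()                        # the explicit end_run noted above
```

The names and the placeholder "loss" computation are assumptions for illustration; only the start/log/end structure reflects the pattern described in this thread.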

I use the mmlab suite of libraries, notably mmengine, mmcv, and mmdet, which manages one run through a Runner object that carries a lot of responsibility. It is a highly modular framework, so everything is more or less plug-and-play. For instance, the Runner instantiates a loop (training, test, and/or validation), an optimizer (via a supplemental wrapper), LR scaling, parameter schedulers, sets of hooks, a message hub / logging service, the model, the pipeline (train_dataloader plus its processing), and so on. These components can be set up with a configuration file, and registries are used to instantiate them from that file. The Runner is the glue for everything.

Thus, the Runner is meant to be the object with the longest lifetime, and it is not meant to be reusable: one Runner object serves one model performing one task (train + validation, or test). That's it. The NePS use case is different, and the Runner is not flexible enough. In other words, NePS manages multiple runs, while a Runner is meant to manage exactly one. So for each NePS run, we have to instantiate everything again from scratch, even though most of the components could be reused.

Obviously, it might be doable to extend mmengine's Runner, but that would take some time, and a regular user does not (should not?) have to change the ~2k LoC of Runner. The quickest way is to instantiate a fresh Runner in each run_pipeline call.
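
The "fresh Runner per call" workaround can be sketched as follows. `Runner` here is a hypothetical stand-in for mmengine's Runner (which in the real library is typically built via `Runner.from_cfg(cfg)` and trained with `runner.train()`); the stand-in only models the one-Runner-one-run lifetime described above, so the example runs anywhere.

```python
class Runner:
    """Stand-in for mmengine's Runner: one instance trains one model once."""
    def __init__(self, cfg):
        self.cfg = cfg
        self.finished = False

    def train(self):
        if self.finished:
            raise RuntimeError("a Runner is not reusable")
        self.finished = True
        # Toy 'validation loss' derived from the config, for illustration only.
        return self.cfg["lr"] * 2

def run_pipeline(pipeline_directory, previous_pipeline_directory, **config):
    # Rebuild the Runner (and so all its components) from scratch on
    # every call, since a Runner manages exactly one run.
    runner = Runner(config)
    loss = runner.train()
    return {"loss": loss}
```

This pays the cost of re-instantiating components that could in principle be reused, which is the trade-off discussed above against extending Runner itself.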
