Provide more information to the user #358
Conversation
StarCycle
commented
Aug 12, 2024
- remove abbreviations and use longer names in logs
- enable progress bar for evaluation by default
- enable asynchronous envs by default
- add readme intro for resuming training
Add introduction about resuming training
Thanks for these suggestions. I've left some comments.
Perhaps you can also print the config at the beginning of training/eval. Currently you are using Hydra, and different config files may override each other. Sometimes a user may not remember the setting in another config file, or may not know their config is overridden by another config file. Simply printing all configs at the beginning of a training/eval process solves this problem, like what they did in mmengine.
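To illustrate why printing the resolved config matters, here is a minimal, hypothetical sketch using plain dicts (the real codebase uses Hydra/OmegaConf, where one might instead log `OmegaConf.to_yaml(cfg, resolve=True)` at startup; `merge_configs` and the config keys below are made up for illustration):

```python
import pprint

def merge_configs(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; later files win, as with Hydra defaults."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"training": {"lr": 1e-3, "eval_freq": 5000}, "eval": {"batch_size": 50}}
user = {"training": {"lr": 1e-4}}  # a second config file silently overrides lr
cfg = merge_configs(base, user)

# Printing the final merged config at startup makes the override visible to the user.
pprint.pprint(cfg)
```

Without the printout, a user reading only `base` would believe `lr` is 1e-3.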
…e it in default.yaml
Thanks for your contribution! We will improve onboarding and ease of use.
Left a few comments
Re progress bar: you are right, I will not make it an option. I still suggest enabling both progress bars (i.e., the progress bar for episodes and the bar for steps in an episode). Users can easily locate evaluation problems if a step takes too long or there are too many episodes.
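The two nested bars could look roughly like this toy sketch with tqdm (hypothetical function and loop bounds; the real eval script iterates over env rollouts and policy inference):

```python
from tqdm import tqdm

def run_eval(n_episodes: int = 3, max_steps: int = 5) -> int:
    """Toy eval loop: an outer bar over episodes, an inner bar over steps."""
    total_steps = 0
    for _ in tqdm(range(n_episodes), desc="episodes"):
        # leave=False clears the inner bar once each episode finishes
        for _ in tqdm(range(max_steps), desc="steps", leave=False):
            total_steps += 1  # stand-in for env.step() + policy inference
    return total_steps

print(run_eval())
```

If one step stalls, the inner bar stops moving; if there are too many episodes, the outer bar's ETA makes that obvious.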
As a side note, it now logs this at the beginning of training, which is very easy to read (screenshot of the log output omitted).
Thanks for revising this @StarCycle. My status is now "approving". I will also wait on @Cadene to approve as he has become involved. Btw, it looks like style tests are not passing. Have you seen CONTRIBUTING.md for instructions on how to set up the pre-commit hook?
@StarCycle By any chance, could you provide code to try this PR? I feel like at least the third section is missing from the PR description among the sections we advise to add.
See this PR description for instance: #281. Thanks!
You are right! I explain it here:
What this does
How it was tested?
Not too much difference from the original code, just run
How to checkout and try?
Just run
I am very thankful for your time and suggestions.
I left some comments. I think some parts of this PR require a bit more work.
On our side, we should better convey the purpose of the two sections of the PR description. They are actually quite important for making faster progress on LeRobot. I would have edited the PR description with this info:
How it was tested?
- Ran diffusion training and evaluation on pusht. Configs are displayed and look like this: (screenshots of the displayed configs)
- Ran diffusion training and evaluation on pusht. Progress bars are displayed and look like this: (screenshot of the progress bars)
TODO:
- Run training and evaluation on aloha to validate `use_async_envs: false`
- Run training and evaluation on xarm to validate `use_async_envs: false`
How to checkout and try?
- To checkout the code:

```bash
git remote add starcycle [email protected]:StarCycle/easier_lerobot.git
git fetch starcycle
git checkout starcycle/main
```

- To run diffusion training and evaluation on pusht:

```bash
python lerobot/scripts/train.py policy=diffusion env=pusht \
    eval.batch_size=1 eval.use_async_envs=false training.eval_freq=2
```
- To run diffusion training and evaluation on aloha: TODO
- To run diffusion training and evaluation on xarm: TODO
Thanks for your help!!!!
When cfg.eval.batch_size > cfg.eval.n_episodes, raise an error instead of modifying cfg.eval.batch_size silently
Co-authored-by: Remi <[email protected]>
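The change above can be sketched as a small guard (hypothetical helper name; where the real check lives in the codebase may differ):

```python
def validate_eval_config(batch_size: int, n_episodes: int) -> None:
    """Raise instead of silently clamping batch_size to n_episodes."""
    if batch_size > n_episodes:
        raise ValueError(
            f"eval.batch_size ({batch_size}) must not exceed "
            f"eval.n_episodes ({n_episodes})."
        )

validate_eval_config(1, 50)  # valid: passes silently
try:
    validate_eval_config(64, 50)  # invalid: raises instead of clamping to 50
except ValueError as err:
    print(err)
```

Failing loudly here keeps the user's config honest: a silently clamped batch size would make the printed config disagree with what actually ran.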
Just out of curiosity, does LeRobot support multi-gpu training now? (You just mentioned slurm ^^)
@StarCycle Yes, we are working on a PR using
Nice!
Hi @Cadene, are there other things that I need to complete to merge this PR? (ノ"◑ڡ◑)ノ
Last update required and ready to merge!
Co-authored-by: Remi <[email protected]>
Co-authored-by: Alexander Soare <[email protected]> Co-authored-by: Remi <[email protected]>