A question about training #38

Open
QingZhuanya opened this issue Oct 30, 2024 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@QingZhuanya

Thank you for the excellent project. May I ask why I got stuck during phase1 training?
[Image: 微信图片_20241030114925 (WeChat screenshot)]

@QingZhuanya QingZhuanya changed the title Question about training problem A question about training problem Oct 30, 2024
@QingZhuanya QingZhuanya changed the title A question about training problem A question about training Oct 30, 2024
@YTEP-ZHI
Collaborator

Hi @QingZhuanya, thanks for your question. I think the hang is caused by data loading, specifically the number of data-loading workers. Try a different number of workers in your config and see whether training proceeds. For instance, change the following num_workers from 16 to 4:

num_workers: 16
subsets:
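
A possible reason the worker count matters, sketched under assumptions the thread does not confirm: if training launches one process per GPU and each process starts its own num_workers loader workers, the total worker count is the product of the two and can exceed the machine's CPU cores. The helper below, `workers_per_process`, is hypothetical and not part of this project; it only illustrates the arithmetic for choosing a per-process value.

```python
import os


def workers_per_process(total_cpus: int, num_processes: int,
                        reserve_per_process: int = 1, cap: int = 16) -> int:
    """Return a per-process worker count that keeps the total within the CPU budget.

    Assumption (not confirmed by the project): every training process
    starts its own set of data-loading workers.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    # Share the cores evenly and leave some for each main training process.
    budget = total_cpus // num_processes - reserve_per_process
    # Clamp to [0, cap]; 0 means "load data in the main process".
    return max(0, min(cap, budget))


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    print(workers_per_process(cpus, num_processes=8))
```

For example, on a 64-core machine with 8 training processes this gives 7, so under these assumptions 16 workers per process would oversubscribe the CPUs.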

@QingZhuanya
Author

> Hi @QingZhuanya, thanks for your question. I think this issue is caused by the data loading and has something to do with the number of workers. You might need to try to set a different number of workers in your config to see how it works. For instance, set the following num_workers from 16 to 4:
>
> num_workers: 16
> subsets:

Thanks for the answer. I changed num_workers as you suggested, but unfortunately training still gets stuck. My environment has 8 A100 GPUs. Is there anything else I could try? Thank you.
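
If lowering num_workers does not help, it may be worth finding out where the process is actually waiting. A minimal sketch using Python's standard `faulthandler` module follows. The function names and the 600-second timeout are illustrative choices, and where to call them in this project's training script is not something the thread specifies.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s: float = 600.0, out=sys.stderr) -> None:
    """Dump the Python stack of every thread to `out` each time `timeout_s` elapses.

    `out` must be a real file object (it needs a file descriptor).
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=out)


def disarm_hang_watchdog() -> None:
    """Cancel the pending dump, e.g. once training has finished."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical usage near the top of a training entry point:
#     arm_hang_watchdog(600.0)
#     ... run training ...
#     disarm_hang_watchdog()
```

If the run stalls, the dumped stack shows which call each thread is blocked in, which helps tell a data-loading stall apart from other causes.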

@YTEP-ZHI YTEP-ZHI added the help wanted Extra attention is needed label Nov 26, 2024