
what is the reason for "fix precision"? #47

Open
Nipers opened this issue Jan 28, 2025 · 3 comments

Comments

Nipers commented Jan 28, 2025

In the newest code, why is the dtype for loading models changed to float32 rather than bfloat16?

cgq15 (Contributor) commented Jan 29, 2025

Based on our experiments, loading the PRM in bf16 may cause convergence issues, so we set it to fp32 by default.
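
For reference, a minimal sketch of what this looks like with a HuggingFace-style loader (the checkpoint path here is a placeholder, not the repo's actual code):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint path; substitute the actual PRM checkpoint.
prm_path = "path/to/prm-checkpoint"

# Load the PRM in fp32 (the current default) instead of bf16,
# since bf16 was observed to cause convergence issues for PRM training.
prm = AutoModelForCausalLM.from_pretrained(prm_path, torch_dtype=torch.float32)
```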

Nipers (Author) commented Jan 30, 2025

Should we also use float32 for the ActorRolloutRefWorker?

cgq15 (Contributor) commented Feb 3, 2025

I think it's not necessary; we are still validating the effect of bf16 on the PRM. On larger models (32B) it also converges well.
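
As an illustration of the resulting mixed-precision setup (again a sketch with placeholder paths, not the repo's actual worker code), the actor/rollout/ref models can stay in bf16 while the PRM is loaded in fp32:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint paths, for illustration only.
policy = AutoModelForCausalLM.from_pretrained(
    "path/to/policy-checkpoint", torch_dtype=torch.bfloat16  # actor/rollout/ref stay in bf16
)
prm = AutoModelForCausalLM.from_pretrained(
    "path/to/prm-checkpoint", torch_dtype=torch.float32  # PRM kept in fp32 for stable convergence
)
```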
