In the newest code, why was the dtype for loading models changed to float32 rather than bfloat16?
Based on our experiments, loading the PRM in bf16 may cause convergence issues, so we set it to fp32 by default.
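For context, a minimal sketch of what this default amounts to when loading a reward model with Hugging Face `transformers` (the checkpoint path and variable names here are illustrative, not the repository's actual code):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical PRM checkpoint; substitute your own model path.
prm_name = "your-org/your-prm-checkpoint"

# Load in full precision (fp32) -- the new default, since bf16
# was observed to cause convergence issues when training the PRM.
prm = AutoModelForSequenceClassification.from_pretrained(
    prm_name,
    torch_dtype=torch.float32,
)
```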
Should we also use float32 for the ActorRolloutRefWorker?
I think it's not necessary; we are still validating the effect of bf16 on the PRM. On larger models (32B), it also converges well.
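In other words, only the PRM defaults to fp32; the actor/rollout/reference models can stay in bf16. A hedged sketch of that split, again with illustrative names rather than the actual worker code:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical actor checkpoint; substitute your own model path.
actor_name = "your-org/your-actor-checkpoint"

# The actor (and its rollout/reference copies) can remain in bf16;
# the convergence issue reported above was only observed on the PRM.
actor = AutoModelForCausalLM.from_pretrained(
    actor_name,
    torch_dtype=torch.bfloat16,
)
```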