In the newest code, why was the dtype for loading models changed to float32 rather than bfloat16?
Based on our experiments, loading the PRM in bf16 may cause convergence issues, so we set it to fp32 by default.
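For context, a minimal sketch of what this default amounts to when loading a reward model with Hugging Face `transformers` (the checkpoint path and variable names here are illustrative, not the repository's actual code):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical PRM checkpoint; substitute your own model path.
prm_name = "your-org/your-prm-checkpoint"

# Load in full precision (fp32) -- the new default, since bf16
# was observed to cause convergence issues when training the PRM.
prm = AutoModelForSequenceClassification.from_pretrained(
    prm_name,
    torch_dtype=torch.float32,
)
```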
Should we also use float32 for the ActorRolloutRefWorker?
I think it's not necessary; we are still validating the effect of bf16 on the PRM. On larger models (32B), it also converges well.
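In other words, only the PRM defaults to fp32; the actor/rollout/reference models can stay in bf16. A hedged sketch of that split, again with illustrative names rather than the actual worker code:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical actor checkpoint; substitute your own model path.
actor_name = "your-org/your-actor-checkpoint"

# The actor (and its rollout/reference copies) can remain in bf16;
# the convergence issue reported above was only observed on the PRM.
actor = AutoModelForCausalLM.from_pretrained(
    actor_name,
    torch_dtype=torch.bfloat16,
)
```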