Feature request: GRPO support #2340
Comments
Yes please. We need a reference-level implementation that's both clear and concise. It should obviously leverage FSDP2.
Hi all, thanks for your interest! We have a couple of PRs cooking. Please feel free to take a look and contribute/review.
I've also started working on GRPO on my end. @felipemello1, can you point me to those PRs? I would love to collaborate on this. I'm especially interested in the RLVR approach used in DeepSeek R1 and Tulu 3.
Take a look at this one, which is being done by a user: #2326
@akashc1 @dzheng256 @tikikun @vgoklani and others, please check the new comment with a task list. If you guys feel like contributing, that would be a good starting point: #2326
As you all might already know by now, DeepSeek-R1 with its GRPO training was quite successful. Should we consider bringing GRPO into torchtune?