-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cosine_scaled_reward
compatibility with GRPO
#229
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! LGTM with a small question about replacing the getter with the same naming convention as other reward functions
@@ -26,7 +26,7 @@ | |||
from transformers.trainer_utils import get_last_checkpoint | |||
|
|||
from open_r1.configs import GRPOConfig | |||
from open_r1.rewards import accuracy_reward, cosine_scaled_reward, format_reward, reasoning_steps_reward | |||
from open_r1.rewards import accuracy_reward, format_reward, get_cosine_scaled_reward, reasoning_steps_reward |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turbo nit: given the other reward functions aren't getters, what about we call this cosine_scaled_reward
and use_cosine_scaled_reward
as the inner function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would be more confusing actually, for 2 reasons
- You would read this in the registery:
REWARD_FUNCS_REGISTRY = {
"accuracy": accuracy_reward,
"format": format_reward,
"reasoning_steps": reasoning_steps_reward,
"cosine": cosine_scaled_reward(min_value_wrong=min_value_wrong, ...)
}
- In the logged metrics, you'd have this annoying
_
:
{..., 'rewards/accuracy_reward': 0.6, 'rewards/format_reward': 0.0, 'rewards/_cosine_scaled_reward': 0.0, 'epoch': 0.0}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good point. let's keep the getter then
No description provided.