Fix cosine_scaled_reward compatibility with GRPO #229

Merged: qgallouedec merged 3 commits into main from fix-cosine_scaled_reward on Feb 7, 2025

Conversation

qgallouedec (Member)

No description provided.

Review thread on src/open_r1/grpo.py (outdated, resolved)
lewtun (Member) left a comment:

Thanks for the fix! LGTM, with a small question about renaming the getter to follow the same naming convention as the other reward functions.

@@ -26,7 +26,7 @@
 from transformers.trainer_utils import get_last_checkpoint

 from open_r1.configs import GRPOConfig
-from open_r1.rewards import accuracy_reward, cosine_scaled_reward, format_reward, reasoning_steps_reward
+from open_r1.rewards import accuracy_reward, format_reward, get_cosine_scaled_reward, reasoning_steps_reward
lewtun (Member):

turbo nit: given the other reward functions aren't getters, what about we call this cosine_scaled_reward and use _cosine_scaled_reward as the inner function?

qgallouedec (Member, Author), Feb 7, 2025:

I think it would be more confusing actually, for 2 reasons:

  1. You would read this in the registry:
REWARD_FUNCS_REGISTRY = {
    "accuracy": accuracy_reward,
    "format": format_reward,
    "reasoning_steps": reasoning_steps_reward,
    "cosine": cosine_scaled_reward(min_value_wrong=min_value_wrong, ...)
}
  2. In the logged metrics, you'd have this annoying _:

{..., 'rewards/accuracy_reward': 0.6, 'rewards/format_reward': 0.0, 'rewards/_cosine_scaled_reward': 0.0, 'epoch': 0.0}
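
To make the two points above concrete, here is a minimal sketch of how the returned function's `__name__` could end up in the metric key. It assumes metrics are keyed as `rewards/<function __name__>`, which matches the log line above but is not confirmed in this thread. The helper `metric_key`, the alternative `cosine_scaled_reward_alt`, the default for `min_value_wrong`, and the placeholder scoring body are all hypothetical and not taken from open_r1.

```python
def get_cosine_scaled_reward(min_value_wrong: float = -1.0):
    """Getter variant: the inner function carries the clean public name."""

    def cosine_scaled_reward(completions, **kwargs):
        # Placeholder scoring (assumption): the real cosine schedule is omitted.
        return [min_value_wrong for _ in completions]

    return cosine_scaled_reward


def cosine_scaled_reward_alt(min_value_wrong: float = -1.0):
    """Suggested variant: outer function takes the public name, inner gets a '_' prefix."""

    def _cosine_scaled_reward(completions, **kwargs):
        # Same placeholder scoring as above.
        return [min_value_wrong for _ in completions]

    return _cosine_scaled_reward


def metric_key(reward_func) -> str:
    # Assumption: the trainer derives the metric name from __name__.
    return f"rewards/{reward_func.__name__}"


getter_key = metric_key(get_cosine_scaled_reward(min_value_wrong=-0.5))
alt_key = metric_key(cosine_scaled_reward_alt(min_value_wrong=-0.5))

print(getter_key)  # rewards/cosine_scaled_reward
print(alt_key)     # rewards/_cosine_scaled_reward
```

With the getter, the registry entry reads as a call to `get_cosine_scaled_reward(...)` and the logged key stays consistent with the other rewards.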

lewtun (Member):

ah good point. let's keep the getter then

@qgallouedec qgallouedec merged commit dd915f8 into main Feb 7, 2025
1 check passed
@qgallouedec qgallouedec deleted the fix-cosine_scaled_reward branch February 7, 2025 14:21
Successfully merging this pull request may close these issues.

AttributeError: 'functools.partial' object has no attribute '__name__'. Did you mean: '__ne__'?
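
The error in the linked issue can be reproduced in isolation. The sketch below assumes the failure came from reading `__name__` on a reward function that had been bound with `functools.partial`; the reward body and the parameter values are placeholders, not the open_r1 implementation. It also shows that a getter returning a closure avoids the problem, since a plain nested function keeps its `__name__`.

```python
import functools


def cosine_scaled_reward(completions, min_value_wrong=-1.0, **kwargs):
    # Placeholder scoring (assumption): the real cosine schedule is omitted.
    return [min_value_wrong for _ in completions]


# Binding hyperparameters with functools.partial yields an object without __name__.
bound = functools.partial(cosine_scaled_reward, min_value_wrong=-0.5)
partial_has_name = hasattr(bound, "__name__")

try:
    bound.__name__
    raised = False
except AttributeError:
    raised = True


def get_cosine_scaled_reward(min_value_wrong=-1.0):
    # Getter: binds the hyperparameter in a closure instead of a partial.
    def cosine_scaled_reward(completions, **kwargs):
        return [min_value_wrong for _ in completions]

    return cosine_scaled_reward


closure_name = get_cosine_scaled_reward(min_value_wrong=-0.5).__name__

print(partial_has_name)  # False
print(raised)            # True
print(closure_name)      # cosine_scaled_reward
```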
2 participants