feat: implementing STaR: Self Taught Reasoner #1478

Open
wants to merge 15 commits into
base: master

GitHoobar
Collaborator

Description

A basic implementation of STaR: Self Taught Reasoner

Motivation and Context

Why is this change required? What problem does it solve?
close #1411

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

Implemented Tasks

  • STaR: Self Taught Reasoner

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.

Member

@Wendong-Fan Wendong-Fan left a comment

Thanks @GitHoobar for the contribution! I left some comments below.

camel/datagen/star/star_pipeline.py (9 outdated, resolved review threads)
camel/datagen/star/templates.py (1 outdated, resolved review thread)
@ZIYU-DEEP ZIYU-DEEP self-requested a review January 23, 2025 23:36
@Wendong-Fan Wendong-Fan changed the title implementing STaR: Self Taught Reasoner feat: implementing STaR: Self Taught Reasoner Jan 24, 2025
Member

@Wendong-Fan Wendong-Fan left a comment

Thanks for the contribution! Overall this looks great. I left some comments below and created an enhancement PR here: https://github.com/camel-ai/camel/pull/1514/files. Feel free to review it and leave your comments!

camel/datagen/star/star_pipeline.py (4 outdated, resolved review threads)
prompt = STaRTemplates.improvement_template.format(
problem=problem, trace=trace, feedback=feedback
)
response = self.agent.step(prompt)
Member

Please resolve the conversation once you have updated this.

@Wendong-Fan Wendong-Fan mentioned this pull request Jan 26, 2025
13 tasks
@GitHoobar
Collaborator Author

The refactor looks good. Thanks, @Wendong-Fan, for the edits.

Comment on lines 46 to 51
# Initialize reward model (optional)

# reward_model = NemotronRewardModel(
# model_type=ModelType.NVIDIA_NEMOTRON_340B_REWARD,
# url="https://integrate.api.nvidia.com/v1",
# )
Collaborator

Clean up unused code.

Collaborator Author

This will be used to generate the example. @Wendong-Fan, update the env with the appropriate API configuration so the reward model can be used correctly.


return evaluation.model_dump()

def improve_trace(self, problem: str, trace: str, feedback: str) -> str:
Collaborator

The original paper mentioned the term "rationalization," but I don't seem to see a similar implementation in this improvement method.

Collaborator Author

You're correct: the original STaR (Self-Taught Reasoner) paper uses rationalization. Our current implementation differs because it is a test-time method that generates reasoning directly, without access to ground-truth solutions.
We are focusing more on the data generation part at the moment.
cc: @Wendong-Fan
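
For reference, here is a minimal sketch of what a rationalization step could look like if ground-truth answers were available. It is not the code in this PR: `star_example`, the `Generator` callable, and the prompt wording are all hypothetical names chosen for illustration. The idea is that a failed attempt is retried with the correct answer supplied as a hint, and the resulting rationale is kept only if it reaches that answer.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interface: takes a prompt, returns (rationale, answer).
# A real pipeline would wrap a model call here.
Generator = Callable[[str], Tuple[str, str]]


def star_example(
    generate: Generator, problem: str, ground_truth: str
) -> Optional[Tuple[str, str]]:
    """Return a (rationale, answer) pair worth keeping, or None."""
    target = ground_truth.strip()

    # Step 1: ordinary attempt with no hint.
    rationale, answer = generate(f"Q: {problem}\nA:")
    if answer.strip() == target:
        return rationale, answer

    # Step 2: rationalization. Retry with the correct answer given as a
    # hint (illustrative prompt wording, not taken from the paper).
    hinted = f"Q: {problem} (hint: the answer is {ground_truth})\nA:"
    rationale, answer = generate(hinted)
    if answer.strip() == target:
        return rationale, answer

    # Neither attempt reached the ground truth: discard the problem.
    return None
```

Because this step needs `ground_truth`, it does not fit a test-time method as is; it would only apply to a data generation run over labeled problems.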

json.dump(self.reasoning_traces, f, indent=2)

# Templates for generating reasoning, evaluation and improving them.
REASONING_TEMPLATE = """Let's solve this step by step:
Collaborator

When constructing the reasoning prompt, consider adding few-shot examples, as this can improve the performance to some extent. The original paper also adopts this approach.

Collaborator Author

This has been taken care of.
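
As a rough illustration of the few-shot suggestion, the sketch below prepends worked examples to the reasoning prompt. Only the header string "Let's solve this step by step:" comes from the template excerpt in this thread; `build_few_shot_prompt`, the `Problem:` label, and the example format are assumptions and may not match what was actually merged.

```python
from typing import List, Tuple

# Header text taken from the REASONING_TEMPLATE excerpt above;
# everything else in this sketch is illustrative.
BASE_HEADER = "Let's solve this step by step:"


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], problem: str
) -> str:
    """Build a prompt from (question, worked_solution) pairs plus a new problem."""
    parts = []
    for question, worked_solution in examples:
        # Each demonstration shows the full question-to-solution format.
        parts.append(f"Problem: {question}\n{BASE_HEADER}\n{worked_solution}")
    # The target problem ends at the header so the model continues from there.
    parts.append(f"Problem: {problem}\n{BASE_HEADER}")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt(
    [("What is 1 + 1?", "1 + 1 = 2. Answer: 2")],
    "What is 2 + 3?",
)
```

Keeping the demonstrations in the same format as the target problem is the main design choice here, since the model tends to imitate the structure of the examples it is shown.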

@Asher-hss
Collaborator

Thanks @GitHoobar @Wendong-Fan
LGTM overall, left some comments.

Collaborator

@harryeqs harryeqs left a comment

Thanks @GitHoobar ! LGTM!

Successfully merging this pull request may close these issues.

[Feature Request] Implement STaR: Bootstrapping Reasoning With Reasoning
4 participants