feat: implementing STaR: Self Taught Reasoner #1478

GitHoobar · 2025-01-21T17:33:34Z

Description

A basic implementation of STaR: Self Taught Reasoner

Motivation and Context

Why is this change required? What problem does it solve?
Closes #1411

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of example)

Implemented Tasks

STaR: Self Taught Reasoner

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide. (required)
My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.

Wendong-Fan

Thanks @GitHoobar for the contribution! Left some comments below.

camel/datagen/star/star_pipeline.py

camel/datagen/star/templates.py

Wendong-Fan

Thanks for the contribution! Overall this looks great. I left some comments below and opened an enhancement PR here: https://github.com/camel-ai/camel/pull/1514/files Feel free to review it and leave your comments!

camel/datagen/star/star_pipeline.py

Wendong-Fan · 2025-01-26T22:20:43Z

camel/datagen/star/star_pipeline.py

+        prompt = STaRTemplates.improvement_template.format(
+            problem=problem, trace=trace, feedback=feedback
+        )
+        response = self.agent.step(prompt)


Please resolve the conversation once you have updated this.

GitHoobar · 2025-01-26T23:53:28Z

The refactor looks good. Thanks @Wendong-Fan for the edits.

Asher-hss · 2025-01-30T03:32:09Z

examples/star_datagen/star_example.py

+    # Initialize reward model (optional)
+
+    # reward_model = NemotronRewardModel(
+    #     model_type=ModelType.NVIDIA_NEMOTRON_340B_REWARD,
+    #     url="https://integrate.api.nvidia.com/v1",
+    # )


Clean up unused code.

This will be used to generate the example. @Wendong-Fan, please update the env with the appropriate API settings so that the reward model can be used correctly.
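For illustration, one way to keep the reward model optional is to build its settings only when credentials are present in the environment. This is a minimal sketch, not the code in this PR: `reward_model_config` is a hypothetical helper, and the variable name `NVIDIA_API_KEY` is an assumption; only the URL comes from the snippet above.

```python
import os
from typing import Mapping, Optional


def reward_model_config(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return reward-model settings, or None when no credentials are set.

    Hypothetical helper: the variable name NVIDIA_API_KEY is an assumption.
    """
    api_key = env.get("NVIDIA_API_KEY")
    if not api_key:
        # No credentials: the caller should skip the reward model entirely.
        return None
    return {
        "url": "https://integrate.api.nvidia.com/v1",  # URL from the example
        "api_key": api_key,
    }
```

The example script could then construct the reward model only when this returns a dict, and otherwise run without one.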

Asher-hss · 2025-01-30T03:47:22Z

camel/datagen/star/star_pipeline.py

+
+            return evaluation.model_dump()
+
+    def improve_trace(self, problem: str, trace: str, feedback: str) -> str:


The original paper mentioned the term "rationalization," but I don't seem to see a similar implementation in this improvement method.

You're correct. The original STaR (Self-Taught Reasoner) paper uses rationalization. However, our current implementation is different: it is a test-time method that generates reasoning directly, without access to ground-truth solutions.
We are focusing more on the data generation part at the moment.
cc: @Wendong-Fan
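For reference, the rationalization step discussed above could be sketched as follows. This is an illustrative outline of the idea from the paper, not code from this PR: when a generated answer does not match the ground truth, the model is prompted again with the correct answer as a hint, and the rationale is kept only if it now reaches that answer. The function names, the prompt wording, and the `generate` callable (which maps a prompt to a `(rationale, answer)` pair) are all assumptions.

```python
from typing import Callable, List, Tuple


def build_rationalization_prompt(problem: str, answer: str) -> str:
    """Give the known answer as a hint so the model explains how to reach it."""
    return (
        f"Problem: {problem}\n"
        f"Hint: the correct answer is {answer}.\n"
        "Explain step by step how to reach this answer."
    )


def collect_rationales(
    dataset: List[Tuple[str, str]],
    generate: Callable[[str], Tuple[str, str]],
) -> List[dict]:
    """One data-collection pass over (problem, ground_truth) pairs."""
    kept = []
    for problem, gold in dataset:
        # First attempt: plain reasoning, no access to the answer.
        rationale, answer = generate(
            f"Problem: {problem}\nLet's solve this step by step:"
        )
        source = "generated"
        if answer != gold:
            # Rationalization: retry with the ground truth as a hint.
            rationale, answer = generate(
                build_rationalization_prompt(problem, gold)
            )
            source = "rationalized"
        if answer == gold:
            kept.append(
                {"problem": problem, "rationale": rationale, "source": source}
            )
    return kept
```

Because this requires ground-truth answers, it does not apply to the test-time setting described in the reply above.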

Asher-hss · 2025-01-30T03:49:51Z

camel/datagen/star/star_pipeline.py

+                json.dump(self.reasoning_traces, f, indent=2)
+
+    # Templates for generating reasoning, evaluation and improving them.
+    REASONING_TEMPLATE = """Let's solve this step by step:


When constructing the reasoning prompt, consider adding few-shot examples, as this can improve the performance to some extent. The original paper also adopts this approach.

This has been taken care of.
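As a rough illustration of the few-shot suggestion, worked examples can be prepended to the reasoning prompt before the target problem. This is a minimal sketch, not the template that was merged: `build_few_shot_prompt` and the `Problem:` layout are assumptions, and only the "Let's solve this step by step:" header is taken from the diff above.

```python
from typing import List, Tuple

# Header taken from REASONING_TEMPLATE in the diff; the rest is illustrative.
REASONING_HEADER = "Let's solve this step by step:"


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], problem: str
) -> str:
    """Prepend (problem, worked_reasoning) pairs to the target problem."""
    blocks = [
        f"Problem: {question}\n{REASONING_HEADER}\n{reasoning}"
        for question, reasoning in examples
    ]
    # The final block is left open so the model continues the reasoning.
    blocks.append(f"Problem: {problem}\n{REASONING_HEADER}")
    return "\n\n".join(blocks)
```

With an empty `examples` list this reduces to the zero-shot prompt, so the same function can serve both cases.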

Asher-hss · 2025-01-30T03:56:03Z

Thanks @GitHoobar @Wendong-Fan
LGTM overall, left some comments.

harryeqs

Thanks @GitHoobar ! LGTM!

Co-authored-by: Rishabh <[email protected]>

implementing STaR: Self Taught Reasoner

8bf5a02

GitHoobar linked an issue Jan 21, 2025 that may be closed by this pull request

[Feature Request] Implement STaR: Bootstrapping Reasoning With Reasoning #1411

Open

2 tasks

Wendong-Fan requested review from mohamadkav, Asher-hss and harryeqs January 21, 2025 17:41

Wendong-Fan assigned GitHoobar Jan 21, 2025

Wendong-Fan added the New Feature label Jan 21, 2025

Wendong-Fan added this to the Sprint 21 milestone Jan 21, 2025

Wendong-Fan requested a review from zjrwtx January 23, 2025 14:06

Wendong-Fan reviewed Jan 23, 2025

minor fixes

245d023

ZIYU-DEEP self-requested a review January 23, 2025 23:36

minor fixes

fa8406f

Wendong-Fan changed the title ~~implementing STaR: Self Taught Reasoner~~ feat: implementing STaR: Self Taught Reasoner Jan 24, 2025

Wendong-Fan reviewed Jan 26, 2025

Wendong-Fan mentioned this pull request Jan 26, 2025

enhance: STaR Integration #1514

Merged

13 tasks

Merge branch 'master' into feat/star-datagen

a1875e1

Merge branch 'master' into feat/star-datagen

4985ec4

Asher-hss reviewed Jan 30, 2025

harryeqs approved these changes Jan 30, 2025

enhance: STaR Integration (#1514)

44bea0d

Co-authored-by: Rishabh <[email protected]>

Asher-hss approved these changes Jan 30, 2025

GitHoobar and others added 6 commits January 30, 2025 17:46

minor fixes

1a5bd8e

bug fix

a78d517

fix

f46e3c6

update

7169cea

update with example math 500 data

0634891

add aime24

047d4ab

Wendong-Fan added 3 commits January 31, 2025 01:44

update

88b8bfc

update aime24 and amc23

6fbc4b8

add gaokao2023 and gsm8k

a9fffe1
