Synthetic Data Generation v2 #545

Merged
merged 14 commits into from
Mar 14, 2024
krypticmouse
Collaborator

@krypticmouse krypticmouse commented Mar 4, 2024

Usage:
[1] Features:

import dspy
from dspy.datasets import DataLoader
from dspy.experimental import Synthesizer, SynthesizerArguments

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

config = SynthesizerArguments()
synthesizer = Synthesizer(config=config)

syn_data = synthesizer.generate(GenerateAnswer, num_data=10)

synthesizer.export(data=syn_data, path="syn_data.csv")

[2] Batched Generation (Faster):

from dspy.datasets import DataLoader
from dspy.experimental import Synthesizer, SynthesizerArguments

dl = DataLoader()

data = dl.from_huggingface(
    "gsm8k", "main",
    fields=("question", "answer"),
    input_keys=("question",),
    split="train[:8]"
)

config = SynthesizerArguments()
synthesizer = Synthesizer(config=config)

synthesizer.generate(
    ground_source=data,
    num_data=10,
    batch_size=2,
)

[3] Feedback Driven Generation

config = SynthesizerArguments(
    feedback_mode="llm",  # or "human" for human-in-the-loop generation
    num_example_for_feedback=3,
)
synthesizer = Synthesizer(config=config)

synthesizer.generate(
    ground_source=data,
    num_data=10,
    batch_size=2,
)

[4] Tweakable LM for input and output gen and support for module based output generation

[5] Example based input generation optimization

@krypticmouse
Collaborator Author

krypticmouse commented Mar 5, 2024

Thanks a lot for the suggestions and fixes!!

@krypticmouse krypticmouse changed the title Synthetic Data Generation v2 [WIP] Synthetic Data Generation v2 Mar 9, 2024
@krypticmouse krypticmouse changed the title [WIP] Synthetic Data Generation v2 Synthetic Data Generation v2 Mar 14, 2024
@krypticmouse krypticmouse merged commit cdd661d into main Mar 14, 2024
4 checks passed
@chiragshah285

@krypticmouse would it be possible to add this to the Vercel documentation? Also, what do the following refer to?
[4] Tweakable LM for input and output gen and support for module based output generation

[5] Example based input generation optimization

@beltrewilton

beltrewilton commented Jul 12, 2024

Fix potential import bug in dspy.experimental module

The import statement in /dspy/experimental/__init__.py may contain a bug. The original import statement is: from module_graph import *

This has been changed to: from .module_graph import *
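
To illustrate why the leading dot matters, here is a minimal sketch that builds a throwaway package and tries both forms of the import. The package names `demo_pkg_abs` and `demo_pkg_rel`, the helper `try_import`, and the stub `ModuleGraph` class are hypothetical stand-ins, not part of dspy. The sketch also assumes no top-level module named `module_graph` is installed in the environment; if one were, the absolute form would import that unrelated module instead of failing.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(init_line: str, pkg_name: str) -> bool:
    """Create a temporary package whose __init__.py is `init_line`.

    Returns True if the package imports cleanly, False if the import
    raises ModuleNotFoundError.
    """
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / pkg_name
        pkg.mkdir()
        # Sibling module living inside the package (hypothetical stub).
        (pkg / "module_graph.py").write_text(
            "__all__ = ['ModuleGraph']\nclass ModuleGraph: ...\n"
        )
        (pkg / "__init__.py").write_text(init_line + "\n")

        # Only the parent directory is on sys.path, as with an installed package.
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            importlib.import_module(pkg_name)
            return True
        except ModuleNotFoundError:
            return False
        finally:
            # Undo the path and module-cache changes made for the demo.
            sys.path.remove(root)
            sys.modules.pop(pkg_name, None)
            sys.modules.pop(pkg_name + ".module_graph", None)


# Absolute form: Python looks for a top-level `module_graph` on sys.path.
absolute_ok = try_import("from module_graph import *", "demo_pkg_abs")
# Relative form: Python resolves `module_graph` inside the package itself.
relative_ok = try_import("from .module_graph import *", "demo_pkg_rel")

print(f"absolute import works: {absolute_ok}")
print(f"relative import works: {relative_ok}")
```

Under those assumptions, the absolute form only succeeds when the package directory itself happens to be on `sys.path` (for example, when running a script from inside it), which is why the relative form is the safer choice inside a package's `__init__.py`.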
