fix: printing signature fields in verbose mode for signature_opt #547

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

ragul-kachiappan-dev
Contributor

@ragul-kachiappan-dev ragul-kachiappan-dev commented Mar 4, 2024

#482 introduced _print_signature() for verbose mode in signature optimizer.

def _print_signature(self, predictor):
    if self.verbose:
        if (hasattr(predictor, 'extended_signature')):
            signature = predictor.extended_signature
        else:
            signature = predictor.extended_signature1
        print(f"i: {signature.instructions}")
        print(f"p: {list(signature.fields().values())[-1].json_schema_extra['prefix']}")
        print()

fields is a property defined on the SignatureMeta class, not a regular method:

@property
def fields(cls):
    # Make sure to give input fields before output fields
    return {**cls.input_fields, **cls.output_fields}

_print_signature() uses fields incorrectly: because fields is a property, signature.fields already evaluates to a dict, so signature.fields() tries to call that dict.
As a result, the current _print_signature() implementation raises "TypeError: 'dict' object is not callable" when the signature optimizer is used as described in the tutorial at https://dspy-docs.vercel.app/docs/deep-dive/teleprompter/signature-optimizer
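For anyone who wants to reproduce the failure without DSPy installed, here is a minimal sketch. It assumes a simplified stand-in for SignatureMeta in which each field is a plain dict with a 'prefix' key (the real class stores field objects and reads json_schema_extra['prefix']). DemoSignature, last_prefix_buggy and last_prefix_fixed are hypothetical names used only for illustration.

```python
class SignatureMeta(type):
    """Simplified stand-in for the real metaclass (assumption, not DSPy code)."""

    @property
    def fields(cls):
        # Input fields come before output fields, as in the snippet above.
        return {**cls.input_fields, **cls.output_fields}


class DemoSignature(metaclass=SignatureMeta):
    # Hypothetical fields; plain dicts replace the real field objects.
    input_fields = {"question": {"prefix": "Question:"}}
    output_fields = {"answer": {"prefix": "Answer:"}}


def last_prefix_buggy(signature):
    # `fields` is a property, so `signature.fields` is already a dict.
    # Adding () tries to call that dict and raises TypeError.
    return list(signature.fields().values())[-1]["prefix"]


def last_prefix_fixed(signature):
    # Access the property as an attribute, without parentheses.
    return list(signature.fields.values())[-1]["prefix"]


if __name__ == "__main__":
    try:
        last_prefix_buggy(DemoSignature)
    except TypeError as err:
        print(f"buggy access failed: {err}")
    print(f"p: {last_prefix_fixed(DemoSignature)}")
```

The only difference between the two helpers is the pair of parentheses after fields.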

Separately, the signature optimizer tutorial in the docs feels incomplete: it uses the HotPotQA and GSM8K datasets inconsistently. When I tried the example with the HotPotQA dataset, I hit an error about input keys not being set. I then noticed that with_inputs() is never called when the HotPotQA Example objects are created, whereas it is called correctly for GSM8K. I am not sure whether this is a known issue.

Example objects creation for hotpotqa:

    def _shuffle_and_sample(self, split, data, size, seed=0):
        '''
            The setting (seed=s, size=N) is always a subset
            of the setting (seed=s, size=M) for N < M.
        '''

        data = list(data)

        # Shuffle the data irrespective of the requested size.
        base_rng = random.Random(seed)

        if self.do_shuffle:
            base_rng.shuffle(data)

        data = data[:size]
        output = []

        for example in data:
            output.append(Example(**example, dspy_uuid=str(uuid.uuid4()), dspy_split=split))
        
        # TODO: NOTE: Ideally we use these uuids for dedup internally, for demos and internal train/val splits.
        # Now, some tasks (like convQA and Colors) have overlapping examples. Here, we should allow the user to give us
        # a uuid field that would respect this in some way. This means that we need a more refined concept that
        # uuid (each example is unique) and more like a group_uuid.

        # rng = random.Random(seed)
        # rng.shuffle(data)

        return output

Example objects creation for gsm8k:

        trainset = [dspy.Example(**x).with_inputs('question') for x in trainset]
        devset = [dspy.Example(**x).with_inputs('question') for x in devset]
        testset = [dspy.Example(**x).with_inputs('question') for x in testset]
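A possible workaround on the user side is to mark the input keys after loading, mirroring what the GSM8K loader does. The sketch below is self-contained and makes some assumptions: StubExample is a hypothetical stand-in for dspy.Example that keeps only the with_inputs()/inputs() behaviour relevant here, its ValueError merely imitates the "input keys not set" failure described above, and mark_inputs is a hypothetical helper name.

```python
class StubExample:
    """Hypothetical stand-in for dspy.Example, for illustration only."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._input_keys = None  # unset until with_inputs() is called

    def with_inputs(self, *keys):
        # Return a copy with the given keys marked as inputs.
        copy = StubExample(**self._fields)
        copy._input_keys = set(keys)
        return copy

    def inputs(self):
        # Imitates the failure seen when input keys were never set.
        if self._input_keys is None:
            raise ValueError("Inputs have not been set for this example.")
        return {k: v for k, v in self._fields.items() if k in self._input_keys}


def mark_inputs(examples, *keys):
    """Apply with_inputs(*keys) to every example, as the GSM8K loader does."""
    return [example.with_inputs(*keys) for example in examples]


if __name__ == "__main__":
    raw = [StubExample(question="Who wrote Hamlet?", answer="Shakespeare")]
    trainset = mark_inputs(raw, "question")
    print(trainset[0].inputs())
```

With real data, the same one-line comprehension would be applied to the HotPotQA train and dev splits before passing them to the optimizer.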

I tested this change with a custom dataset that I built from the "math_qa" Hugging Face dataset, and verbose mode now works as expected. I am open to a better explanation of the root cause, and I can raise the tutorial concerns as a separate issue if that is preferred.

@arnavsinghvi11
Collaborator

LGTM!

@arnavsinghvi11 arnavsinghvi11 merged commit 58030c9 into stanfordnlp:main Mar 5, 2024
4 checks passed