Using Dataset ref made with Calls results in error #3572

Open
chandlj opened this issue Feb 1, 2025 · 3 comments

Comments

@chandlj

chandlj commented Feb 1, 2025

I made a dataset like the following:

calls = []
for data in rows:
    res, call = await model.predict(data)
    calls.append(call)

dataset = Dataset.from_calls(calls)
weave.publish(dataset, name=dataset_name)

However, when I try to reload it using the following, nothing happens:

dataset = weave.ref(...).get()

evaluation = weave.Evaluation(dataset=dataset, scorers=[...])

Then, when I try to access a specific row:

print(dataset.rows[0]["output"])

I get the following error:

Traceback (most recent call last):
  ...
  File ".../site-packages/weave/trace/vals.py", line 435, in __getitem__
    rows = self.rows
           ^^^^^^^^^
  File ".../site-packages/weave/trace/vals.py", line 301, in rows
    self._rows = list(self._remote_iter())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/weave/trace/vals.py", line 425, in _remote_iter
    res = from_json(val, self.table_ref.project_id, self.server)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/weave/trace/serialize.py", line 261, in from_json
    return {k: from_json(v, project_id, server) for k, v in obj.items()}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/weave/trace/vals.py", line 593, in items
    yield k, self[k]
             ~~~~^^^
  File ".../site-packages/weave/trace/vals.py", line 567, in __getitem__
    return make_trace_obj(v, new_ref, self.server, self.root)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/weave/trace/vals.py", line 742, in make_trace_obj
    raise MissingSelfInstanceError(
weave.trace.vals.MissingSelfInstanceError: predict Op requires a bound self instance. Must be called from an instance method.
@gtarpenning
Member

Hey @chandlj, thanks for the report; looking into this now.

@gtarpenning
Member

I have a couple of notes and questions:

  • What do you see when you print out dataset after dataset = weave.ref(...).get()? And dataset.rows?
  • The Dataset.from_calls(calls) method is meant to be used with weave calls, not live data from an LLM. Example:
client = weave.init("my project")
calls = client.get_calls()
dataset = Dataset.from_calls(calls)
  • I think following the example from this docs page more closely, using the async evaluation.evaluate call, might work better for your async functions.
  • Do you mind providing a link to the project you are working on? That would let me inspect the objects you are creating and determine whether this is indeed a bug.
  • Generally, the MissingSelfInstanceError occurs when attempting to serialize/deserialize a function that has not been registered with weave tracing. When creating your evaluation, make sure to wrap your scorers with weave.op (see the sketch below).
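
For example, a minimal sketch of what I mean (the project name, dataset ref, column names, model, and scorer here are all illustrative):

import asyncio
import weave

weave.init("my-project")

# Wrapping the scorer in weave.op registers it with tracing, so it can be
# serialized with the Evaluation and deserialized again later.
@weave.op()
def exact_match(output: str, answer: str) -> dict:
    # "answer" is assumed to be a column in the dataset rows
    return {"correct": output == answer}

class EchoModel(weave.Model):
    @weave.op()
    def predict(self, question: str) -> str:
        # "question" is assumed to be a column in the dataset rows
        return question

dataset = weave.ref("my-dataset").get()
evaluation = weave.Evaluation(dataset=dataset, scorers=[exact_match])
asyncio.run(evaluation.evaluate(EchoModel()))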

@chandlj
Author

chandlj commented Feb 6, 2025

@gtarpenning Sorry for the confusion. I made a wrapper around await model.predict(...) that actually calls await model.predict.call(self, ...) under the hood, so the call stays type annotated (I think some people have mentioned this in other threads, but I can also add a +1 that better op type annotations would be great 😄)

So, it's more like this:

calls = []
for data in inputs:
    result, call = await model.predict.call(self, data)
    calls.append(call)

dataset = Dataset.from_calls(calls)
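
The wrapper itself is basically a thin shim over predict.call that keeps the output typed for the type checker; roughly this (just a sketch, with a stand-in for my actual QuestionItem type):

from typing import Any

import weave

QuestionItem = dict  # stand-in; my real QuestionItem is defined elsewhere

class MyModel(weave.Model):
    @weave.op()
    async def predict(self, data: list[QuestionItem]) -> list[QuestionItem]:
        return data

    async def predict_with_call(
        self, data: list[QuestionItem]
    ) -> tuple[list[QuestionItem], Any]:
        # op.call(self, ...) returns (output, Call) but loses the declared
        # return type, so this shim re-asserts list[QuestionItem].
        result, call = await self.predict.call(self, data)
        return result, call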

I can't really share a project link, but here is the minimal code I think you need to understand the setup:

import asyncio
import os
from typing import Self

import numpy as np
import weave
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

# QuestionItem and ClassificationScoreMetric are defined elsewhere in my code

llm = ChatOpenAI(model="gpt-4o", temperature=0.0)

class LLMEvaluator(weave.Scorer):
    model_name: str = "gpt-4o"
    stuff_prompt: ChatPromptTemplate
    prompts: dict[str, str]

    @classmethod
    def from_prompt_file(cls, filepath: str) -> Self:
        # ... load prompts here

        return cls(
            stuff_prompt=prompt,
            prompts=prompts,
        )

    async def score_question(
        self, question: str, options: list[str], answer: str
    ) -> dict[str, int]:
        # Use pre-initialized chains
        chain = RunnablePassthrough.assign(
            score=self.stuff_prompt
            | llm.with_structured_output(ClassificationScoreMetric)
        )

        scores = {}
        all_inputs = []
        for metric, query in self.prompts.items():
            inputs = {
                "metric": metric,
                "query": query,
                "question": question,
                "options": "\n".join(options),
                "answer": answer,
            }
            all_inputs.append(inputs)

        results = await chain.abatch(all_inputs)
        for result in results:
            scores[result["metric"]] = result["score"].score

        return scores

    @weave.op()
    async def score(self, output: list[QuestionItem]) -> dict[str, float]:
        tasks = []
        for data in output:
            question = data["question"]
            tasks.append(
                self.score_question(
                    question["question"],
                    question["options"],
                    question["answer"],
                )
            )
        scores = await asyncio.gather(*tasks)
        metrics = list(self.prompts.keys())
        return {
            metric: np.mean([score[metric] for score in scores]) for metric in metrics
        }


class IdentityModel(weave.Model):
    @weave.op()
    def predict(self, output: list[QuestionItem]) -> list[QuestionItem]:
        return output


def score(eval_name: str, dataset_name: str, parallelism: int):
    os.environ["WEAVE_PARALLELISM"] = str(parallelism)
    evaluator = LLMEvaluator.from_prompt_file(...)
    dataset = weave.ref(dataset_name).get()

    evaluation = weave.Evaluation(
        name=eval_name,
        dataset=dataset,
        scorers=[evaluator],
    )

    asyncio.run(evaluation.evaluate(IdentityModel()))

I can get back to you with the output of weave.ref(dataset_name).get() and dataset.rows; I need to put together a viable example for you first.
