Using `Dataset` ref made with `Call`s results in error #3572
Comments
Hey @chandlj, thanks for the report, looking into this now.
I have a couple of notes and questions:

```python
client = weave.init("my project")
calls = client.get_calls()
dataset = Dataset.from_calls(calls)
```
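For reference, a rough sketch of the round trip such a dataset would normally go through before being fetched by ref again (project and object names below are placeholders, not the reporter's exact code):

```python
import weave
from weave import Dataset

client = weave.init("my project")

# Build a dataset from traced calls, then publish it so it can be fetched by ref later.
dataset = Dataset.from_calls(client.get_calls())
ref = weave.publish(dataset, name="calls-dataset")  # "calls-dataset" is an arbitrary example name

# In a later session, the published dataset can be reloaded by name.
reloaded = weave.ref("calls-dataset").get()
```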
@gtarpenning Sorry for the confusion, I made a wrapper around the `predict` call, so it's more like this:

```python
calls = []
for data in inputs:
    result, call = await model.predict.call(self, data)
    calls.append(call)

dataset = Dataset.from_calls(calls)
```

I can't really share a project link, but I'll copy the minimum code I think you need to understand:

```python
import asyncio
import os

import numpy as np
import weave
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI
from typing import Self

# QuestionItem and ClassificationScoreMetric are project-specific types, omitted here.

llm = ChatOpenAI(model="gpt-4o", temperature=0.0)

class LLMEvaluator(weave.Scorer):
    model_name: str = "gpt-4o"
    stuff_prompt: ChatPromptTemplate
    prompts: dict[str, str]

    @classmethod
    def from_prompt_file(cls, filepath: str) -> Self:
        # ... load prompts here
        return cls(
            stuff_prompt=prompt,
            prompts=prompts,
        )

    async def score_question(
        self, question: str, options: list[str], answer: str
    ) -> dict[str, int]:
        # Use pre-initialized chains
        chain = RunnablePassthrough.assign(
            score=self.stuff_prompt
            | llm.with_structured_output(ClassificationScoreMetric)
        )
        scores = {}
        all_inputs = []
        for metric, query in self.prompts.items():
            inputs = {
                "metric": metric,
                "query": query,
                "question": question,
                "options": "\n".join(options),
                "answer": answer,
            }
            all_inputs.append(inputs)
        results = await chain.abatch(all_inputs)
        for result in results:
            scores[result["metric"]] = result["score"].score
        return scores

    @weave.op()
    async def score(self, output: list[QuestionItem]) -> dict[str, float]:
        tasks = []
        for data in output:
            question = data["question"]
            tasks.append(
                self.score_question(
                    question["question"],
                    question["options"],
                    question["answer"],
                )
            )
        scores = await asyncio.gather(*tasks)
        metrics = list(self.prompts.keys())
        return {
            metric: np.mean([score[metric] for score in scores]) for metric in metrics
        }


class IdentityModel(weave.Model):
    @weave.op()
    def predict(self, output: list[QuestionItem]) -> list[QuestionItem]:
        return output


def score(eval_name: str, dataset_name: str, parallelism: int):
    os.environ["WEAVE_PARALLELISM"] = str(parallelism)
    evaluator = LLMEvaluator.from_prompt_file(...)
    dataset = weave.ref(dataset_name).get()
    evaluation = weave.Evaluation(
        name=eval_name,
        dataset=dataset,
        scorers=[evaluator],
    )
    asyncio.run(evaluation.evaluate(IdentityModel()))
```

I can get back to you with the output if that helps.
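For completeness, a rough sketch of how this entry point would presumably be invoked, assuming the dataset built from calls was published beforehand (all names below are illustrative):

```python
import weave

# Assumes the dataset created with Dataset.from_calls(...) was previously published, e.g.:
#   weave.publish(dataset, name="questions-from-calls")
weave.init("my project")  # weave.ref(dataset_name) below resolves against the current project
score(eval_name="llm-eval", dataset_name="questions-from-calls", parallelism=4)
```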
I made a dataset like the following:
However, when I try to reload it using the following, nothing happens:
Then, when I try to access a specific row:
I get the following error:
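A hedged reconstruction of those steps, based on the rest of the thread (placeholder names; the exact error output is not reproduced here):

```python
import weave

weave.init("my project")

# Reload the previously published dataset by ref; "calls-dataset" is a placeholder name.
dataset = weave.ref("calls-dataset").get()

# Accessing a specific row is where the reported error surfaces.
row = dataset.rows[0]
```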