
Updated Cookbook: Example for Fetching Scores from Langfuse #857

Open
wants to merge 12 commits into base: main

Conversation


@Sohammhatre10 commented Oct 14, 2024

Description

This update provides an example of using the fetch_scores() function from Langfuse to retrieve evaluation metrics. The example integrates UpTrain and Ragas for model evaluation and demonstrates how to log and fetch scores within Langfuse as mentioned in langfuse/langfuse#3505

Key Features

  1. Evaluation with UpTrain and Ragas:

    • Provides examples for evaluating context relevance, factual accuracy, response completeness, context precision, faithfulness, and answer relevancy.
  2. Fetching Scores:

    • Shows how to retrieve and filter scores using fetch_scores_from_langfuse.
  3. Correlation Matrix Visualization:

    • Adds a section that calculates and visualizes the correlation between UpTrain and Ragas evaluation scores using a heatmap.
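The fetched scores are easiest to analyze once pivoted into one row per trace and one column per score name. A minimal sketch of that reshaping step (the `scores_to_frame` helper and the sample records are illustrative, shaped like the records returned by the Langfuse scores API):

```python
import pandas as pd

def scores_to_frame(scores):
    """Pivot a flat list of score records (one per trace/name pair)
    into a wide DataFrame with one row per trace and one column per
    score name, ready for correlation analysis."""
    df = pd.DataFrame(scores)
    return df.pivot_table(index="trace_id", columns="name", values="value")

# Hypothetical records, shaped like the Langfuse scores API response
records = [
    {"trace_id": "t1", "name": "uptrain_context_relevance", "value": 0.8},
    {"trace_id": "t1", "name": "ragas_faithfulness", "value": 0.9},
    {"trace_id": "t2", "name": "uptrain_context_relevance", "value": 0.6},
    {"trace_id": "t2", "name": "ragas_faithfulness", "value": 0.7},
]
wide = scores_to_frame(records)
print(wide)
```

With real data, `records` would come from the SDK's score-fetching call instead of being hard-coded.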

Important

Adds an example for using Langfuse to fetch scores, evaluate models with UpTrain and Ragas, and visualize results using a correlation matrix.

  • Behavior:
    • Adds example for using fetch_scores() from Langfuse to retrieve evaluation metrics.
    • Demonstrates integration with UpTrain and Ragas for model evaluation.
    • Shows how to log and fetch scores within Langfuse.
  • Visualization:
    • Includes a section for calculating and visualizing correlation between evaluation scores using a heatmap.
  • Misc:
    • Minor whitespace changes in dspy.md, instructor.md, example-javascript.md, example-python-langgraph.md, example-python-instrumentation-module.md, example-python.md, example-vercel-ai.md, example_external_evaluation_pipelines.md, integration_dspy.md, integration_instructor.md, integration_langgraph.md, integration_llama-index_instrumentation.md, integration_llama_index_posthog_mistral.md, integration_mirascope.md, integration_mistral_sdk.md, integration_ollama.md, integration_openai_structured_output.md, example-langchain.md, js_integration_langchain.md, js_tracing_example_vercel_ai_sdk.md, prompt_management_langchain.md.
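The correlation step itself is plain pandas: collect the per-trace scores into a DataFrame and call `.corr()`. A minimal sketch with made-up score values (the seaborn heatmap call is left commented since it only affects rendering):

```python
import pandas as pd

# Hypothetical per-trace scores from UpTrain and Ragas evaluations
scores = pd.DataFrame({
    "uptrain_factual_accuracy": [0.9, 0.7, 0.8, 0.6],
    "ragas_faithfulness":       [0.85, 0.65, 0.8, 0.6],
    "ragas_answer_relevancy":   [0.5, 0.9, 0.4, 0.95],
})

corr = scores.corr()  # pairwise Pearson correlation between score columns
print(corr.round(2))

# To visualize as a heatmap (requires seaborn + matplotlib):
# import seaborn as sns
# import matplotlib.pyplot as plt
# sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
# plt.show()
```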

This description was created by Ellipsis for 02ebc24. It will automatically update as commits are pushed.



vercel bot commented Oct 14, 2024

@Sohammhatre10 is attempting to deploy a commit to the langfuse Team on Vercel.

A member of the Team first needs to authorize it.


CLAassistant commented Oct 14, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ Your Name
❌ Sohammhatre10


Your Name does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


@ellipsis-dev (bot) left a comment


👍 Looks good to me! Reviewed everything up to 02ebc24 in 42 seconds

More details
  • Looked at 1680 lines of code in 26 files
  • Skipped 3 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. pages/docs/integrations/dspy.md:242
  • Draft comment:
    Remove trailing whitespace for cleaner code. This issue is present in multiple files, such as example-javascript.md, example-python-langgraph.md, example-python-instrumentation-module.md, example-python.md, example-vercel-ai.md, external-evaluation-pipelines.md, integration_dspy.md, integration_instructor.md, integration_langgraph.md, integration_llama-index_instrumentation.md, integration_llama_index_posthog_mistral.md, integration_mirascope.md, integration_mistral_sdk.md, integration_ollama.md, integration_openai_structured_output.md, js_integration_langchain.md, js_tracing_example_vercel_ai_sdk.md, prompt_management_langchain.md.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR introduces a new example for fetching scores from Langfuse, but there are several instances of trailing whitespace in the markdown files. These should be removed for cleaner code.

Workflow ID: wflow_ON6OiLFA8uvvhNsK




@greptile-apps (bot) left a comment


Disclaimer: Experimental PR review

PR Summary

This pull request adds a comprehensive example of using the fetch_scores() function from Langfuse to retrieve and analyze evaluation metrics, integrating UpTrain and Ragas for model evaluation.

  • Added pages/docs/scores/example_usage_of_fetch_score.md with detailed code snippets for setting up, evaluating models, logging scores, and visualizing correlations
  • Updated pages/guides/cookbook/example_external_evaluation_pipelines.md with a guide on creating external evaluation pipelines using Langfuse, including synthetic data creation and custom evaluations
  • Made minor formatting and content improvements across multiple integration cookbooks (DSPy, Instructor, LangGraph, etc.) to enhance readability and consistency
  • Updated various Langchain examples to demonstrate better integration with Langfuse for tracing and prompt management

26 file(s) reviewed, 7 comment(s)

pages/docs/integrations/mirascope/example-python.md — review comment (outdated, resolved)

@marcklingen left a comment


Thanks for the contribution. It seems like you mostly want to showcase correlation analysis of different scores in Langfuse (which is a good notebook example). Are you sure that your example correlates the scores on a single-trace basis for the analysis at the bottom of this notebook?

@Sohammhatre10 (Author)

@marcklingen Yupp, this was based on a single trace, and the scores were fetched accordingly. Haven't used any specifics for traces, but this was the first trace I created, so it defaulted to the first trace. Should I add more specificity for a single trace? Apologies for the late reply.
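To make the trace selection explicit rather than relying on the first trace returned, the fetched score records can be filtered by `trace_id` on the client side. A minimal sketch (the records and the `scores_for_trace` helper are hypothetical, shaped like the scores API response):

```python
# Hypothetical score records as returned by the Langfuse scores API
records = [
    {"trace_id": "trace-abc", "name": "faithfulness", "value": 0.9},
    {"trace_id": "trace-abc", "name": "context_relevance", "value": 0.7},
    {"trace_id": "trace-xyz", "name": "faithfulness", "value": 0.4},
]

def scores_for_trace(records, trace_id):
    """Keep only the scores that belong to one specific trace."""
    return [r for r in records if r["trace_id"] == trace_id]

selected = scores_for_trace(records, "trace-abc")
print([(r["name"], r["value"]) for r in selected])
```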
