From 57baca85afe069d55c3580ab26cf3624f1c0e714 Mon Sep 17 00:00:00 2001
From: Bagatur
Date: Mon, 25 Nov 2024 11:51:17 -0800
Subject: [PATCH] fix links

---
 docs/evaluation/how_to_guides/async.mdx       | 14 +++++-----
 .../how_to_guides/custom_evaluator.mdx        |  8 +++---
 .../evaluate_llm_application.mdx              | 28 +++++++++----------
 .../evaluate_on_intermediate_steps.mdx        |  2 +-
 .../how_to_guides/evaluate_pairwise.mdx       |  4 +--
 .../how_to_guides/langchain_runnable.mdx      |  2 +-
 docs/evaluation/how_to_guides/langgraph.mdx   |  2 +-
 .../evaluation/how_to_guides/llm_as_judge.mdx |  6 ++--
 .../manage_datasets_in_application.mdx        |  6 ++--
 docs/evaluation/how_to_guides/metric_type.mdx |  4 +--
 .../how_to_guides/multiple_scores.mdx         |  4 +--
 .../how_to_guides/rate_limiting.mdx           |  2 +-
 .../how_to_guides/run_evals_api_only.mdx      |  2 +-
 .../how_to_guides/version_datasets.mdx        |  2 +-
 14 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/docs/evaluation/how_to_guides/async.mdx b/docs/evaluation/how_to_guides/async.mdx
index 37c8cff6..77d0d982 100644
--- a/docs/evaluation/how_to_guides/async.mdx
+++ b/docs/evaluation/how_to_guides/async.mdx
@@ -4,19 +4,19 @@ import { CodeTabs, python } from "@site/src/components/InstructionsWithCode";

 :::info Key concepts

-[Evaluations](../../concepts#applying-evaluations) | [Evaluators](../../concepts#evaluators) | [Datasets](../../concepts#datasets) | [Experiments](../../concepts#experiments)
+[Evaluations](../concepts#applying-evaluations) | [Evaluators](../concepts#evaluators) | [Datasets](../concepts#datasets) | [Experiments](../concepts#experiments)

 :::

-We can run evaluations asynchronously via the SDK using [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html),
-which accepts all of the same arguments as [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) but expects the application function to be asynchronous.
-You can learn more about how to use the `evaluate()` function [here](../../how_to_guides/evaluate_llm_application).
+We can run evaluations asynchronously via the SDK using [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html),
+which accepts all of the same arguments as [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) but expects the application function to be asynchronous.
+You can learn more about how to use the `evaluate()` function [here](./evaluate_llm_application).

 :::info Python only

 This guide is only relevant when using the Python SDK. In JS/TS the `evaluate()` function is already async.
-You can see how to use it [here](../../how_to_guides/evaluate_llm_application).
+You can see how to use it [here](./evaluate_llm_application).

 :::

@@ -76,5 +76,5 @@ list 5 concrete questions that should be investigated to determine if the idea i

 ## Related

-- [Run an evaluation (synchronously)](../../how_to_guides/evaluate_llm_application)
-- [Handle model rate limits](../../how_to_guides/rate_limiting)
+- [Run an evaluation (synchronously)](./evaluate_llm_application)
+- [Handle model rate limits](./rate_limiting)
diff --git a/docs/evaluation/how_to_guides/custom_evaluator.mdx b/docs/evaluation/how_to_guides/custom_evaluator.mdx
index 9086b696..a586f16f 100644
--- a/docs/evaluation/how_to_guides/custom_evaluator.mdx
+++ b/docs/evaluation/how_to_guides/custom_evaluator.mdx
@@ -8,12 +8,12 @@ import {

 :::info Key concepts

-- [Evaluators](../../concepts#evaluators)
+- [Evaluators](../concepts#evaluators)

 :::

 Custom evaluators are just functions that take a dataset example and the resulting application output, and return one or more metrics.
-These functions can be passed directly into [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) / [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html).
+These functions can be passed directly into [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) / [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html).

 ## Basic example
@@ -138,5 +138,5 @@ answer is logically valid and consistent with question and the answer."""

 ## Related

-- [Evaluate aggregate experiment results](../../how_to_guides/summary): Define summary evaluators, which compute metrics for an entire experiment.
-- [Run an evaluation comparing two experiments](../../how_to_guides/evaluate_pairwise): Define pairwise evaluators, which compute metrics by comparing two (or more) experiments against each other.
+- [Evaluate aggregate experiment results](./summary): Define summary evaluators, which compute metrics for an entire experiment.
+- [Run an evaluation comparing two experiments](./evaluate_pairwise): Define pairwise evaluators, which compute metrics by comparing two (or more) experiments against each other.
diff --git a/docs/evaluation/how_to_guides/evaluate_llm_application.mdx b/docs/evaluation/how_to_guides/evaluate_llm_application.mdx
index a6b886c6..2ef1ed5d 100644
--- a/docs/evaluation/how_to_guides/evaluate_llm_application.mdx
+++ b/docs/evaluation/how_to_guides/evaluate_llm_application.mdx
@@ -12,17 +12,17 @@ import {

 :::info Key concepts

-[Evaluations](../../concepts#applying-evaluations) | [Evaluators](../../concepts#evaluators) | [Datasets](../../concepts#datasets)
+[Evaluations](../concepts#applying-evaluations) | [Evaluators](../concepts#evaluators) | [Datasets](../concepts#datasets)

 :::

-In this guide we'll go over how to evaluate an application using the [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) method in the LangSmith SDK.
+In this guide we'll go over how to evaluate an application using the [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) method in the LangSmith SDK.

 :::tip

-For larger evaluation jobs in Python we recommend using [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html), the asynchronous version of `evaluate()`.
+For larger evaluation jobs in Python we recommend using [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html), the asynchronous version of `evaluate()`.

 It is still worthwhile to read this guide first, as the two have nearly identical interfaces,
-and then read the how-to guide on [running an evaluation asynchronously](../../how_to_guides/async).
+and then read the how-to guide on [running an evaluation asynchronously](./async).

 :::
@@ -92,7 +92,7 @@ To understand how to annotate your code for tracing, please refer to [this guide

 ## Create or select a dataset

-We need a [Dataset](../../concepts#datasets) to evaluate our application on. Our dataset will contain labeled [examples](../../concepts#examples) of toxic and non-toxic text.
+We need a [Dataset](../concepts#datasets) to evaluate our application on. Our dataset will contain labeled [examples](../concepts#examples) of toxic and non-toxic text.

-See [here](../../how_to_guides#dataset-management) for more on dataset management.
+See [here](.#dataset-management) for more on dataset management.

 ## Define an evaluator

-[Evaluators](../../concepts#evaluators) are functions for scoring your application's outputs. They take in the example inputs, actual outputs, and, when present, the reference outputs.
+[Evaluators](../concepts#evaluators) are functions for scoring your application's outputs. They take in the example inputs, actual outputs, and, when present, the reference outputs.
 Since we have labels for this task, our evaluator can directly check if the actual outputs match the reference outputs.

-See [here](../../how_to_guides#define-an-evaluator) for more on how to define evaluators.
+See [here](.#define-an-evaluator) for more on how to define evaluators.

 ## Run the evaluation

-We'll use the [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) / [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html) methods to run the evaluation.
+We'll use the [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) / [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html) methods to run the evaluation.

 The key arguments are:

@@ -214,11 +214,11 @@ The key arguments are:
   ]}
 />

-See [here](../../how_to_guides#run-an-evaluation) for other ways to kick off evaluations and [here](../../how_to_guides#configure-an-evaluation-job) for how to configure evaluation jobs.
+See [here](.#run-an-evaluation) for other ways to kick off evaluations and [here](.#configure-an-evaluation-job) for how to configure evaluation jobs.

 ## Explore the results

-Each invocation of `evaluate()` creates an [Experiment](../../concepts#experiments) which can be viewed in the LangSmith UI or queried via the SDK.
+Each invocation of `evaluate()` creates an [Experiment](../concepts#experiments) which can be viewed in the LangSmith UI or queried via the SDK.
 Evaluation scores are stored against each actual output as feedback.

 _If you've annotated your code for tracing, you can open the trace of each row in a side panel view._

 ## Related

-- [Run an evaluation asynchronously](../../how_to_guides/async)
-- [Run an evaluation via the REST API](../../how_to_guides/run_evals_api_only)
-- [Run an evaluation from the prompt playground](../../how_to_guides/run_evaluation_from_prompt_playground)
+- [Run an evaluation asynchronously](./async)
+- [Run an evaluation via the REST API](./run_evals_api_only)
+- [Run an evaluation from the prompt playground](./run_evaluation_from_prompt_playground)
diff --git a/docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx b/docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx
index 4685864e..a22f5df1 100644
--- a/docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx
+++ b/docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx
@@ -391,4 +391,4 @@ The experiment will contain the results of the evaluation, including the scores

 ## Related

-- [Evaluate a `langgraph` graph](../evaluation/langgraph)
+- [Evaluate a `langgraph` graph](./langgraph)
diff --git a/docs/evaluation/how_to_guides/evaluate_pairwise.mdx b/docs/evaluation/how_to_guides/evaluate_pairwise.mdx
index 55c2857f..2c4de88a 100644
--- a/docs/evaluation/how_to_guides/evaluate_pairwise.mdx
+++ b/docs/evaluation/how_to_guides/evaluate_pairwise.mdx
@@ -13,14 +13,14 @@ import {

 :::info Key concepts

-- [Pairwise evaluations](../../concepts#pairwise)
+- [Pairwise evaluations](../concepts#pairwise)

 :::

 LangSmith supports evaluating **existing** experiments in a comparative manner. This allows you to score the outputs from multiple experiments against each other, rather than being confined to evaluating outputs one at a time. Think [LMSYS Chatbot Arena](https://chat.lmsys.org/) - this is the same concept!
-To do this, use the [evaluate_comparative](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate_comparative.html) / `evaluateComparative` function with two existing experiments.
+To do this, use the [evaluate_comparative](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate_comparative.html) / `evaluateComparative` function with two existing experiments.
 If you haven't already created experiments to compare, check out our [quick start](https://docs.smith.langchain.com/#5-run-your-first-evaluation) or oue [how-to guide](https://docs.smith.langchain.com/how_to_guides/evaluate_llm_application) to get started with evaluations.
diff --git a/docs/evaluation/how_to_guides/langchain_runnable.mdx b/docs/evaluation/how_to_guides/langchain_runnable.mdx
index 806a3e9f..faf2b216 100644
--- a/docs/evaluation/how_to_guides/langchain_runnable.mdx
+++ b/docs/evaluation/how_to_guides/langchain_runnable.mdx
@@ -136,4 +136,4 @@ The runnable is traced appropriately for each output.

 ## Related

-- [How to evaluate a `langgraph` graph](../evaluation/langgraph)
+- [How to evaluate a `langgraph` graph](./langgraph)
diff --git a/docs/evaluation/how_to_guides/langgraph.mdx b/docs/evaluation/how_to_guides/langgraph.mdx
index ef3373cf..557c2a6c 100644
--- a/docs/evaluation/how_to_guides/langgraph.mdx
+++ b/docs/evaluation/how_to_guides/langgraph.mdx
@@ -239,7 +239,7 @@ If we need access to information about intermediate steps that isn't in state, w

 :::tip Custom evaluators

-See more about what arguments you can pass to custom evaluators in this [how-to guide](../evaluation/custom_evaluator).
+See more about what arguments you can pass to custom evaluators in this [how-to guide](./custom_evaluator).

 :::
diff --git a/docs/evaluation/how_to_guides/llm_as_judge.mdx b/docs/evaluation/how_to_guides/llm_as_judge.mdx
index b098bf18..b4d7ba8a 100644
--- a/docs/evaluation/how_to_guides/llm_as_judge.mdx
+++ b/docs/evaluation/how_to_guides/llm_as_judge.mdx
@@ -8,7 +8,7 @@ import {

 :::info Key concepts

-- [LLM-as-a-judge evaluator](../../concepts#llm-as-judge)
+- [LLM-as-a-judge evaluator](../concepts#llm-as-judge)

 :::

@@ -72,8 +72,8 @@ for the answer is logically valid and consistent with question and the answer.\\
   ]}
 />

-See [here](../../how_to_guides/custom_evaluator) for more on how to write a custom evaluator.
+See [here](./custom_evaluator) for more on how to write a custom evaluator.

 ## Prebuilt evaluator via `langchain`

-See [here](../../how_to_guides/use_langchain_off_the_shelf_evaluators) for how to use prebuilt evaluators from `langchain`.
+See [here](./use_langchain_off_the_shelf_evaluators) for how to use prebuilt evaluators from `langchain`.
diff --git a/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx b/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx
index 9beabd03..a71dd0f8 100644
--- a/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx
+++ b/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx
@@ -7,7 +7,7 @@ sidebar_position: 1

 :::tip Recommended Reading
 Before diving into this content, it might be helpful to read the following:

-- [Concepts guide on evaluation and datasets](../../concepts#datasets-and-examples)
+- [Concepts guide on evaluation and datasets](../concepts#datasets-and-examples)

 :::

@@ -36,14 +36,14 @@ Certain fields in your schema have a `+ Transformations` option.
 Transformations are preprocessing steps that, if enabled, update your examples when you add them to the dataset.
 For example the `convert to OpenAI messages` transformation will convert message-like objects, like LangChain messages, to OpenAI message format.

-For the full list of available transformations, see [our reference](/reference/evaluation/dataset_transformations).
+For the full list of available transformations, see [our reference](/reference/evaluation/dataset_transformations).

 :::note
 If you plan to collect production traces in your dataset from LangChain [ChatModels](https://python.langchain.com/docs/concepts/chat_models/) or from OpenAI calls using the [LangSmith OpenAI wrapper](/observability/how_to_guides/tracing/annotate_code#wrap-the-openai-client), we offer a prebuilt Chat Model schema that converts messages and tools into industry standard openai formats that can be used downstream with any model for testing.

 You can also customize the template settings to match your use case.
-Please see the [dataset transformations reference](/reference/evaluation/dataset_transformations) for more information.
+Please see the [dataset transformations reference](/reference/evaluation/dataset_transformations) for more information.
+Please see the [dataset transformations reference](/referen./dataset_transformations) for more information. ::: ## Add runs to your dataset in the UI diff --git a/docs/evaluation/how_to_guides/metric_type.mdx b/docs/evaluation/how_to_guides/metric_type.mdx index a59e4355..68610753 100644 --- a/docs/evaluation/how_to_guides/metric_type.mdx +++ b/docs/evaluation/how_to_guides/metric_type.mdx @@ -6,7 +6,7 @@ import { # How to return categorical vs numerical metrics -LangSmith supports both categorical and numerical metrics, and you can return either when writing a [custom evaluator](../../how_to_guides/custom_evaluator). +LangSmith supports both categorical and numerical metrics, and you can return either when writing a [custom evaluator](./custom_evaluator). For an evaluator result to be logged as a numerical metric, it must returned as: @@ -68,4 +68,4 @@ Here are some examples: ## Related -- [Return multiple metrics in one evaluator](../../how_to_guides/multiple_scores) +- [Return multiple metrics in one evaluator](./multiple_scores) diff --git a/docs/evaluation/how_to_guides/multiple_scores.mdx b/docs/evaluation/how_to_guides/multiple_scores.mdx index a69989a6..17f3fb9d 100644 --- a/docs/evaluation/how_to_guides/multiple_scores.mdx +++ b/docs/evaluation/how_to_guides/multiple_scores.mdx @@ -6,7 +6,7 @@ import { # How to return multiple scores in one evaluator -Sometimes it is useful for a [custom evaluator function](../../how_to_guides/custom_evaluator) or [summary evaluator function](../../how_to_guides/summary) to return multiple metrics. +Sometimes it is useful for a [custom evaluator function](./custom_evaluator) or [summary evaluator function](./summary) to return multiple metrics. For example, if you have multiple metrics being generated by an LLM judge, you can save time and money by making a single LLM call that generates multiple metrics instead of making multiple LLM calls. To return multiple scores using the Python SDK, simply return a list of dictionaries/objects of the following form: @@ -75,4 +75,4 @@ Rows from the resulting experiment will display each of the scores. ## Related -- [Return categorical vs numerical metrics](../../how_to_guides/metric_type) +- [Return categorical vs numerical metrics](./metric_type) diff --git a/docs/evaluation/how_to_guides/rate_limiting.mdx b/docs/evaluation/how_to_guides/rate_limiting.mdx index e5ea4318..82129264 100644 --- a/docs/evaluation/how_to_guides/rate_limiting.mdx +++ b/docs/evaluation/how_to_guides/rate_limiting.mdx @@ -76,7 +76,7 @@ See some examples of how to do this in the [OpenAI docs](https://platform.openai ## Limiting max_concurrency Limiting the number of concurrent calls you're making to your application and evaluators is another way to decrease the frequency of model calls you're making, and in that way avoid rate limit errors. -`max_concurrency` can be set directly on the [evaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) / [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html) functions. +`max_concurrency` can be set directly on the [evaluate()](https://langsmith-sdk.readthedocs.io/en/late./langsmith.evaluation._runner.evaluate.html) / [aevaluate()](https://langsmith-sdk.readthedocs.io/en/late./langsmith.evaluation._arunner.aevaluate.html) functions.