Commit

fix
baskaryan committed Nov 23, 2024
1 parent d8b3fc4 commit a6df8ea
Showing 7 changed files with 14 additions and 7 deletions.
6 changes: 3 additions & 3 deletions docs/evaluation/concepts/index.mdx
@@ -225,7 +225,7 @@ LangSmith evaluations are kicked off using a single function, `evaluate`, which

:::tip

See documentation on using `evaluate` [here](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application#step-4-run-the-evaluation-and-view-the-results).
See documentation on using `evaluate` [here](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application).

:::
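
For context, a minimal sketch of kicking off an evaluation with `evaluate` (the dataset name, target function, and evaluator below are illustrative assumptions, not code from this commit):

```python
from langsmith.evaluation import evaluate

# Hypothetical application under test.
def my_app(inputs: dict) -> dict:
    return {"answer": f"Echo: {inputs['question']}"}

# Hypothetical custom evaluator: exact match against the reference answer.
def exact_match(run, example) -> dict:
    return {
        "key": "exact_match",
        "score": int(run.outputs["answer"] == example.outputs["answer"]),
    }

# Assumes a dataset named "my-dataset" already exists in LangSmith.
results = evaluate(
    my_app,
    data="my-dataset",
    evaluators=[exact_match],
    experiment_prefix="baseline",
)
```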

@@ -236,7 +236,7 @@ One of the most common questions when evaluating AI applications is: how can I b
:::tip

- See the [video on `Repetitions` in our LangSmith Evaluation series](https://youtu.be/Pvz24JdzzF8)
- See our documentation on [`Repetitions`](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application#evaluate-on-a-dataset-with-repetitions)
- See our documentation on [`Repetitions`](https://docs.smith.langchain.com/how_to_guides/evaluation/repetition)

:::
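
As a rough sketch of the repetitions setup (parameter usage assumed from the linked guide; names are illustrative):

```python
from langsmith.evaluation import evaluate

# Hypothetical non-deterministic application under test.
def my_app(inputs: dict) -> dict:
    return {"answer": inputs["question"].upper()}

# Hypothetical evaluator.
def exact_match(run, example) -> dict:
    return {
        "key": "exact_match",
        "score": int(run.outputs["answer"] == example.outputs["answer"]),
    }

# Each example runs 3 times, so per-example score variance can be inspected.
results = evaluate(
    my_app,
    data="my-dataset",         # assumed dataset name
    evaluators=[exact_match],
    num_repetitions=3,         # assumed parameter, per the repetitions guide
)
```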

@@ -434,7 +434,7 @@ Classification / Tagging applies a label to a given input (e.g., for toxicity de

A central consideration for Classification / Tagging evaluation is whether you have a dataset with `reference` labels or not. If not, users frequently want to define an evaluator that uses criteria to apply a label (e.g., toxicity, etc.) to an input (e.g., text, user question, etc.). However, if ground truth class labels are provided, then the evaluation objective is to score a Classification / Tagging chain relative to the ground truth class label (e.g., using metrics such as precision, recall, etc.).

If ground truth reference labels are provided, then it's common to simply define a [custom heuristic evaluator](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application#use-custom-evaluators) to compare ground truth labels to the chain output. However, given the emergence of LLMs, it is increasingly common to simply use `LLM-as-judge` to perform the Classification / Tagging of an input based on specified criteria (without a ground truth reference).
If ground truth reference labels are provided, then it's common to simply define a [custom heuristic evaluator](https://docs.smith.langchain.com/how_to_guides/evaluation/custom_evaluator) to compare ground truth labels to the chain output. However, given the emergence of LLMs, it is increasingly common to simply use `LLM-as-judge` to perform the Classification / Tagging of an input based on specified criteria (without a ground truth reference).

`Online` or `Offline` evaluation is feasible when using `LLM-as-judge` with a `Reference-free` prompt. In particular, this is well suited to `Online` evaluation when a user wants to tag / classify application input (e.g., for toxicity, etc.).
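
To make the reference-labeled case concrete, here is a hedged sketch of a custom heuristic evaluator for Classification / Tagging (the `label` field names are assumptions about the dataset and chain output schema):

```python
def correct_label(run, example) -> dict:
    """Heuristic evaluator: exact match against the ground truth class label."""
    predicted = run.outputs.get("label")     # label produced by the chain (assumed field)
    expected = example.outputs.get("label")  # ground truth reference label (assumed field)
    return {"key": "correct_label", "score": int(predicted == expected)}

# Aggregate metrics such as precision and recall can then be computed across
# the experiment from these per-example scores.
```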

@@ -46,4 +46,4 @@ client.update_dataset_tag(
)
```

To run an evaluation on a particular tagged version of a dataset, you can follow [this guide](../evaluation/evaluate_llm_application#evaluate-on-a-particular-version-of-a-dataset).
To run an evaluation on a particular tagged version of a dataset, you can follow [this guide](../evaluation/dataset_version).
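
As an illustrative sketch (the tag name, dataset name, and target function are assumptions), this typically means listing examples as of the tag and passing them to `evaluate`:

```python
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()

# Placeholder application under test.
def my_app(inputs: dict) -> dict:
    return {"answer": "..."}

results = evaluate(
    my_app,
    # `as_of` pins the dataset to a tagged version, e.g. a "prod" tag
    # previously set with client.update_dataset_tag.
    data=client.list_examples(dataset_name="my-dataset", as_of="prod"),
    evaluators=[],
)
```
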
Empty file.
4 changes: 4 additions & 0 deletions docs/evaluation/how_to_guides/evaluation/metric_type.mdx
@@ -65,3 +65,7 @@ Here are some examples:

]}
/>

## Related

- [Return multiple metrics in one evaluator](./how_to_guides/evaluation/multiple_scores)
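
For orientation, a hedged sketch of the distinction, using the common LangSmith evaluator return shapes (assumed, not taken from this diff): a numerical metric returns a `score`, while a categorical metric returns a string `value`:

```python
def numeric_accuracy(run, example) -> dict:
    # Continuous / numerical metric: return a "score".
    return {"key": "accuracy", "score": 0.87}

def categorical_tone(run, example) -> dict:
    # Categorical metric: return a string "value".
    return {"key": "tone", "value": "friendly"}
```
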
4 changes: 4 additions & 0 deletions docs/evaluation/how_to_guides/evaluation/multiple_scores.mdx
@@ -72,3 +72,7 @@ Example:
Rows from the resulting experiment will display each of the scores.

![](../evaluation/static/multiple_scores.png)

## Related

- [Return categorical vs numerical metrics](./how_to_guides/evaluation/metric_type)
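
As a minimal sketch (the metric names and output fields are illustrative assumptions), one evaluator can return several metrics at once by nesting them under a `results` key:

```python
def precision_recall(run, example) -> dict:
    predicted = set(run.outputs.get("entities", []))      # assumed output field
    expected = set(example.outputs.get("entities", []))   # assumed reference field
    tp = len(predicted & expected)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(expected) if expected else 0.0
    # Both metrics are returned together under "results".
    return {
        "results": [
            {"key": "precision", "score": precision},
            {"key": "recall", "score": recall},
        ]
    }
```
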
1 change: 0 additions & 1 deletion docs/evaluation/how_to_guides/index.md
@@ -31,7 +31,6 @@ Evaluate and improve your application before deploying it.
- [Evaluate intermediate steps](./how_to_guides/evaluation/evaluate_on_intermediate_steps)
- [Return multiple metrics in one evaluator](./how_to_guides/evaluation/multiple_scores)
- [Return categorical vs numerical metrics](./how_to_guides/evaluation/metric_type)
- [Check your evaluator setup](./how_to_guides/evaluation/check_evaluator)

### Configure the evaluation data

4 changes: 2 additions & 2 deletions docs/evaluation/tutorials/agents.mdx
@@ -460,7 +460,7 @@ See the full overview of single step evaluation in our [conceptual guide](https:

:::

We can check a specific tool call using [a custom evaluator](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application#use-custom-evaluators):
We can check a specific tool call using [a custom evaluator](https://docs.smith.langchain.com/how_to_guides/evaluation/custom_evaluator):

- Here, we just invoke the assistant, `assistant_runnable`, with a prompt and check if the resulting tool call is as expected.
- Here, we are using a specialized agent where the tools are hard-coded (rather than passed with the dataset input).
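
A hedged sketch of such a single-step check (the message and tool-call field names follow common LangChain conventions and are assumptions here, not code from this tutorial):

```python
def right_tool(run, example) -> dict:
    """Check that the first tool call made by the assistant matches the expected tool."""
    messages = run.outputs.get("messages", [])                    # assumed output field
    tool_calls = getattr(messages[-1], "tool_calls", []) if messages else []
    called = tool_calls[0]["name"] if tool_calls else None
    expected = example.outputs.get("expected_tool")               # assumed reference field
    return {"key": "right_tool", "score": int(called == expected)}
```
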
@@ -507,7 +507,7 @@ experiment_results = evaluate(

### Trajectory

We can check a trajectory of tool calls using [custom evaluators](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application#use-custom-evaluators):
We can check a trajectory of tool calls using [custom evaluators](https://docs.smith.langchain.com/how_to_guides/evaluation/custom_evaluator):

- Here, we just invoke the agent, `graph.invoke`, with a prompt.
- Here, we are using a specialized agent where the tools are hard-coded (rather than passed with the dataset input).
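
And a hedged sketch of a trajectory check (field names again assumed): collect the names of all tools called during the run and compare them against an expected ordered list:

```python
def trajectory_matches(run, example) -> dict:
    """Compare the ordered list of tool calls against the expected trajectory."""
    messages = run.outputs.get("messages", [])                      # assumed output field
    trajectory = [
        call["name"]
        for message in messages
        for call in (getattr(message, "tool_calls", None) or [])
    ]
    expected = example.outputs.get("expected_trajectory", [])       # assumed reference field
    return {"key": "trajectory_match", "score": int(trajectory == expected)}
```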
