Commit ce87297: fix links

baskaryan committed Nov 25, 2024
1 parent 57baca8

Showing 4 changed files with 17 additions and 17 deletions.
20 changes: 10 additions & 10 deletions docs/evaluation/concepts/index.mdx
@@ -66,7 +66,7 @@ When setting up your evaluation, you may want to partition your dataset into dif
To learn more about creating dataset splits in LangSmith:

- See our video on [`dataset splits`](https://youtu.be/FQMn_FQV-fI?feature=shared) in the LangSmith Evaluation series.
-- See our documentation [here](../how_to_guides/manage_datasets_in_application#create-and-manage-dataset-splits).
+- See our documentation [here](./how_to_guides/manage_datasets_in_application#create-and-manage-dataset-splits).

:::

@@ -105,7 +105,7 @@ Heuristic evaluators are hard-coded functions that perform computations to deter
For some tasks, like code generation, custom heuristic evaluation (e.g., import and code-execution checks) is often extremely useful and superior to other evaluations (e.g., LLM-as-judge, discussed below).

- Watch the [`Custom evaluator` video in our LangSmith Evaluation series](https://www.youtube.com/watch?v=w31v_kFvcNw) for a comprehensive overview.
-- Read our [documentation](../how_to_guides/custom_evaluator) on custom evaluators.
+- Read our [documentation](./how_to_guides/custom_evaluator) on custom evaluators.
- See our [blog](https://blog.langchain.dev/code-execution-with-langgraph/) using custom evaluators for code generation.

:::
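
As a concrete illustration of the kind of custom heuristic evaluator described above, the sketch below checks only that generated code parses and executes; the `"code"` output key and the evaluator signature are assumptions for the example, not taken from these docs.

```python
# Illustrative custom heuristic evaluator for code generation.
# Assumes the application returns its generated code under outputs["code"].
import ast

def code_executes(outputs: dict) -> dict:
    """Return score 1 if the generated code parses and runs without raising, else 0."""
    code = outputs.get("code", "")
    try:
        ast.parse(code)  # syntax check
        exec(compile(code, "<generated>", "exec"), {})  # execution check; only run trusted code
        return {"key": "code_executes", "score": 1}
    except Exception:
        return {"key": "code_executes", "score": 0}
```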
@@ -124,7 +124,7 @@ With LLM-as-judge evaluators, it is important to carefully review the resulting

:::tip

-See documentation on our workflow to audit and manually correct evaluator scores [here](../how_to_guides/audit_evaluator_scores).
+See documentation on our workflow to audit and manually correct evaluator scores [here](./how_to_guides/audit_evaluator_scores).

:::
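
For readers unfamiliar with the pattern, an LLM-as-judge evaluator is just a function that prompts a model to grade an output against some criteria. The sketch below is illustrative only; the model name, prompt, and dictionary keys are assumptions rather than anything prescribed by these docs.

```python
# Illustrative LLM-as-judge evaluator that grades an answer for conciseness.
# Model, prompt, and dict keys are placeholder assumptions.
from openai import OpenAI

oai_client = OpenAI()

def conciseness_judge(inputs: dict, outputs: dict) -> dict:
    prompt = (
        "Grade the following answer for conciseness on a scale from 0 to 1. "
        "Reply with only the number.\n\n"
        f"Question: {inputs.get('question', '')}\n"
        f"Answer: {outputs.get('answer', '')}"
    )
    response = oai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        score = float(response.choices[0].message.content.strip())
    except (TypeError, ValueError):
        score = 0.0
    return {"key": "conciseness", "score": score}
```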

@@ -225,7 +225,7 @@ LangSmith evaluations are kicked off using a single function, `evaluate`, which

:::tip

-See documentation on using `evaluate` [here](../how_to_guides/evaluate_llm_application).
+See documentation on using `evaluate` [here](./how_to_guides/evaluate_llm_application).

:::
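
As a rough sketch of that single entry point (the dataset name, target function, and evaluator below are placeholders, and the exact import path and evaluator signature can vary with the SDK version):

```python
# Minimal sketch of kicking off an evaluation with the LangSmith SDK.
# "my-dataset", the target, and the evaluator are placeholders.
from langsmith import evaluate  # older SDKs: from langsmith.evaluation import evaluate

def target(inputs: dict) -> dict:
    # Call your application here; echoing the question keeps the sketch self-contained.
    return {"answer": f"You asked: {inputs['question']}"}

def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["answer"] == reference_outputs["answer"]

results = evaluate(
    target,
    data="my-dataset",               # name of an existing dataset with question/answer examples
    evaluators=[exact_match],
    experiment_prefix="docs-sketch",
)
```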

@@ -236,7 +236,7 @@ One of the most common questions when evaluating AI applications is: how can I b
:::tip

- See the [video on `Repetitions` in our LangSmith Evaluation series](https://youtu.be/Pvz24JdzzF8)
-- See our documentation on [`Repetitions`](../how_to_guides/repetition)
+- See our documentation on [`Repetitions`](./how_to_guides/repetition)

:::
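
Assuming a langsmith version that exposes `num_repetitions`, repetitions are a single extra argument to `evaluate`; a hedged sketch reusing the placeholder target and evaluator from the earlier snippet:

```python
# Sketch: run every example several times to average over non-determinism.
# Assumes `num_repetitions` is available in your langsmith version.
from langsmith import evaluate

results = evaluate(
    target,                    # placeholder application function from the sketch above
    data="my-dataset",
    evaluators=[exact_match],  # placeholder evaluator from the sketch above
    num_repetitions=3,         # each example is run 3 times
)
```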

@@ -252,7 +252,7 @@ Below, we will discuss evaluation of a few specific, popular LLM applications.

![Tool use](../concepts/static/tool_use.png)

-Below is a tool-calling agent in [LangGraph](https://langchain-ai.github.io/langgraph/tutorials/introduction/). The `assistant node` is an LLM that determines whether to invoke a tool based upon the input. The `tool condition` sees if a tool was selected by the `assistant node` and, if so, routes to the `tool node`. The `tool node` executes the tool and returns the output as a tool message to the `assistant node`. This loop continues as long as the `assistant node` selects a tool. If no tool is selected, then the agent directly returns the LLM response.
+Below is a tool-calling agent in [LangGraph](https://langchain-ai.github.io/langgra./tutorials/introduction/). The `assistant node` is an LLM that determines whether to invoke a tool based upon the input. The `tool condition` sees if a tool was selected by the `assistant node` and, if so, routes to the `tool node`. The `tool node` executes the tool and returns the output as a tool message to the `assistant node`. This loop continues as long as the `assistant node` selects a tool. If no tool is selected, then the agent directly returns the LLM response.

![Agent](../concepts/static/langgraph_agent.png)
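
For reference, a compact LangGraph sketch of the loop described above; the model and the `multiply` tool are placeholders, and the prebuilt `ToolNode` / `tools_condition` helpers are assumed to be available in your langgraph version.

```python
# Sketch of the assistant -> tools -> assistant loop described above.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([multiply])

def assistant(state: MessagesState):
    # The assistant node decides whether to call a tool.
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("assistant", assistant)
builder.add_node("tools", ToolNode([multiply]))
builder.add_edge(START, "assistant")
builder.add_conditional_edges("assistant", tools_condition)  # route to "tools" or end
builder.add_edge("tools", "assistant")                       # loop back after tool execution
graph = builder.compile()
```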

@@ -281,7 +281,7 @@ However, there are several downsides to this type of evaluation. First, it usual

:::tip

-See our tutorial on [evaluating agent response](../tutorials/agents).
+See our tutorial on [evaluating agent response](./tutorials/agents).

:::

@@ -299,7 +299,7 @@ There are several benefits to this type of evaluation. It allows you to evaluate

:::tip

-See our tutorial on [evaluating a single step of an agent](../tutorials/agents#single-step-evaluation).
+See our tutorial on [evaluating a single step of an agent](./tutorials/agents#single-step-evaluation).

:::

@@ -319,7 +319,7 @@ However, none of these approaches evaluate the input to the tools; they only foc

:::tip

-See our tutorial on [evaluating agent trajectory](../tutorials/agents#trajectory).
+See our tutorial on [evaluating agent trajectory](./tutorials/agents#trajectory).

:::
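
One simple way to score a trajectory is to compare the ordered list of tool calls the agent made against an expected list; in the sketch below, the `"tool_calls"` keys are assumptions about how outputs and reference outputs are structured.

```python
# Illustrative trajectory evaluator: exact match on the ordered list of tool names.
# The "tool_calls" keys are placeholder assumptions about the output format.
def trajectory_match(outputs: dict, reference_outputs: dict) -> dict:
    actual = outputs.get("tool_calls", [])        # e.g. ["search", "calculator"]
    expected = reference_outputs.get("tool_calls", [])
    return {"key": "trajectory_match", "score": int(actual == expected)}
```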

@@ -434,7 +434,7 @@ Classification / Tagging applies a label to a given input (e.g., for toxicity de

A central consideration for Classification / Tagging evaluation is whether you have a dataset with `reference` labels or not. If not, users frequently want to define an evaluator that uses criteria to apply a label (e.g., toxicity) to an input (e.g., text, a user question). However, if ground truth class labels are provided, then the evaluation objective is focused on scoring a Classification / Tagging chain relative to the ground truth class label (e.g., using metrics such as precision, recall, etc.).

-If ground truth reference labels are provided, then it's common to simply define a [custom heuristic evaluator](../how_to_guides/custom_evaluator) to compare ground truth labels to the chain output. However, given the emergence of LLMs, it is increasingly common to simply use `LLM-as-judge` to perform the Classification / Tagging of an input based upon specified criteria (without a ground truth reference).
+If ground truth reference labels are provided, then it's common to simply define a [custom heuristic evaluator](./how_to_guides/custom_evaluator) to compare ground truth labels to the chain output. However, given the emergence of LLMs, it is increasingly common to simply use `LLM-as-judge` to perform the Classification / Tagging of an input based upon specified criteria (without a ground truth reference).

`Online` or `Offline` evaluation is feasible when using `LLM-as-judge` with a `Reference-free` prompt. In particular, this is well suited to `Online` evaluation when a user wants to tag / classify application input (e.g., for toxicity).
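
When ground truth labels are available, the custom heuristic evaluator mentioned above can be as small as an exact-match check; the `"label"` keys below are assumptions for the sketch, and aggregate metrics such as precision or recall would be computed over the whole experiment.

```python
# Sketch: heuristic evaluator comparing a predicted class label to the reference label.
# The "label" keys are placeholder assumptions.
def label_match(outputs: dict, reference_outputs: dict) -> dict:
    return {"key": "label_match", "score": int(outputs.get("label") == reference_outputs.get("label"))}
```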

8 changes: 4 additions & 4 deletions docs/evaluation/how_to_guides/dataset_subset.mdx
@@ -10,8 +10,8 @@ import {

Before diving into this content, it might be helpful to read:

-- [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).
-- [guide on creating/managing dataset splits](../datasets/manage_datasets_in_application#create-and-manage-dataset-splits)
+- [guide on fetching examples](./manage_datasets_programmatically#fetch-examples).
+- [guide on creating/managing dataset splits](./manage_datasets_in_application#create-and-manage-dataset-splits)

:::

@@ -49,7 +49,7 @@ One common workflow is to fetch examples that have a certain metadata key-value
]}
/>

-For more advanced filtering capabilities, see this [how-to guide](../datasets/manage_datasets_programmatically#list-examples-by-structured-filter).
+For more advanced filtering capabilities, see this [how-to guide](./manage_datasets_programmatically#list-examples-by-structured-filter).

## Evaluate on a dataset split

@@ -85,4 +85,4 @@ You can use the `list_examples` / `listExamples` method to evaluate on one or mu

## Related

-- More on [how to filter datasets](../datasets/manage_datasets_programmatically#list-examples-by-structured-filter)
+- More on [how to filter datasets](./manage_datasets_programmatically#list-examples-by-structured-filter)
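
Putting the pieces of this guide together, a hedged sketch of filtering by metadata or split and then evaluating on the result (the dataset name, metadata key, split name, target, and evaluator are all placeholders):

```python
# Sketch: evaluate on a metadata-filtered subset or a named split of a dataset.
from langsmith import Client, evaluate

client = Client()

# Examples with a particular metadata value
filtered_examples = client.list_examples(
    dataset_name="my-dataset", metadata={"source": "synthetic"}
)

# Or one or more named splits
test_examples = client.list_examples(dataset_name="my-dataset", splits=["test"])

evaluate(
    target,                    # placeholder application function
    data=test_examples,        # any iterable of examples can be passed as `data`
    evaluators=[exact_match],  # placeholder evaluator
)
```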
4 changes: 2 additions & 2 deletions docs/evaluation/how_to_guides/dataset_version.mdx
@@ -8,8 +8,8 @@ import {

:::tip Recommended reading

-Before diving into this content, it might be helpful to read the [guide on versioning datasets](../datasets/version_datasets).
-Additionally, it might be helpful to read the [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).
+Before diving into this content, it might be helpful to read the [guide on versioning datasets](./version_datasets).
+Additionally, it might be helpful to read the [guide on fetching examples](./manage_datasets_programmatically#fetch-examples).

:::
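
Assuming your langsmith version supports the `as_of` argument on `list_examples`, evaluating against a particular dataset version looks roughly like this (the `"prod"` tag, target, and evaluator are placeholders):

```python
# Sketch: evaluate against a specific dataset version via a version tag or timestamp.
from langsmith import Client, evaluate

client = Client()
versioned_examples = client.list_examples(dataset_name="my-dataset", as_of="prod")

evaluate(
    target,                    # placeholder application function
    data=versioned_examples,
    evaluators=[exact_match],  # placeholder evaluator
)
```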

2 changes: 1 addition & 1 deletion docs/evaluation/how_to_guides/run_evals_api_only.mdx
@@ -26,7 +26,7 @@ This guide will show you how to run evals using the REST API, using the `request

## Create a dataset

-Here, we are using the Python SDK for convenience. You can also use the API directly or use the UI; see [this guide](../datasets/manage_datasets_in_application) for more information.
+Here, we are using the Python SDK for convenience. You can also use the API directly or use the UI; see [this guide](./manage_datasets_in_application) for more information.

```python
import openai
# … (remainder of this hunk not shown)
```
