diff --git a/docs/evaluation/concepts/index.mdx b/docs/evaluation/concepts/index.mdx
index fb4056ea..bb9b8704 100644
--- a/docs/evaluation/concepts/index.mdx
+++ b/docs/evaluation/concepts/index.mdx
@@ -66,7 +66,7 @@ When setting up your evaluation, you may want to partition your dataset into dif
 To learn more about creating dataset splits in LangSmith:
 
 - See our video on [`dataset splits`](https://youtu.be/FQMn_FQV-fI?feature=shared) in the LangSmith Evaluation series.
-- See our documentation [here](../how_to_guides/manage_datasets_in_application#create-and-manage-dataset-splits).
+- See our documentation [here](./how_to_guides/manage_datasets_in_application#create-and-manage-dataset-splits).
 
 :::
 
@@ -105,7 +105,7 @@ Heuristic evaluators are hard-coded functions that perform computations to deter
 For some tasks, like code generation, custom heuristic evaluation (e.g., import and code execution-evaluation) are often extremely useful and superior to other evaluations (e.g., LLM-as-judge, discussed below).
 
 - Watch the [`Custom evaluator` video in our LangSmith Evaluation series](https://www.youtube.com/watch?v=w31v_kFvcNw) for a comprehensive overview.
-- Read our [documentation](../how_to_guides/custom_evaluator) on custom evaluators.
+- Read our [documentation](./how_to_guides/custom_evaluator) on custom evaluators.
 - See our [blog](https://blog.langchain.dev/code-execution-with-langgraph/) using custom evaluators for code generation.
 
 :::
@@ -124,7 +124,7 @@ With LLM-as-judge evaluators, it is important to carefully review the resulting
 
 :::tip
 
-See documentation on our workflow to audit and manually correct evaluator scores [here](../how_to_guides/audit_evaluator_scores).
+See documentation on our workflow to audit and manually correct evaluator scores [here](./how_to_guides/audit_evaluator_scores).
 
 :::
 
@@ -225,7 +225,7 @@ LangSmith evaluations are kicked off using a single function, `evaluate`, which
 
 :::tip
 
-See documentation on using `evaluate` [here](../how_to_guides/evaluate_llm_application).
+See documentation on using `evaluate` [here](./how_to_guides/evaluate_llm_application).
 
 :::
 
@@ -236,7 +236,7 @@ One of the most common questions when evaluating AI applications is: how can I b
 :::tip
 
 - See the [video on `Repetitions` in our LangSmith Evaluation series](https://youtu.be/Pvz24JdzzF8)
-- See our documentation on [`Repetitions`](../how_to_guides/repetition)
+- See our documentation on [`Repetitions`](./how_to_guides/repetition)
 
 :::
 
@@ -252,7 +252,7 @@ Below, we will discuss evaluation of a few specific, popular LLM applications.
 
 ![Tool use](../concepts/static/tool_use.png)
 
-Below is a tool-calling agent in [LangGraph](https://langchain-ai.github.io/langgraph/tutorials/introduction/). The `assistant node` is an LLM that determines whether to invoke a tool based upon the input. The `tool condition` sees if a tool was selected by the `assistant node` and, if so, routes to the `tool node`. The `tool node` executes the tool and returns the output as a tool message to the `assistant node`. This loop continues until as long as the `assistant node` selects a tool. If no tool is selected, then the agent directly returns the LLM response.
+Below is a tool-calling agent in [LangGraph](https://langchain-ai.github.io/langgraph/tutorials/introduction/). The `assistant node` is an LLM that determines whether to invoke a tool based upon the input. The `tool condition` sees if a tool was selected by the `assistant node` and, if so, routes to the `tool node`. The `tool node` executes the tool and returns the output as a tool message to the `assistant node`. This loop continues as long as the `assistant node` selects a tool. If no tool is selected, then the agent directly returns the LLM response.
 
 ![Agent](../concepts/static/langgraph_agent.png)
 
@@ -281,7 +281,7 @@ However, there are several downsides to this type of evaluation. First, it usual
 
 :::tip
 
-See our tutorial on [evaluating agent response](../tutorials/agents).
+See our tutorial on [evaluating agent response](./tutorials/agents).
 
 :::
 
@@ -299,7 +299,7 @@ There are several benefits to this type of evaluation. It allows you to evaluate
 
 :::tip
 
-See our tutorial on [evaluating a single step of an agent](../tutorials/agents#single-step-evaluation).
+See our tutorial on [evaluating a single step of an agent](./tutorials/agents#single-step-evaluation).
 
 :::
 
@@ -319,7 +319,7 @@ However, none of these approaches evaluate the input to the tools; they only foc
 
 :::tip
 
-See our tutorial on [evaluating agent trajectory](../tutorials/agents#trajectory).
+See our tutorial on [evaluating agent trajectory](./tutorials/agents#trajectory).
 
 :::
 
@@ -434,7 +434,7 @@ Classification / Tagging applies a label to a given input (e.g., for toxicity de
 
 A central consideration for Classification / Tagging evaluation is whether you have a dataset with `reference` labels or not. If not, users frequently want to define an evaluator that uses criteria to apply label (e.g., toxicity, etc) to an input (e.g., text, user-question, etc). However, if ground truth class labels are provided, then the evaluation objective is focused on scoring a Classification / Tagging chain relative to the ground truth class label (e.g., using metrics such as precision, recall, etc).
 
-If ground truth reference labels are provided, then it's common to simply define a [custom heuristic evaluator](../how_to_guides/custom_evaluator) to compare ground truth labels to the chain output. However, it is increasingly common given the emergence of LLMs simply use `LLM-as-judge` to perform the Classification / Tagging of an input based upon specified criteria (without a ground truth reference).
+If ground truth reference labels are provided, then it's common to simply define a [custom heuristic evaluator](./how_to_guides/custom_evaluator) to compare ground truth labels to the chain output. However, given the emergence of LLMs, it is increasingly common to simply use `LLM-as-judge` to perform the Classification / Tagging of an input based upon specified criteria (without a ground truth reference).
 
 `Online` or `Offline` evaluation is feasible when using `LLM-as-judge` with the `Reference-free` prompt used. In particular, this is well suited to `Online` evaluation when a user wants to tag / classify application input (e.g., for toxicity, etc).
 
diff --git a/docs/evaluation/how_to_guides/dataset_subset.mdx b/docs/evaluation/how_to_guides/dataset_subset.mdx
index ca51c10e..efc914c9 100644
--- a/docs/evaluation/how_to_guides/dataset_subset.mdx
+++ b/docs/evaluation/how_to_guides/dataset_subset.mdx
@@ -10,8 +10,8 @@ import {
 
 Before diving into this content, it might be helpful to read:
 
-- [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).
-- [guide on creating/managing dataset splits](../datasets/manage_datasets_in_application#create-and-manage-dataset-splits)
+- [guide on fetching examples](./manage_datasets_programmatically#fetch-examples).
+- [guide on creating/managing dataset splits](./manage_datasets_in_application#create-and-manage-dataset-splits)
 
 :::
 
@@ -49,7 +49,7 @@ One common workflow is to fetch examples that have a certain metadata key-value
   ]}
 />
 
-For more advanced filtering capabilities see this [how-to guide](../datasets/manage_datasets_programmatically#list-examples-by-structured-filter).
+For more advanced filtering capabilities, see this [how-to guide](./manage_datasets_programmatically#list-examples-by-structured-filter).
 
 ## Evaluate on a dataset split
 
@@ -85,4 +85,4 @@ You can use the `list_examples` / `listExamples` method to evaluate on one or mu
 
 ## Related
 
-- More on [how to filter datasets](../datasets/manage_datasets_programmatically#list-examples-by-structured-filter)
+- More on [how to filter datasets](./manage_datasets_programmatically#list-examples-by-structured-filter)
diff --git a/docs/evaluation/how_to_guides/dataset_version.mdx b/docs/evaluation/how_to_guides/dataset_version.mdx
index e592bcad..564c1295 100644
--- a/docs/evaluation/how_to_guides/dataset_version.mdx
+++ b/docs/evaluation/how_to_guides/dataset_version.mdx
@@ -8,8 +8,8 @@ import {
 
 :::tip Recommended reading
 
-Before diving into this content, it might be helpful to read the [guide on versioning datasets](../datasets/version_datasets).
-Additionally, it might be helpful to read the [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).
+Before diving into this content, it might be helpful to read the [guide on versioning datasets](./version_datasets).
+Additionally, it might be helpful to read the [guide on fetching examples](./manage_datasets_programmatically#fetch-examples).
 
 :::
 
diff --git a/docs/evaluation/how_to_guides/run_evals_api_only.mdx b/docs/evaluation/how_to_guides/run_evals_api_only.mdx
index 77c125cc..40fc5fbd 100644
--- a/docs/evaluation/how_to_guides/run_evals_api_only.mdx
+++ b/docs/evaluation/how_to_guides/run_evals_api_only.mdx
@@ -26,7 +26,7 @@ This guide will show you how to run evals using the REST API, using the `request
 
 ## Create a dataset
 
-Here, we are using the python SDK for convenience. You can also use the API directly use the UI, see [this guide](../datasets/manage_datasets_in_application) for more information.
+Here, we are using the Python SDK for convenience. You can also use the API directly or use the UI; see [this guide](./manage_datasets_in_application) for more information.
 
 ```python
 import openai