diff --git a/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx b/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx index 385408f8..558f52f2 100644 --- a/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx +++ b/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx @@ -144,7 +144,7 @@ If the header is not present, operations will default to the workspace the API k ## Security Settings :::note -"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/prompts/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/tracing/share_trace), and [shared datasets](../../../evaluation/how_to_guides/datasets/share_dataset.mdx). +"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/prompts/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/tracing/share_trace), and [shared datasets](../../../evaluation/how_to_guides/share_dataset.mdx). ::: - `+ Experiment` -> `Run in Playground`, you can see the results in action. Your runs in your experiments will be automatically marked with the key specified in your code sample above (here, `formatted`): -![](../evaluation/static/show-feedback-from-autoeval-code.png) +![](./static/show-feedback-from-autoeval-code.png) And if you navigate back to your dataset, you'll see summary stats for said experiment in the `experiments` tab: -![](../evaluation/static/experiments-tab-code-results.png) +![](./static/experiments-tab-code-results.png) diff --git a/docs/evaluation/how_to_guides/evaluation/compare_experiment_results.mdx b/docs/evaluation/how_to_guides/compare_experiment_results.mdx similarity index 84% rename from docs/evaluation/how_to_guides/evaluation/compare_experiment_results.mdx rename to docs/evaluation/how_to_guides/compare_experiment_results.mdx index 174b3f3d..cdec2e65 100644 --- a/docs/evaluation/how_to_guides/evaluation/compare_experiment_results.mdx +++ b/docs/evaluation/how_to_guides/compare_experiment_results.mdx @@ -8,13 +8,13 @@ Oftentimes, when you are iterating on your LLM application (such as changing the LangSmith supports a powerful comparison view that lets you hone in on key differences, regressions, and improvements between different experiments. -![](../evaluation/static/regression_test.gif) +![](./static/regression_test.gif) ## Open the comparison view To open the comparison view, select two or more experiments from the "Experiments" tab from a given dataset page. Then, click on the "Compare" button at the bottom of the page. -![](../evaluation/static/open_comparison_view.png) +![](./static/open_comparison_view.png) ## Toggle different views @@ -22,46 +22,46 @@ You can toggle between different views by clicking on the "Display" dropdown at Toggling Full Text will show the full text of the input, output and reference output for each run. If the reference output is too long to display in the table, you can click on expand to view the full content. -![](../evaluation/static/toggle_views.png) +![](./static/toggle_views.png) ## View regressions and improvements In the LangSmith comparison view, runs that _regressed_ on your specified feedback key against your baseline experiment will be highlighted in red, while runs that _improved_ will be highlighted in green. 
At the top of each column, you can see how many runs in that experiment did better and how many did worse than your baseline experiment. -![Regressions](../evaluation/static/regression_view.png) +![Regressions](./static/regression_view.png) ## Filter on regressions or improvements Click on the regressions or improvements buttons on the top of each column to filter to the runs that regressed or improved in that specific experiment. -![Regressions Filter](../evaluation/static/filter_to_regressions.png) +![Regressions Filter](./static/filter_to_regressions.png) ## Update baseline experiment In order to track regressions, you need a baseline experiment against which to compare. This will be automatically assigned as the first experiment in your comparison, but you can change it from the dropdown at the top of the page. -![Baseline](../evaluation/static/select_baseline.png) +![Baseline](./static/select_baseline.png) ## Select feedback key You will also want to select the feedback key (evaluation metric) that you would like to focus on. This can be selected via another dropdown at the top. Again, one will be assigned by default, but you can adjust as needed. -![Feedback](../evaluation/static/select_feedback.png) +![Feedback](./static/select_feedback.png) ## Open a trace If tracing is enabled for the evaluation run, you can click on the trace icon in the hover state of any experiment cell to open the trace view for that run. This will open up a trace in the side panel. -![](../evaluation/static/open_trace_comparison.png) +![](./static/open_trace_comparison.png) ## Expand detailed view From any cell, you can click on the expand icon in the hover state to open up a detailed view of all experiment results on that particular example input, along with feedback keys and scores. -![](../evaluation/static/expanded_view.png) +![](./static/expanded_view.png) ## Update display settings @@ -69,4 +69,4 @@ You can adjust the display settings for comparison view by clicking on "Display" Here, you'll be able to toggle feedback, metrics, summary charts, and expand full text. -![](../evaluation/static/update_display.png) +![](./static/update_display.png) diff --git a/docs/evaluation/how_to_guides/evaluation/create_few_shot_evaluators.mdx b/docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx similarity index 92% rename from docs/evaluation/how_to_guides/evaluation/create_few_shot_evaluators.mdx rename to docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx index e50b3965..4bf8f696 100644 --- a/docs/evaluation/how_to_guides/evaluation/create_few_shot_evaluators.mdx +++ b/docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx @@ -34,7 +34,7 @@ as your output key. For example, if your main prompt has variables `question` an You may also specify the number of few-shot examples to use. The default is 5. If your examples will tend to be very long, you may want to set this number lower to save tokens - whereas if your examples tend to be short, you can set a higher number in order to give your evaluator more examples to learn from. If you have more examples in your dataset than this number, we will randomly choose them for you. -![Use corrections as few-shot examples](../evaluation/static/use_corrections_as_few_shot.png) +![Use corrections as few-shot examples](./static/use_corrections_as_few_shot.png) Note that few-shot examples are not currently supported in evaluators that use Hub prompts. @@ -51,7 +51,7 @@ begin seeing examples populated inside your corrections dataset.
As you make cor The inputs to the few-shot examples will be the relevant fields from the inputs, outputs, and reference (if this is an offline evaluator) of your chain/dataset. The outputs will be the corrected evaluator score and the explanations that you created when you left the corrections. Feel free to edit these to your liking. Here is an example of a few-shot example in a corrections dataset: -![Few-shot example](../evaluation/static/few_shot_example.png) +![Few-shot example](./static/few_shot_example.png) Note that the corrections may take a minute or two to be populated into your few-shot dataset. Once they are there, future runs of your evaluator will include them in the prompt! @@ -59,12 +59,12 @@ In order to view your corrections dataset, go to your rule and click "Edit Rule" (or "Edit Evaluator" from a dataset): -![Edit Evaluator](../evaluation/static/edit_evaluator.png) +![Edit Evaluator](./static/edit_evaluator.png) If this is an online evaluator (in a tracing project), you will need to click to edit your prompt: -![Edit Prompt](../evaluation/static/click_to_edit_prompt.png) +![Edit Prompt](./static/click_to_edit_prompt.png) From this screen, you will see a button that says "View few-shot dataset". Clicking this will bring you to your dataset of corrections, where you can view and update your few-shot examples: -![View few-shot dataset](../evaluation/static/view_few_shot_ds.png) +![View few-shot dataset](./static/view_few_shot_ds.png) diff --git a/docs/evaluation/how_to_guides/evaluation/custom_evaluator.mdx b/docs/evaluation/how_to_guides/custom_evaluator.mdx similarity index 92% rename from docs/evaluation/how_to_guides/evaluation/custom_evaluator.mdx rename to docs/evaluation/how_to_guides/custom_evaluator.mdx index bce7b66d..93ac7ef0 100644 --- a/docs/evaluation/how_to_guides/evaluation/custom_evaluator.mdx +++ b/docs/evaluation/how_to_guides/custom_evaluator.mdx @@ -8,7 +8,7 @@ import { :::info Key concepts -- [Evaluators](../../concepts#evaluators) +- [Evaluators](../concepts#evaluators) ::: @@ -138,5 +138,5 @@ answer is logically valid and consistent with question and the answer.""" ## Related -- [Evaluate aggregate experiment results](../../how_to_guides/evaluation/summary): Define summary evaluators, which compute metrics for an entire experiment. -- [Run an evaluation comparing two experiments](../../how_to_guides/evaluation/evaluate_pairwise): Define pairwise evaluators, which compute metrics by comparing two (or more) experiments against each other. +- [Evaluate aggregate experiment results](./summary): Define summary evaluators, which compute metrics for an entire experiment. +- [Run an evaluation comparing two experiments](./evaluate_pairwise): Define pairwise evaluators, which compute metrics by comparing two (or more) experiments against each other. diff --git a/docs/evaluation/how_to_guides/evaluation/dataset_subset.mdx b/docs/evaluation/how_to_guides/dataset_subset.mdx similarity index 85% rename from docs/evaluation/how_to_guides/evaluation/dataset_subset.mdx rename to docs/evaluation/how_to_guides/dataset_subset.mdx index ca51c10e..efc914c9 100644 --- a/docs/evaluation/how_to_guides/evaluation/dataset_subset.mdx +++ b/docs/evaluation/how_to_guides/dataset_subset.mdx @@ -10,8 +10,8 @@ import { Before diving into this content, it might be helpful to read: -- [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).
-- [guide on creating/managing dataset splits](../datasets/manage_datasets_in_application#create-and-manage-dataset-splits) +- [guide on fetching examples](./manage_datasets_programmatically#fetch-examples). +- [guide on creating/managing dataset splits](./manage_datasets_in_application#create-and-manage-dataset-splits) ::: @@ -49,7 +49,7 @@ One common workflow is to fetch examples that have a certain metadata key-value ]} /> -For more advanced filtering capabilities see this [how-to guide](../datasets/manage_datasets_programmatically#list-examples-by-structured-filter). +For more advanced filtering capabilities see this [how-to guide](./manage_datasets_programmatically#list-examples-by-structured-filter). ## Evaluate on a dataset split @@ -85,4 +85,4 @@ You can use the `list_examples` / `listExamples` method to evaluate on one or mu ## Related -- More on [how to filter datasets](../datasets/manage_datasets_programmatically#list-examples-by-structured-filter) +- More on [how to filter datasets](./manage_datasets_programmatically#list-examples-by-structured-filter) diff --git a/docs/evaluation/how_to_guides/evaluation/dataset_version.mdx b/docs/evaluation/how_to_guides/dataset_version.mdx similarity index 90% rename from docs/evaluation/how_to_guides/evaluation/dataset_version.mdx rename to docs/evaluation/how_to_guides/dataset_version.mdx index e592bcad..564c1295 100644 --- a/docs/evaluation/how_to_guides/evaluation/dataset_version.mdx +++ b/docs/evaluation/how_to_guides/dataset_version.mdx @@ -8,8 +8,8 @@ import { :::tip Recommended reading -Before diving into this content, it might be helpful to read the [guide on versioning datasets](../datasets/version_datasets). -Additionally, it might be helpful to read the [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples). +Before diving into this content, it might be helpful to read the [guide on versioning datasets](./version_datasets). +Additionally, it might be helpful to read the [guide on fetching examples](./manage_datasets_programmatically#fetch-examples). 
::: diff --git a/docs/evaluation/how_to_guides/datasets/_category_.json b/docs/evaluation/how_to_guides/datasets/_category_.json deleted file mode 100644 index 43379dda..00000000 --- a/docs/evaluation/how_to_guides/datasets/_category_.json +++ /dev/null @@ -1,5 +0,0 @@ -{ - "label": "Datasets", - "collapsed": true, - "collapsible": true -} diff --git a/docs/evaluation/how_to_guides/evaluation/evaluate_existing_experiment.mdx b/docs/evaluation/how_to_guides/evaluate_existing_experiment.mdx similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/evaluate_existing_experiment.mdx rename to docs/evaluation/how_to_guides/evaluate_existing_experiment.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/evaluate_llm_application.mdx b/docs/evaluation/how_to_guides/evaluate_llm_application.mdx similarity index 90% rename from docs/evaluation/how_to_guides/evaluation/evaluate_llm_application.mdx rename to docs/evaluation/how_to_guides/evaluate_llm_application.mdx index fdefed61..929e04bc 100644 --- a/docs/evaluation/how_to_guides/evaluation/evaluate_llm_application.mdx +++ b/docs/evaluation/how_to_guides/evaluate_llm_application.mdx @@ -12,7 +12,7 @@ import { :::info Key concepts -[Evaluations](../../concepts#applying-evaluations) | [Evaluators](../../concepts#evaluators) | [Datasets](../../concepts#datasets) +[Evaluations](../concepts#applying-evaluations) | [Evaluators](../concepts#evaluators) | [Datasets](../concepts#datasets) ::: @@ -22,7 +22,7 @@ In this guide we'll go over how to evaluate an application using the [evaluate() For larger evaluation jobs in Python we recommend using [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html), the asynchronous version of `evaluate()`. It is still worthwhile to read this guide first, as the two have nearly identical interfaces, -and then read the how-to guide on [running an evaluation asynchronously](../../how_to_guides/evaluation/async). +and then read the how-to guide on [running an evaluation asynchronously](./async). ::: @@ -92,7 +92,7 @@ To understand how to annotate your code for tracing, please refer to [this guide ## Create or select a dataset -We need a [Dataset](../../concepts#datasets) to evaluate our application on. Our dataset will contain labeled [examples](../../concepts#examples) of toxic and non-toxic text. +We need a [Dataset](../concepts#datasets) to evaluate our application on. Our dataset will contain labeled [examples](../concepts#examples) of toxic and non-toxic text. -See [here](../../how_to_guides#dataset-management) for more on dataset management. +See [here](.#dataset-management) for more on dataset management. ## Define an evaluator -[Evaluators](../../concepts#evaluators) are functions for scoring your application's outputs. They take in the example inputs, actual outputs, and, when present, the reference outputs. +[Evaluators](../concepts#evaluators) are functions for scoring your application's outputs. They take in the example inputs, actual outputs, and, when present, the reference outputs. Since we have labels for this task, our evaluator can directly check if the actual outputs match the reference outputs. -See [here](../../how_to_guides#define-an-evaluator) for more on how to define evaluators. +See [here](.#define-an-evaluator) for more on how to define evaluators. 
## Run the evaluation @@ -214,16 +214,16 @@ The key arguments are: ]} /> -See [here](../../how_to_guides#run-an-evaluation) for other ways to kick off evaluations and [here](../../how_to_guides#configure-an-evaluation-job) for how to configure evaluation jobs. +See [here](.#run-an-evaluation) for other ways to kick off evaluations and [here](.#configure-an-evaluation-job) for how to configure evaluation jobs. ## Explore the results -Each invocation of `evaluate()` creates an [Experiment](../../concepts#experiments) which can be viewed in the LangSmith UI or queried via the SDK. +Each invocation of `evaluate()` creates an [Experiment](../concepts#experiments) which can be viewed in the LangSmith UI or queried via the SDK. Evaluation scores are stored against each actual output as feedback. _If you've annotated your code for tracing, you can open the trace of each row in a side panel view._ -![](../evaluation/static/view_experiment.gif) +![](./static/view_experiment.gif) ## Reference code @@ -364,6 +364,6 @@ _If you've annotated your code for tracing, you can open the trace of each row i ## Related -- [Run an evaluation asynchronously](../../how_to_guides/evaluation/async) -- [Run an evaluation via the REST API](../../how_to_guides/evaluation/run_evals_api_only) -- [Run an evaluation from the prompt playground](../../how_to_guides/evaluation/run_evaluation_from_prompt_playground) +- [Run an evaluation asynchronously](./async) +- [Run an evaluation via the REST API](./run_evals_api_only) +- [Run an evaluation from the prompt playground](./run_evaluation_from_prompt_playground) diff --git a/docs/evaluation/how_to_guides/evaluation/evaluate_on_intermediate_steps.mdx b/docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx similarity index 98% rename from docs/evaluation/how_to_guides/evaluation/evaluate_on_intermediate_steps.mdx rename to docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx index 39e1041a..a22f5df1 100644 --- a/docs/evaluation/how_to_guides/evaluation/evaluate_on_intermediate_steps.mdx +++ b/docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx @@ -167,7 +167,7 @@ def rag_pipeline(question): /> This pipeline will produce a trace that looks something like: -![](../evaluation/static/evaluation_intermediate_trace.png) +![](./static/evaluation_intermediate_trace.png) ## 2. Create a dataset and examples to evaluate the pipeline @@ -387,8 +387,8 @@ Finally, we'll run `evaluate` with the custom evaluators defined above. 
/> The experiment will contain the results of the evaluation, including the scores and comments from the evaluators: -![](../evaluation/static/evaluation_intermediate_experiment.png) +![](./static/evaluation_intermediate_experiment.png) ## Related -- [Evaluate a `langgraph` graph](../evaluation/langgraph) +- [Evaluate a `langgraph` graph](./langgraph) diff --git a/docs/evaluation/how_to_guides/evaluation/evaluate_pairwise.mdx b/docs/evaluation/how_to_guides/evaluate_pairwise.mdx similarity index 96% rename from docs/evaluation/how_to_guides/evaluation/evaluate_pairwise.mdx rename to docs/evaluation/how_to_guides/evaluate_pairwise.mdx index d68b48b7..f21ff146 100644 --- a/docs/evaluation/how_to_guides/evaluation/evaluate_pairwise.mdx +++ b/docs/evaluation/how_to_guides/evaluate_pairwise.mdx @@ -13,7 +13,7 @@ import { :::info Key concepts -- [Pairwise evaluations](../../concepts#pairwise) +- [Pairwise evaluations](../concepts#pairwise) ::: @@ -22,7 +22,7 @@ This allows you to score the outputs from multiple experiments against each othe Think [LMSYS Chatbot Arena](https://chat.lmsys.org/) - this is the same concept! To do this, use the [evaluate_comparative](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate_comparative.html) / `evaluateComparative` function with two existing experiments. -If you haven't already created experiments to compare, check out our [quick start](https://docs.smith.langchain.com/#5-run-your-first-evaluation) or oue [how-to guide](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application) to get started with evaluations. +If you haven't already created experiments to compare, check out our [quick start](../) or our [how-to guide](./evaluate_llm_application) to get started with evaluations. 
## `evaluate_comparative` args @@ -240,12 +240,12 @@ In the Python example below, we are pulling [this structured prompt](https://smi Navigate to the "Pairwise Experiments" tab from the dataset page: -![Pairwise Experiments Tab](../evaluation/static/pairwise_from_dataset.png) +![Pairwise Experiments Tab](./static/pairwise_from_dataset.png) Click on a pairwise experiment that you would like to inspect, and you will be brought to the Comparison View: -![Pairwise Comparison View](../evaluation/static/pairwise_comparison_view.png) +![Pairwise Comparison View](./static/pairwise_comparison_view.png) You may filter to runs where the first experiment was better or vice versa by clicking the thumbs up/thumbs down buttons in the table header: -![Pairwise Filtering](../evaluation/static/filter_pairwise.png) +![Pairwise Filtering](./static/filter_pairwise.png) diff --git a/docs/evaluation/how_to_guides/evaluation/_category_.json b/docs/evaluation/how_to_guides/evaluation/_category_.json deleted file mode 100644 index b933b5ac..00000000 --- a/docs/evaluation/how_to_guides/evaluation/_category_.json +++ /dev/null @@ -1,5 +0,0 @@ -{ - "label": "Evaluation", - "collapsed": true, - "collapsible": true -} diff --git a/docs/evaluation/how_to_guides/datasets/export_filtered_traces_to_dataset.mdx b/docs/evaluation/how_to_guides/export_filtered_traces_to_dataset.mdx similarity index 100% rename from docs/evaluation/how_to_guides/datasets/export_filtered_traces_to_dataset.mdx rename to docs/evaluation/how_to_guides/export_filtered_traces_to_dataset.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/fetch_perf_metrics_experiment.mdx b/docs/evaluation/how_to_guides/fetch_perf_metrics_experiment.mdx similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/fetch_perf_metrics_experiment.mdx rename to docs/evaluation/how_to_guides/fetch_perf_metrics_experiment.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/filter_experiments_ui.mdx b/docs/evaluation/how_to_guides/filter_experiments_ui.mdx similarity index 94% rename from docs/evaluation/how_to_guides/evaluation/filter_experiments_ui.mdx rename to docs/evaluation/how_to_guides/filter_experiments_ui.mdx index 6f32dfc2..eff983ef 100644 --- a/docs/evaluation/how_to_guides/evaluation/filter_experiments_ui.mdx +++ b/docs/evaluation/how_to_guides/filter_experiments_ui.mdx @@ -74,20 +74,20 @@ and a known ID of the prompt: In the UI, we see all experiments that have been run by default. -![](../evaluation/static/filter-all-experiments.png) +![](./static/filter-all-experiments.png) If we, say, have a preference for openai models, we can easily filter down and see scores within just openai models first: -![](../evaluation/static/filter-openai.png) +![](./static/filter-openai.png) We can stack filters, allowing us to filter out low scores on correctness to make sure we only compare relevant experiments: -![](../evaluation/static/filter-feedback.png) +![](./static/filter-feedback.png) Finally, we can clear and reset filters. 
For example, if we see there's a clear winner with the `singleminded` prompt, we can change filtering settings to see if any other model providers' models work as well with it: -![](../evaluation/static/filter-singleminded.png) +![](./static/filter-singleminded.png) diff --git a/docs/evaluation/how_to_guides/human_feedback/_category_.json b/docs/evaluation/how_to_guides/human_feedback/_category_.json deleted file mode 100644 index c98af163..00000000 --- a/docs/evaluation/how_to_guides/human_feedback/_category_.json +++ /dev/null @@ -1,5 +0,0 @@ -{ - "label": "Human feedback", - "collapsed": true, - "collapsible": true -} diff --git a/docs/evaluation/how_to_guides/human_feedback/static/annotate_trace_inline.png b/docs/evaluation/how_to_guides/human_feedback/static/annotate_trace_inline.png deleted file mode 100644 index 7b45fbf0..00000000 Binary files a/docs/evaluation/how_to_guides/human_feedback/static/annotate_trace_inline.png and /dev/null differ diff --git a/docs/evaluation/how_to_guides/index.md b/docs/evaluation/how_to_guides/index.md index 7c3226d6..230692b2 100644 --- a/docs/evaluation/how_to_guides/index.md +++ b/docs/evaluation/how_to_guides/index.md @@ -12,82 +12,82 @@ Evaluate and improve your application before deploying it. ### Run an evaluation -- [Run an evaluation](./how_to_guides/evaluation/evaluate_llm_application) -- [Run an evaluation asynchronously](./how_to_guides/evaluation/async) -- [Run an evaluation comparing two experiments](./how_to_guides/evaluation/evaluate_pairwise) -- [Evaluate a `langchain` runnable](./how_to_guides/evaluation/langchain_runnable) -- [Evaluate a `langgraph` graph](./how_to_guides/evaluation/langgraph) -- [Run an evaluation of an existing experiment](./how_to_guides/evaluation/evaluate_existing_experiment) -- [Run an evaluation via the REST API](./how_to_guides/evaluation/run_evals_api_only) -- [Run an evaluation from the UI](./how_to_guides/evaluation/run_evaluation_from_prompt_playground) +- [Run an evaluation](./how_to_guides/evaluate_llm_application) +- [Run an evaluation asynchronously](./how_to_guides/async) +- [Run an evaluation comparing two experiments](./how_to_guides/evaluate_pairwise) +- [Evaluate a `langchain` runnable](./how_to_guides/langchain_runnable) +- [Evaluate a `langgraph` graph](./how_to_guides/langgraph) +- [Run an evaluation of an existing experiment](./how_to_guides/evaluate_existing_experiment) +- [Run an evaluation via the REST API](./how_to_guides/run_evals_api_only) +- [Run an evaluation from the UI](./how_to_guides/run_evaluation_from_prompt_playground) ### Define an evaluator -- [Define a custom evaluator](./how_to_guides/evaluation/custom_evaluator) -- [Define an LLM-as-a-judge evaluator](./how_to_guides/evaluation/llm_as_judge) -- [Define a pairwise evaluator](./how_to_guides/evaluation/evaluate_pairwise) -- [Define a summary evaluator](./how_to_guides/evaluation/summary) -- [Use an off-the-shelf evaluator via the SDK (Python only)](./how_to_guides/evaluation/use_langchain_off_the_shelf_evaluators) -- [Evaluate intermediate steps](./how_to_guides/evaluation/evaluate_on_intermediate_steps) -- [Return multiple metrics in one evaluator](./how_to_guides/evaluation/multiple_scores) -- [Return categorical vs numerical metrics](./how_to_guides/evaluation/metric_type) +- [Define a custom evaluator](./how_to_guides/custom_evaluator) +- [Define an LLM-as-a-judge evaluator](./how_to_guides/llm_as_judge) +- [Define a pairwise evaluator](./how_to_guides/evaluate_pairwise) +- [Define a summary
evaluator](./how_to_guides/summary) +- [Use an off-the-shelf evaluator via the SDK (Python only)](./how_to_guides/use_langchain_off_the_shelf_evaluators) +- [Evaluate intermediate steps](./how_to_guides/evaluate_on_intermediate_steps) +- [Return multiple metrics in one evaluator](./how_to_guides/multiple_scores) +- [Return categorical vs numerical metrics](./how_to_guides/metric_type) ### Configure the evaluation data -- [Evaluate on a split / filtered view of a dataset](./how_to_guides/evaluation/dataset_subset) -- [Evaluate on a specific dataset version](./how_to_guides/evaluation/dataset_version) +- [Evaluate on a split / filtered view of a dataset](./how_to_guides/dataset_subset) +- [Evaluate on a specific dataset version](./how_to_guides/dataset_version) ### Configure an evaluation job -- [Evaluate with repetitions](./how_to_guides/evaluation/repetition) -- [Handle model rate limits](./how_to_guides/evaluation/rate_limiting) +- [Evaluate with repetitions](./how_to_guides/repetition) +- [Handle model rate limits](./how_to_guides/rate_limiting) ## Unit testing Unit test your system to identify bugs and regressions. -- [Unit test applications (Python only)](./how_to_guides/evaluation/unit_testing) +- [Unit test applications (Python only)](./how_to_guides/unit_testing) ## Online evaluation Evaluate and monitor your system's live performance on production data. - [Set up an online evaluator](../../observability/how_to_guides/monitoring/online_evaluations) -- [Create a few-shot evaluator](./how_to_guides/evaluation/create_few_shot_evaluators) +- [Create a few-shot evaluator](./how_to_guides/create_few_shot_evaluators) ## Automatic evaluation Set up evaluators that automatically run for all experiments against a dataset. -- [Set up an auto-evaluator](./how_to_guides/evaluation/bind_evaluator_to_dataset) -- [Create a few-shot evaluator](./how_to_guides/evaluation/create_few_shot_evaluators) +- [Set up an auto-evaluator](./how_to_guides/bind_evaluator_to_dataset) +- [Create a few-shot evaluator](./how_to_guides/create_few_shot_evaluators) ## Analyzing experiment results Use the UI & API to understand your experiment results. -- [Compare experiments with the comparison view](./how_to_guides/evaluation/compare_experiment_results) -- [Filter experiments](./how_to_guides/evaluation/filter_experiments_ui) -- [View pairwise experiments](./how_to_guides/evaluation/evaluate_pairwise#view-pairwise-experiments) -- [Fetch experiment results in the SDK](./how_to_guides/evaluation/fetch_perf_metrics_experiment) -- [Upload experiments run outside of LangSmith with the REST API](./how_to_guides/evaluation/upload_existing_experiments) +- [Compare experiments with the comparison view](./how_to_guides/compare_experiment_results) +- [Filter experiments](./how_to_guides/filter_experiments_ui) +- [View pairwise experiments](./how_to_guides/evaluate_pairwise#view-pairwise-experiments) +- [Fetch experiment results in the SDK](./how_to_guides/fetch_perf_metrics_experiment) +- [Upload experiments run outside of LangSmith with the REST API](./how_to_guides/upload_existing_experiments) ## Dataset management Manage datasets in LangSmith used by your evaluations. 
-- [Manage datasets from the UI](./how_to_guides/datasets/manage_datasets_in_application) -- [Manage datasets programmatically](./how_to_guides/datasets/manage_datasets_programmatically) -- [Version datasets](./how_to_guides/datasets/version_datasets) -- [Share or unshare a dataset publicly](./how_to_guides/datasets/share_dataset) -- [Export filtered traces from an experiment to a dataset](./how_to_guides/datasets/export_filtered_traces_to_dataset) +- [Manage datasets from the UI](./how_to_guides/manage_datasets_in_application) +- [Manage datasets programmatically](./how_to_guides/manage_datasets_programmatically) +- [Version datasets](./how_to_guides/version_datasets) +- [Share or unshare a dataset publicly](./how_to_guides/share_dataset) +- [Export filtered traces from an experiment to a dataset](./how_to_guides/export_filtered_traces_to_dataset) ## Annotation queues and human feedback Collect feedback from subject matter experts and users to improve your applications. -- [Use annotation queues](./how_to_guides/human_feedback/annotation_queues) -- [Capture user feedback from your application to traces](./how_to_guides/human_feedback/attach_user_feedback) -- [Set up a new feedback criteria](./how_to_guides/human_feedback/set_up_feedback_criteria) -- [Annotate traces inline](./how_to_guides/human_feedback/annotate_traces_inline) -- [Audit and correct evaluator scores](./how_to_guides/evaluation/audit_evaluator_scores) +- [Use annotation queues](./how_to_guides/annotation_queues) +- [Capture user feedback from your application to traces](./how_to_guides/attach_user_feedback) +- [Set up a new feedback criteria](./how_to_guides/set_up_feedback_criteria) +- [Annotate traces inline](./how_to_guides/annotate_traces_inline) +- [Audit and correct evaluator scores](./how_to_guides/audit_evaluator_scores) diff --git a/docs/evaluation/how_to_guides/datasets/index_datasets_for_dynamic_few_shot_example_selection.mdx b/docs/evaluation/how_to_guides/index_datasets_for_dynamic_few_shot_example_selection.mdx similarity index 100% rename from docs/evaluation/how_to_guides/datasets/index_datasets_for_dynamic_few_shot_example_selection.mdx rename to docs/evaluation/how_to_guides/index_datasets_for_dynamic_few_shot_example_selection.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/langchain_runnable.mdx b/docs/evaluation/how_to_guides/langchain_runnable.mdx similarity index 97% rename from docs/evaluation/how_to_guides/evaluation/langchain_runnable.mdx rename to docs/evaluation/how_to_guides/langchain_runnable.mdx index 3993abfa..faf2b216 100644 --- a/docs/evaluation/how_to_guides/evaluation/langchain_runnable.mdx +++ b/docs/evaluation/how_to_guides/langchain_runnable.mdx @@ -132,8 +132,8 @@ To evaluate our chain we can pass it directly to the `evaluate()` / `aevaluate() The runnable is traced appropriately for each output. 
-![](../evaluation/static/runnable_eval.png) +![](./static/runnable_eval.png) ## Related -- [How to evaluate a `langgraph` graph](../evaluation/langgraph) +- [How to evaluate a `langgraph` graph](./langgraph) diff --git a/docs/evaluation/how_to_guides/evaluation/langgraph.mdx b/docs/evaluation/how_to_guides/langgraph.mdx similarity index 99% rename from docs/evaluation/how_to_guides/evaluation/langgraph.mdx rename to docs/evaluation/how_to_guides/langgraph.mdx index ef3373cf..557c2a6c 100644 --- a/docs/evaluation/how_to_guides/evaluation/langgraph.mdx +++ b/docs/evaluation/how_to_guides/langgraph.mdx @@ -239,7 +239,7 @@ If we need access to information about intermediate steps that isn't in state, w :::tip Custom evaluators -See more about what arguments you can pass to custom evaluators in this [how-to guide](../evaluation/custom_evaluator). +See more about what arguments you can pass to custom evaluators in this [how-to guide](./custom_evaluator). ::: diff --git a/docs/evaluation/how_to_guides/evaluation/llm_as_judge.mdx b/docs/evaluation/how_to_guides/llm_as_judge.mdx similarity index 89% rename from docs/evaluation/how_to_guides/evaluation/llm_as_judge.mdx rename to docs/evaluation/how_to_guides/llm_as_judge.mdx index c8a0b8f7..b4d7ba8a 100644 --- a/docs/evaluation/how_to_guides/evaluation/llm_as_judge.mdx +++ b/docs/evaluation/how_to_guides/llm_as_judge.mdx @@ -8,7 +8,7 @@ import { :::info Key concepts -- [LLM-as-a-judge evaluator](../../concepts#llm-as-judge) +- [LLM-as-a-judge evaluator](../concepts#llm-as-judge) ::: @@ -72,8 +72,8 @@ for the answer is logically valid and consistent with question and the answer.\\ ]} /> -See [here](../../how_to_guides/evaluation/custom_evaluator) for more on how to write a custom evaluator. +See [here](./custom_evaluator) for more on how to write a custom evaluator. ## Prebuilt evaluator via `langchain` -See [here](../../how_to_guides/evaluation/use_langchain_off_the_shelf_evaluators) for how to use prebuilt evaluators from `langchain`. +See [here](./use_langchain_off_the_shelf_evaluators) for how to use prebuilt evaluators from `langchain`. 
diff --git a/docs/evaluation/how_to_guides/datasets/manage_datasets_in_application.mdx b/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx similarity index 99% rename from docs/evaluation/how_to_guides/datasets/manage_datasets_in_application.mdx rename to docs/evaluation/how_to_guides/manage_datasets_in_application.mdx index 9beabd03..c99bf18a 100644 --- a/docs/evaluation/how_to_guides/datasets/manage_datasets_in_application.mdx +++ b/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx @@ -7,7 +7,7 @@ sidebar_position: 1 :::tip Recommended Reading Before diving into this content, it might be helpful to read the following: -- [Concepts guide on evaluation and datasets](../../concepts#datasets-and-examples) +- [Concepts guide on evaluation and datasets](../concepts#datasets-and-examples) ::: diff --git a/docs/evaluation/how_to_guides/datasets/manage_datasets_programmatically.mdx b/docs/evaluation/how_to_guides/manage_datasets_programmatically.mdx similarity index 100% rename from docs/evaluation/how_to_guides/datasets/manage_datasets_programmatically.mdx rename to docs/evaluation/how_to_guides/manage_datasets_programmatically.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/metric_type.mdx b/docs/evaluation/how_to_guides/metric_type.mdx similarity index 90% rename from docs/evaluation/how_to_guides/evaluation/metric_type.mdx rename to docs/evaluation/how_to_guides/metric_type.mdx index a3aa401a..68610753 100644 --- a/docs/evaluation/how_to_guides/evaluation/metric_type.mdx +++ b/docs/evaluation/how_to_guides/metric_type.mdx @@ -6,7 +6,7 @@ import { # How to return categorical vs numerical metrics -LangSmith supports both categorical and numerical metrics, and you can return either when writing a [custom evaluator](../../how_to_guides/evaluation/custom_evaluator). +LangSmith supports both categorical and numerical metrics, and you can return either when writing a [custom evaluator](./custom_evaluator). For an evaluator result to be logged as a numerical metric, it must be returned as: @@ -68,4 +68,4 @@ Here are some examples: ## Related -- [Return multiple metrics in one evaluator](../../how_to_guides/evaluation/multiple_scores) +- [Return multiple metrics in one evaluator](./multiple_scores) diff --git a/docs/evaluation/how_to_guides/evaluation/multiple_scores.mdx b/docs/evaluation/how_to_guides/multiple_scores.mdx similarity index 86% rename from docs/evaluation/how_to_guides/evaluation/multiple_scores.mdx rename to docs/evaluation/how_to_guides/multiple_scores.mdx index 2a433002..17f3fb9d 100644 --- a/docs/evaluation/how_to_guides/evaluation/multiple_scores.mdx +++ b/docs/evaluation/how_to_guides/multiple_scores.mdx @@ -6,7 +6,7 @@ import { # How to return multiple scores in one evaluator -Sometimes it is useful for a [custom evaluator function](../../how_to_guides/evaluation/custom_evaluator) or [summary evaluator function](../../how_to_guides/evaluation/summary) to return multiple metrics. +Sometimes it is useful for a [custom evaluator function](./custom_evaluator) or [summary evaluator function](./summary) to return multiple metrics. For example, if you have multiple metrics being generated by an LLM judge, you can save time and money by making a single LLM call that generates multiple metrics instead of making multiple LLM calls. To return multiple scores using the Python SDK, simply return a list of dictionaries/objects of the following form: @@ -71,8 +71,8 @@ Example: Rows from the resulting experiment will display each of the scores.
-![](../evaluation/static/multiple_scores.png) +![](./static/multiple_scores.png) ## Related -- [Return categorical vs numerical metrics](../../how_to_guides/evaluation/metric_type) +- [Return categorical vs numerical metrics](./metric_type) diff --git a/docs/evaluation/how_to_guides/evaluation/rate_limiting.mdx b/docs/evaluation/how_to_guides/rate_limiting.mdx similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/rate_limiting.mdx rename to docs/evaluation/how_to_guides/rate_limiting.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/repetition.mdx b/docs/evaluation/how_to_guides/repetition.mdx similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/repetition.mdx rename to docs/evaluation/how_to_guides/repetition.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/run_evals_api_only.mdx b/docs/evaluation/how_to_guides/run_evals_api_only.mdx similarity index 98% rename from docs/evaluation/how_to_guides/evaluation/run_evals_api_only.mdx rename to docs/evaluation/how_to_guides/run_evals_api_only.mdx index d50578f1..40fc5fbd 100644 --- a/docs/evaluation/how_to_guides/evaluation/run_evals_api_only.mdx +++ b/docs/evaluation/how_to_guides/run_evals_api_only.mdx @@ -26,7 +26,7 @@ This guide will show you how to run evals using the REST API, using the `request ## Create a dataset -Here, we are using the python SDK for convenience. You can also use the API directly use the UI, see [this guide](../datasets/manage_datasets_in_application) for more information. +Here, we are using the python SDK for convenience. You can also use the API directly or use the UI; see [this guide](./manage_datasets_in_application) for more information. ```python import openai @@ -191,7 +191,7 @@ for model_name in model_names: ## Run a pairwise experiment Next, we'll demonstrate how to run a pairwise experiment. In a pairwise experiment, you compare two examples against each other. -For more information, check out [this guide](../evaluation/evaluate_pairwise). +For more information, check out [this guide](./evaluate_pairwise). ```python # A comparative experiment allows you to provide a preferential ranking on the outputs of two or more experiments diff --git a/docs/evaluation/how_to_guides/evaluation/run_evaluation_from_prompt_playground.mdx b/docs/evaluation/how_to_guides/run_evaluation_from_prompt_playground.mdx similarity index 91% rename from docs/evaluation/how_to_guides/evaluation/run_evaluation_from_prompt_playground.mdx rename to docs/evaluation/how_to_guides/run_evaluation_from_prompt_playground.mdx index b2dee48b..726b2935 100644 --- a/docs/evaluation/how_to_guides/evaluation/run_evaluation_from_prompt_playground.mdx +++ b/docs/evaluation/how_to_guides/run_evaluation_from_prompt_playground.mdx @@ -12,12 +12,12 @@ This allows you to test your prompt / model configuration over a series of input 1. **Navigate to the prompt playground** by clicking on "Prompts" in the sidebar, then selecting a prompt from the list of available prompts or creating a new one. 2. **Select the "Switch to dataset" button** to switch to the dataset you want to use for the experiment. Please note that the dataset keys of the dataset inputs must match the input variables of the prompt. In the below sections, note that the selected dataset has inputs with keys "text", which correctly match the input variable of the prompt. Also note that there is a max capacity of 15 inputs for the prompt playground.
- ![Switch to dataset](../evaluation/static/switch_to_dataset.png) + ![Switch to dataset](./static/switch_to_dataset.png) 3. **Click on the "Start" button** or CMD+Enter to start the experiment. This will run the prompt over all the examples in the dataset and create an entry for the experiment in the dataset details page. Note that you need to commit the prompt to the prompt hub before you can start the experiment to ensure it can be referenced in the experiment. The result for each input will be streamed and displayed inline for each input in the dataset. - ![Input variables](../evaluation/static/input_variables_playground.png) + ![Input variables](./static/input_variables_playground.png) 4. **View the results** by clicking on the "View Experiment" button at the bottom of the page. This will take you to the experiment details page where you can see the results of the experiment. 5. **Navigate back to the commit page** by clicking on the "View Commit" button. This will take you back to the prompt page where you can make changes to the prompt and run more experiments. The "View Commit" button is available to all experiments that were run from the prompt playground. The experiment is prefixed with the prompt repository name, a unique identifier, and the date and time the experiment was run. - ![Playground experiment results](../evaluation/static/playground_experiment_results.png) + ![Playground experiment results](./static/playground_experiment_results.png) ## Add evaluation scores to the experiment diff --git a/docs/evaluation/how_to_guides/human_feedback/set_up_feedback_criteria.mdx b/docs/evaluation/how_to_guides/set_up_feedback_criteria.mdx similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/set_up_feedback_criteria.mdx rename to docs/evaluation/how_to_guides/set_up_feedback_criteria.mdx diff --git a/docs/evaluation/how_to_guides/datasets/share_dataset.mdx b/docs/evaluation/how_to_guides/share_dataset.mdx similarity index 100% rename from docs/evaluation/how_to_guides/datasets/share_dataset.mdx rename to docs/evaluation/how_to_guides/share_dataset.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/static/add-auto-evaluator-python.png b/docs/evaluation/how_to_guides/static/add-auto-evaluator-python.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/add-auto-evaluator-python.png rename to docs/evaluation/how_to_guides/static/add-auto-evaluator-python.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add-filtered-traces-to-dataset.png b/docs/evaluation/how_to_guides/static/add-filtered-traces-to-dataset.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add-filtered-traces-to-dataset.png rename to docs/evaluation/how_to_guides/static/add-filtered-traces-to-dataset.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_manual_example.png b/docs/evaluation/how_to_guides/static/add_manual_example.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_manual_example.png rename to docs/evaluation/how_to_guides/static/add_manual_example.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_metadata.png b/docs/evaluation/how_to_guides/static/add_metadata.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_metadata.png rename to docs/evaluation/how_to_guides/static/add_metadata.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_to..dataset.png 
b/docs/evaluation/how_to_guides/static/add_to..dataset.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_to..dataset.png rename to docs/evaluation/how_to_guides/static/add_to..dataset.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/add_to_annotation_queue.png b/docs/evaluation/how_to_guides/static/add_to_annotation_queue.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/add_to_annotation_queue.png rename to docs/evaluation/how_to_guides/static/add_to_annotation_queue.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_to_dataset.png b/docs/evaluation/how_to_guides/static/add_to_dataset.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_to_dataset.png rename to docs/evaluation/how_to_guides/static/add_to_dataset.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_to_dataset_from_aq.png b/docs/evaluation/how_to_guides/static/add_to_dataset_from_aq.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_to_dataset_from_aq.png rename to docs/evaluation/how_to_guides/static/add_to_dataset_from_aq.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_to_split2.png b/docs/evaluation/how_to_guides/static/add_to_split2.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_to_split2.png rename to docs/evaluation/how_to_guides/static/add_to_split2.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/annotate_trace_inline.png b/docs/evaluation/how_to_guides/static/annotate_trace_inline.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/annotate_trace_inline.png rename to docs/evaluation/how_to_guides/static/annotate_trace_inline.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/annotation_queue_edit.png b/docs/evaluation/how_to_guides/static/annotation_queue_edit.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/annotation_queue_edit.png rename to docs/evaluation/how_to_guides/static/annotation_queue_edit.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/annotation_queue_form.png b/docs/evaluation/how_to_guides/static/annotation_queue_form.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/annotation_queue_form.png rename to docs/evaluation/how_to_guides/static/annotation_queue_form.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/annotation_sidebar.png b/docs/evaluation/how_to_guides/static/annotation_sidebar.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/annotation_sidebar.png rename to docs/evaluation/how_to_guides/static/annotation_sidebar.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/cat_feedback.png b/docs/evaluation/how_to_guides/static/cat_feedback.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/cat_feedback.png rename to docs/evaluation/how_to_guides/static/cat_feedback.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/click_to_edit_prompt.png b/docs/evaluation/how_to_guides/static/click_to_edit_prompt.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/click_to_edit_prompt.png rename to docs/evaluation/how_to_guides/static/click_to_edit_prompt.png diff --git 
a/docs/evaluation/how_to_guides/evaluation/static/code-autoeval-popup.png b/docs/evaluation/how_to_guides/static/code-autoeval-popup.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/code-autoeval-popup.png rename to docs/evaluation/how_to_guides/static/code-autoeval-popup.png diff --git a/docs/evaluation/how_to_guides/datasets/static/confirmation.png b/docs/evaluation/how_to_guides/static/confirmation.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/confirmation.png rename to docs/evaluation/how_to_guides/static/confirmation.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/cont_feedback.png b/docs/evaluation/how_to_guides/static/cont_feedback.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/cont_feedback.png rename to docs/evaluation/how_to_guides/static/cont_feedback.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/corrections_comparison_view.png b/docs/evaluation/how_to_guides/static/corrections_comparison_view.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/corrections_comparison_view.png rename to docs/evaluation/how_to_guides/static/corrections_comparison_view.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/corrections_runs_table.png b/docs/evaluation/how_to_guides/static/corrections_runs_table.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/corrections_runs_table.png rename to docs/evaluation/how_to_guides/static/corrections_runs_table.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/create_annotation_queue.png b/docs/evaluation/how_to_guides/static/create_annotation_queue.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/create_annotation_queue.png rename to docs/evaluation/how_to_guides/static/create_annotation_queue.png diff --git a/docs/evaluation/how_to_guides/datasets/static/create_dataset_csv.png b/docs/evaluation/how_to_guides/static/create_dataset_csv.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/create_dataset_csv.png rename to docs/evaluation/how_to_guides/static/create_dataset_csv.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/create_evaluator.png b/docs/evaluation/how_to_guides/static/create_evaluator.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/create_evaluator.png rename to docs/evaluation/how_to_guides/static/create_evaluator.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/create_few_shot_evaluator.png b/docs/evaluation/how_to_guides/static/create_few_shot_evaluator.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/create_few_shot_evaluator.png rename to docs/evaluation/how_to_guides/static/create_few_shot_evaluator.png diff --git a/docs/evaluation/how_to_guides/datasets/static/custom_json_schema.png b/docs/evaluation/how_to_guides/static/custom_json_schema.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/custom_json_schema.png rename to docs/evaluation/how_to_guides/static/custom_json_schema.png diff --git a/docs/evaluation/how_to_guides/datasets/static/dataset_schema_definition.png b/docs/evaluation/how_to_guides/static/dataset_schema_definition.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/dataset_schema_definition.png rename to 
docs/evaluation/how_to_guides/static/dataset_schema_definition.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/edit_evaluator.png b/docs/evaluation/how_to_guides/static/edit_evaluator.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/edit_evaluator.png
rename to docs/evaluation/how_to_guides/static/edit_evaluator.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/enter_dataset_details.png b/docs/evaluation/how_to_guides/static/enter_dataset_details.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/enter_dataset_details.png
rename to docs/evaluation/how_to_guides/static/enter_dataset_details.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/evaluation_intermediate_experiment.png b/docs/evaluation/how_to_guides/static/evaluation_intermediate_experiment.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/evaluation_intermediate_experiment.png
rename to docs/evaluation/how_to_guides/static/evaluation_intermediate_experiment.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/evaluation_intermediate_trace.png b/docs/evaluation/how_to_guides/static/evaluation_intermediate_trace.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/evaluation_intermediate_trace.png
rename to docs/evaluation/how_to_guides/static/evaluation_intermediate_trace.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/evaluator_prompt.png b/docs/evaluation/how_to_guides/static/evaluator_prompt.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/evaluator_prompt.png
rename to docs/evaluation/how_to_guides/static/evaluator_prompt.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/expanded_view.png b/docs/evaluation/how_to_guides/static/expanded_view.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/expanded_view.png
rename to docs/evaluation/how_to_guides/static/expanded_view.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/experiment-tracing-project.png b/docs/evaluation/how_to_guides/static/experiment-tracing-project.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/experiment-tracing-project.png
rename to docs/evaluation/how_to_guides/static/experiment-tracing-project.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/experiments-tab-code-results.png b/docs/evaluation/how_to_guides/static/experiments-tab-code-results.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/experiments-tab-code-results.png
rename to docs/evaluation/how_to_guides/static/experiments-tab-code-results.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/export-dataset-button.png b/docs/evaluation/how_to_guides/static/export-dataset-button.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/export-dataset-button.png
rename to docs/evaluation/how_to_guides/static/export-dataset-button.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/export-dataset-modal.png b/docs/evaluation/how_to_guides/static/export-dataset-modal.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/export-dataset-modal.png
rename to docs/evaluation/how_to_guides/static/export-dataset-modal.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/export-filtered-trace-to-dataset.png b/docs/evaluation/how_to_guides/static/export-filtered-trace-to-dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/export-filtered-trace-to-dataset.png
rename to docs/evaluation/how_to_guides/static/export-filtered-trace-to-dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/few_shot_code_snippet.png b/docs/evaluation/how_to_guides/static/few_shot_code_snippet.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/few_shot_code_snippet.png
rename to docs/evaluation/how_to_guides/static/few_shot_code_snippet.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/few_shot_example.png b/docs/evaluation/how_to_guides/static/few_shot_example.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/few_shot_example.png
rename to docs/evaluation/how_to_guides/static/few_shot_example.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/few_shot_search_results.png b/docs/evaluation/how_to_guides/static/few_shot_search_results.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/few_shot_search_results.png
rename to docs/evaluation/how_to_guides/static/few_shot_search_results.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/few_shot_synced_empty_state.png b/docs/evaluation/how_to_guides/static/few_shot_synced_empty_state.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/few_shot_synced_empty_state.png
rename to docs/evaluation/how_to_guides/static/few_shot_synced_empty_state.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/few_shot_tab_unsynced.png b/docs/evaluation/how_to_guides/static/few_shot_tab_unsynced.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/few_shot_tab_unsynced.png
rename to docs/evaluation/how_to_guides/static/few_shot_tab_unsynced.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter-all-experiments.png b/docs/evaluation/how_to_guides/static/filter-all-experiments.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter-all-experiments.png
rename to docs/evaluation/how_to_guides/static/filter-all-experiments.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter-feedback.png b/docs/evaluation/how_to_guides/static/filter-feedback.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter-feedback.png
rename to docs/evaluation/how_to_guides/static/filter-feedback.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter-openai.png b/docs/evaluation/how_to_guides/static/filter-openai.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter-openai.png
rename to docs/evaluation/how_to_guides/static/filter-openai.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter-singleminded.png b/docs/evaluation/how_to_guides/static/filter-singleminded.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter-singleminded.png
rename to docs/evaluation/how_to_guides/static/filter-singleminded.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/filter_examples.png b/docs/evaluation/how_to_guides/static/filter_examples.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/filter_examples.png
rename to docs/evaluation/how_to_guides/static/filter_examples.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter_pairwise.png b/docs/evaluation/how_to_guides/static/filter_pairwise.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter_pairwise.png
rename to docs/evaluation/how_to_guides/static/filter_pairwise.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter_to_regressions.png b/docs/evaluation/how_to_guides/static/filter_to_regressions.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter_to_regressions.png
rename to docs/evaluation/how_to_guides/static/filter_to_regressions.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/filtered-traces-from-experiment.png b/docs/evaluation/how_to_guides/static/filtered-traces-from-experiment.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/filtered-traces-from-experiment.png
rename to docs/evaluation/how_to_guides/static/filtered-traces-from-experiment.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/filters_applied.png b/docs/evaluation/how_to_guides/static/filters_applied.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/filters_applied.png
rename to docs/evaluation/how_to_guides/static/filters_applied.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/generate_synthetic_examples_create.png b/docs/evaluation/how_to_guides/static/generate_synthetic_examples_create.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/generate_synthetic_examples_create.png
rename to docs/evaluation/how_to_guides/static/generate_synthetic_examples_create.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/generate_synthetic_examples_pane.png b/docs/evaluation/how_to_guides/static/generate_synthetic_examples_pane.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/generate_synthetic_examples_pane.png
rename to docs/evaluation/how_to_guides/static/generate_synthetic_examples_pane.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/input_variables_playground.png b/docs/evaluation/how_to_guides/static/input_variables_playground.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/input_variables_playground.png
rename to docs/evaluation/how_to_guides/static/input_variables_playground.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/modify_example.png b/docs/evaluation/how_to_guides/static/modify_example.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/modify_example.png
rename to docs/evaluation/how_to_guides/static/modify_example.png
diff --git a/docs/evaluation/how_to_guides/human_feedback/static/multi_select_annotation_queue.png b/docs/evaluation/how_to_guides/static/multi_select_annotation_queue.png
similarity index 100%
rename from docs/evaluation/how_to_guides/human_feedback/static/multi_select_annotation_queue.png
rename to docs/evaluation/how_to_guides/static/multi_select_annotation_queue.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/multiple_scores.png b/docs/evaluation/how_to_guides/static/multiple_scores.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/multiple_scores.png
rename to docs/evaluation/how_to_guides/static/multiple_scores.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/multiselect_add_to_dataset.png b/docs/evaluation/how_to_guides/static/multiselect_add_to_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/multiselect_add_to_dataset.png
rename to docs/evaluation/how_to_guides/static/multiselect_add_to_dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/new_dataset.png b/docs/evaluation/how_to_guides/static/new_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/new_dataset.png
rename to docs/evaluation/how_to_guides/static/new_dataset.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/open_comparison_view.png b/docs/evaluation/how_to_guides/static/open_comparison_view.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/open_comparison_view.png
rename to docs/evaluation/how_to_guides/static/open_comparison_view.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/open_trace_comparison.png b/docs/evaluation/how_to_guides/static/open_trace_comparison.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/open_trace_comparison.png
rename to docs/evaluation/how_to_guides/static/open_trace_comparison.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/pairwise_comparison_view.png b/docs/evaluation/how_to_guides/static/pairwise_comparison_view.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/pairwise_comparison_view.png
rename to docs/evaluation/how_to_guides/static/pairwise_comparison_view.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/pairwise_from_dataset.png b/docs/evaluation/how_to_guides/static/pairwise_from_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/pairwise_from_dataset.png
rename to docs/evaluation/how_to_guides/static/pairwise_from_dataset.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/playground_evaluator_results.png b/docs/evaluation/how_to_guides/static/playground_evaluator_results.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/playground_evaluator_results.png
rename to docs/evaluation/how_to_guides/static/playground_evaluator_results.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/playground_experiment_results.png b/docs/evaluation/how_to_guides/static/playground_experiment_results.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/playground_experiment_results.png
rename to docs/evaluation/how_to_guides/static/playground_experiment_results.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/regression_test.gif b/docs/evaluation/how_to_guides/static/regression_test.gif
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/regression_test.gif
rename to docs/evaluation/how_to_guides/static/regression_test.gif
diff --git a/docs/evaluation/how_to_guides/evaluation/static/regression_view.png b/docs/evaluation/how_to_guides/static/regression_view.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/regression_view.png
rename to docs/evaluation/how_to_guides/static/regression_view.png
diff --git a/docs/evaluation/how_to_guides/human_feedback/static/review_runs.png b/docs/evaluation/how_to_guides/static/review_runs.png
similarity index 100%
rename from docs/evaluation/how_to_guides/human_feedback/static/review_runs.png
rename to docs/evaluation/how_to_guides/static/review_runs.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/runnable_eval.png b/docs/evaluation/how_to_guides/static/runnable_eval.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/runnable_eval.png
rename to docs/evaluation/how_to_guides/static/runnable_eval.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/schema_validation.png b/docs/evaluation/how_to_guides/static/schema_validation.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/schema_validation.png
rename to docs/evaluation/how_to_guides/static/schema_validation.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/select_baseline.png b/docs/evaluation/how_to_guides/static/select_baseline.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/select_baseline.png
rename to docs/evaluation/how_to_guides/static/select_baseline.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/select_columns.png b/docs/evaluation/how_to_guides/static/select_columns.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/select_columns.png
rename to docs/evaluation/how_to_guides/static/select_columns.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/select_feedback.png b/docs/evaluation/how_to_guides/static/select_feedback.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/select_feedback.png
rename to docs/evaluation/how_to_guides/static/select_feedback.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/share_dataset.png b/docs/evaluation/how_to_guides/static/share_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/share_dataset.png
rename to docs/evaluation/how_to_guides/static/share_dataset.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/show-feedback-from-autoeval-code.png b/docs/evaluation/how_to_guides/static/show-feedback-from-autoeval-code.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/show-feedback-from-autoeval-code.png
rename to docs/evaluation/how_to_guides/static/show-feedback-from-autoeval-code.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/summary_eval.png b/docs/evaluation/how_to_guides/static/summary_eval.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/summary_eval.png
rename to docs/evaluation/how_to_guides/static/summary_eval.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/switch_to_dataset.png b/docs/evaluation/how_to_guides/static/switch_to_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/switch_to_dataset.png
rename to docs/evaluation/how_to_guides/static/switch_to_dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/tag_this_version.png b/docs/evaluation/how_to_guides/static/tag_this_version.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/tag_this_version.png
rename to docs/evaluation/how_to_guides/static/tag_this_version.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/toggle_views.png b/docs/evaluation/how_to_guides/static/toggle_views.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/toggle_views.png
rename to docs/evaluation/how_to_guides/static/toggle_views.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/unit-test-suite.png b/docs/evaluation/how_to_guides/static/unit-test-suite.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/unit-test-suite.png
rename to docs/evaluation/how_to_guides/static/unit-test-suite.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/unshare_dataset.png b/docs/evaluation/how_to_guides/static/unshare_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/unshare_dataset.png
rename to docs/evaluation/how_to_guides/static/unshare_dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/unshare_trace_list.png b/docs/evaluation/how_to_guides/static/unshare_trace_list.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/unshare_trace_list.png
rename to docs/evaluation/how_to_guides/static/unshare_trace_list.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/update_display.png b/docs/evaluation/how_to_guides/static/update_display.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/update_display.png
rename to docs/evaluation/how_to_guides/static/update_display.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/uploaded_dataset.png b/docs/evaluation/how_to_guides/static/uploaded_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/uploaded_dataset.png
rename to docs/evaluation/how_to_guides/static/uploaded_dataset.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/uploaded_dataset_examples.png b/docs/evaluation/how_to_guides/static/uploaded_dataset_examples.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/uploaded_dataset_examples.png
rename to docs/evaluation/how_to_guides/static/uploaded_dataset_examples.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/uploaded_experiment.png b/docs/evaluation/how_to_guides/static/uploaded_experiment.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/uploaded_experiment.png
rename to docs/evaluation/how_to_guides/static/uploaded_experiment.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/use_corrections_as_few_shot.png b/docs/evaluation/how_to_guides/static/use_corrections_as_few_shot.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/use_corrections_as_few_shot.png
rename to docs/evaluation/how_to_guides/static/use_corrections_as_few_shot.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/version_dataset.png b/docs/evaluation/how_to_guides/static/version_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/version_dataset.png
rename to docs/evaluation/how_to_guides/static/version_dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/version_dataset_tests.png b/docs/evaluation/how_to_guides/static/version_dataset_tests.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/version_dataset_tests.png
rename to docs/evaluation/how_to_guides/static/version_dataset_tests.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/view_experiment.gif b/docs/evaluation/how_to_guides/static/view_experiment.gif
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/view_experiment.gif
rename to docs/evaluation/how_to_guides/static/view_experiment.gif
diff --git a/docs/evaluation/how_to_guides/evaluation/static/view_few_shot_ds.png b/docs/evaluation/how_to_guides/static/view_few_shot_ds.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/view_few_shot_ds.png
rename to docs/evaluation/how_to_guides/static/view_few_shot_ds.png
diff --git a/docs/evaluation/how_to_guides/evaluation/summary.mdx b/docs/evaluation/how_to_guides/summary.mdx
similarity index 98%
rename from docs/evaluation/how_to_guides/evaluation/summary.mdx
rename to docs/evaluation/how_to_guides/summary.mdx
index 97fd68bf..761043eb 100644
--- a/docs/evaluation/how_to_guides/evaluation/summary.mdx
+++ b/docs/evaluation/how_to_guides/summary.mdx
@@ -73,4 +73,4 @@ You can then pass this evaluator to the `evaluate` method as follows:

 In the LangSmith UI, you'll the summary evaluator's score displayed with the corresponding key.

-![](../evaluation/static/summary_eval.png)
+![](./static/summary_eval.png)
diff --git a/docs/evaluation/how_to_guides/evaluation/unit_testing.mdx b/docs/evaluation/how_to_guides/unit_testing.mdx
similarity index 99%
rename from docs/evaluation/how_to_guides/evaluation/unit_testing.mdx
rename to docs/evaluation/how_to_guides/unit_testing.mdx
index b43eab1b..a6ce4b06 100644
--- a/docs/evaluation/how_to_guides/evaluation/unit_testing.mdx
+++ b/docs/evaluation/how_to_guides/unit_testing.mdx
@@ -57,7 +57,7 @@ Each time you run this test suite, LangSmith collects the pass/fail rate and oth

 The test suite syncs to a corresponding dataset named after your package or github repository.

-![Test Example](../evaluation/static/unit-test-suite.png)
+![Test Example](./static/unit-test-suite.png)

 ## Going further
diff --git a/docs/evaluation/how_to_guides/evaluation/upload_existing_experiments.mdx b/docs/evaluation/how_to_guides/upload_existing_experiments.mdx
similarity index 97%
rename from docs/evaluation/how_to_guides/evaluation/upload_existing_experiments.mdx
rename to docs/evaluation/how_to_guides/upload_existing_experiments.mdx
index c9c8551d..caa2901a 100644
--- a/docs/evaluation/how_to_guides/evaluation/upload_existing_experiments.mdx
+++ b/docs/evaluation/how_to_guides/upload_existing_experiments.mdx
@@ -260,12 +260,12 @@ information in the request body).
 ## View the experiment in the UI

 Now, login to the UI and click on your newly-created dataset! You should see a single experiment:
-![Uploaded experiments table](../evaluation/static/uploaded_dataset.png)
+![Uploaded experiments table](./static/uploaded_dataset.png)

 Your examples will have been uploaded:
-![Uploaded examples](../evaluation/static/uploaded_dataset_examples.png)
+![Uploaded examples](./static/uploaded_dataset_examples.png)

 Clicking on your experiment will bring you to the comparison view:
-![Uploaded experiment comparison view](../evaluation/static/uploaded_experiment.png)
+![Uploaded experiment comparison view](./static/uploaded_experiment.png)

 As you upload more experiments to your dataset, you will be able to compare the results and easily identify regressions in the comparison view.
diff --git a/docs/evaluation/how_to_guides/evaluation/use_langchain_off_the_shelf_evaluators.mdx b/docs/evaluation/how_to_guides/use_langchain_off_the_shelf_evaluators.mdx
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/use_langchain_off_the_shelf_evaluators.mdx
rename to docs/evaluation/how_to_guides/use_langchain_off_the_shelf_evaluators.mdx
diff --git a/docs/evaluation/how_to_guides/datasets/version_datasets.mdx b/docs/evaluation/how_to_guides/version_datasets.mdx
similarity index 97%
rename from docs/evaluation/how_to_guides/datasets/version_datasets.mdx
rename to docs/evaluation/how_to_guides/version_datasets.mdx
index 0f15f123..df7e2418 100644
--- a/docs/evaluation/how_to_guides/datasets/version_datasets.mdx
+++ b/docs/evaluation/how_to_guides/version_datasets.mdx
@@ -46,4 +46,4 @@ client.update_dataset_tag(
 )
 ```

-To run an evaluation on a particular tagged version of a dataset, you can follow [this guide](../evaluation/dataset_version).
+To run an evaluation on a particular tagged version of a dataset, you can follow [this guide](./dataset_version).
diff --git a/docs/evaluation/index.mdx b/docs/evaluation/index.mdx
index a88c782f..f8d368a3 100644
--- a/docs/evaluation/index.mdx
+++ b/docs/evaluation/index.mdx
@@ -116,7 +116,7 @@ groupId="client-language"

 Click the link printed out by your evaluation run to access the LangSmith Experiments UI, and explore the results of your evaluation.

-![](./how_to_guides/evaluation/static/view_experiment.gif)
+![](./how_to_guides/static/view_experiment.gif)

 ## Next steps
diff --git a/docs/evaluation/tutorials/agents.mdx b/docs/evaluation/tutorials/agents.mdx
index 9efd0f73..c66aacdb 100644
--- a/docs/evaluation/tutorials/agents.mdx
+++ b/docs/evaluation/tutorials/agents.mdx
@@ -6,7 +6,7 @@ import { RegionalUrl } from "@site/src/components/RegionalUrls";

 # Evaluate an agent

-In this tutorial, we will walk through 3 evaluation strategies LLM agents, building on the conceptual points shared in our [evaluation guide](https://docs.smith.langchain.com/evaluation/concepts#agents).
+In this tutorial, we will walk through 3 evaluation strategies for LLM agents, building on the conceptual points shared in our [evaluation guide](../concepts#agents).

 - `Final Response`: Evaluate the agent's final response.
 - `Single step`: Evaluate any agent step in isolation (e.g., whether it selects the appropriate tool).
@@ -348,7 +348,7 @@ Agent evaluation can focus on at least 3 things:

 :::tip

-See our [evaluation guide](https://docs.smith.langchain.com/evaluation/concepts#agents) for more details on Agent evaluation.
+See our [evaluation guide](../concepts#agents) for more details on Agent evaluation.

 :::
@@ -358,7 +358,7 @@ We can evaluate how well an agent does overall on a task. This basically involve

 :::tip

-See the full overview of agent response evaluation in our [conceptual guide](https://docs.smith.langchain.com/evaluation/concepts#evaluating-an-agents-final-response).
+See the full overview of agent response evaluation in our [conceptual guide](../concepts#evaluating-an-agents-final-response).

 :::
@@ -401,7 +401,7 @@ def predict_sql_agent_answer(example: dict):

 `Evaluator`

-This can [follow what we do for RAG](https://docs.smith.langchain.com/tutorials/Developers/rag) where we compare the generated answer with the reference answer.
+This can [follow what we do for RAG](./rag) where we compare the generated answer with the reference answer.

 ```python
 from langchain import hub
@@ -456,11 +456,11 @@ Agents generally make multiple actions. While it is useful to evaluate them end-

 :::tip

-See the full overview of single step evaluation in our [conceptual guide](https://docs.smith.langchain.com/evaluation/concepts#evaluating-a-single-step-of-an-agent).
+See the full overview of single step evaluation in our [conceptual guide](../concepts#evaluating-a-single-step-of-an-agent).

 :::

-We can check a specific tool call using [a custom evaluator](https://docs.smith.langchain.com/how_to_guides/evaluation/custom_evaluator):
+We can check a specific tool call using [a custom evaluator](../how_to_guides/custom_evaluator):

 - Here, we just invoke the assistant, `assistant_runnable`, with a prompt and check if the resulting tool call is as expected.
 - Here, we are using a specialized agent where the tools are hard-coded (rather than passed with the dataset input).
@@ -507,7 +507,7 @@ experiment_results = evaluate(

 ### Trajectory

-We can check a trajectory of tool calls using [custom evaluators](https://docs.smith.langchain.com/how_to_guides/evaluation/custom_evaluator):
+We can check a trajectory of tool calls using [custom evaluators](../how_to_guides/custom_evaluator):

 - Here, we just invoke the agent, `graph.invoke`, with a prompt.
 - Here, we are using a specialized agent where the tools are hard-coded (rather than passed with the dataset input).
@@ -519,7 +519,7 @@ We can check a trajectory of tool calls using [custom evaluators](https://docs.s

 :::tip

-See the full overview of single step evaluation in our [conceptual guide](https://docs.smith.langchain.com/evaluation/concepts#evaluating-an-agents-trajectory).
+See the full overview of single step evaluation in our [conceptual guide](../concepts#evaluating-an-agents-trajectory).

 :::
diff --git a/docs/evaluation/tutorials/rag.mdx b/docs/evaluation/tutorials/rag.mdx
index 3ff6eddf..7dec0ed5 100644
--- a/docs/evaluation/tutorials/rag.mdx
+++ b/docs/evaluation/tutorials/rag.mdx
@@ -406,7 +406,7 @@ However, we will show that this is not required.

 We can isolate them as intermediate chain steps.

-See detail on isolating intermediate chain steps [here](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_on_intermediate_steps).
+See detail on isolating intermediate chain steps [here](../how_to_guides/evaluate_on_intermediate_steps).

 Here is the a video from our LangSmith evaluation series for reference:
diff --git a/docs/evaluation/tutorials/swe-benchmark.mdx b/docs/evaluation/tutorials/swe-benchmark.mdx
index aa7ee4b0..c1f9b00b 100644
--- a/docs/evaluation/tutorials/swe-benchmark.mdx
+++ b/docs/evaluation/tutorials/swe-benchmark.mdx
@@ -72,7 +72,7 @@ dataset = client.upload_csv(

 ### Create dataset split for quicker testing

-Since running the SWE-bench evaluator takes a long time when run on all examples, you can create a "test" split for quickly testing the evaluator and your code. Read [this guide](../../evaluation/how_to_guides/datasets/manage_datasets_in_application#create-and-manage-dataset-splits) to learn more about managing dataset splits, or watch this short video that shows how to do it (to get to the starting page of the video, just click on your dataset created above and go to the `Examples` tab):
+Since running the SWE-bench evaluator takes a long time when run on all examples, you can create a "test" split for quickly testing the evaluator and your code. Read [this guide](../../evaluation/how_to_guides/manage_datasets_in_application#create-and-manage-dataset-splits) to learn more about managing dataset splits, or watch this short video that shows how to do it (to get to the starting page of the video, just click on your dataset created above and go to the `Examples` tab):

 import creating_split from "./static/creating_split.mp4";
diff --git a/docs/observability/concepts/index.mdx b/docs/observability/concepts/index.mdx
index b4acdff7..f007e0fc 100644
--- a/docs/observability/concepts/index.mdx
+++ b/docs/observability/concepts/index.mdx
@@ -50,9 +50,9 @@ Feedback can currently be continuous or discrete (categorical), and you can reus

 Collecting feedback on runs can be done in a number of ways:

-1. [Sent up along with a trace](/evaluation/how_to_guides/human_feedback/attach_user_feedback) from the LLM application
-2. Generated by a user in the app [inline](/evaluation/how_to_guides/human_feedback/annotate_traces_inline) or in an [annotation queue](../evaluation/how_to_guides/human_feedback/annotation_queues)
-3. Generated by an automatic evaluator during [offline evaluation](/evaluation/how_to_guides/evaluation/evaluate_llm_application)
+1. [Sent up along with a trace](/evaluation/how_to_guides/attach_user_feedback) from the LLM application
+2. Generated by a user in the app [inline](/evaluation/how_to_guides/annotate_traces_inline) or in an [annotation queue](../evaluation/how_to_guides/annotation_queues)
+3. Generated by an automatic evaluator during [offline evaluation](/evaluation/how_to_guides/evaluate_llm_application)
 4. Generated by an [online evaluator](./how_to_guides/monitoring/online_evaluations)

 To learn more about how feedback is stored in the application, see [this reference guide](../reference/data_formats/feedback_data_format).
diff --git a/docs/observability/how_to_guides/monitoring/rules.mdx b/docs/observability/how_to_guides/monitoring/rules.mdx
index bedaa787..898cdb37 100644
--- a/docs/observability/how_to_guides/monitoring/rules.mdx
+++ b/docs/observability/how_to_guides/monitoring/rules.mdx
@@ -31,7 +31,7 @@ _Alternatively_, you can access rules in settings by navigating to