diff --git a/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx b/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx index 385408f8..558f52f2 100644 --- a/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx +++ b/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx @@ -144,7 +144,7 @@ If the header is not present, operations will default to the workspace the API k ## Security Settings :::note -"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/prompts/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/tracing/share_trace), and [shared datasets](../../../evaluation/how_to_guides/datasets/share_dataset.mdx). +"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/prompts/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/tracing/share_trace), and [shared datasets](../../../evaluation/how_to_guides/share_dataset.mdx). ::: - `+ Experiment` -> `Run in Playground`, you can see the results in action. Your runs in your experiments will be automatically marked with the key specified in your code sample above (here, `formatted`): -![](../evaluation/static/show-feedback-from-autoeval-code.png) +![](./static/show-feedback-from-autoeval-code.png) And if you navigate back to your dataset, you'll see summary stats for said experiment in the `experiments` tab: -![](../evaluation/static/experiments-tab-code-results.png) +![](./static/experiments-tab-code-results.png) diff --git a/docs/evaluation/how_to_guides/evaluation/compare_experiment_results.mdx b/docs/evaluation/how_to_guides/compare_experiment_results.mdx similarity index 84% rename from docs/evaluation/how_to_guides/evaluation/compare_experiment_results.mdx rename to docs/evaluation/how_to_guides/compare_experiment_results.mdx index 174b3f3d..cdec2e65 100644 --- a/docs/evaluation/how_to_guides/evaluation/compare_experiment_results.mdx +++ b/docs/evaluation/how_to_guides/compare_experiment_results.mdx @@ -8,13 +8,13 @@ Oftentimes, when you are iterating on your LLM application (such as changing the LangSmith supports a powerful comparison view that lets you hone in on key differences, regressions, and improvements between different experiments. -![](../evaluation/static/regression_test.gif) +![](./static/regression_test.gif) ## Open the comparison view To open the comparison view, select two or more experiments from the "Experiments" tab from a given dataset page. Then, click on the "Compare" button at the bottom of the page. -![](../evaluation/static/open_comparison_view.png) +![](./static/open_comparison_view.png) ## Toggle different views @@ -22,46 +22,46 @@ You can toggle between different views by clicking on the "Display" dropdown at Toggling Full Text will show the full text of the input, output and reference output for each run. If the reference output is too long to display in the table, you can click on expand to view the full content. -![](../evaluation/static/toggle_views.png) +![](./static/toggle_views.png) ## View regressions and improvements In the LangSmith comparison view, runs that _regressed_ on your specified feedback key against your baseline experiment will be highlighted in red, while runs that _improved_ will be highlighted in green. 
At the top of each column, you can see how many runs in that experiment did better and how many did worse than your baseline experiment. -![Regressions](../evaluation/static/regression_view.png) +![Regressions](./static/regression_view.png) ## Filter on regressions or improvements Click on the regressions or improvements buttons on the top of each column to filter to the runs that regressed or improved in that specific experiment. -![Regressions Filter](../evaluation/static/filter_to_regressions.png) +![Regressions Filter](./static/filter_to_regressions.png) ## Update baseline experiment In order to track regressions, you need a baseline experiment against which to compare. This will be automatically assigned as the first experiment in your comparison, but you can change it from the dropdown at the top of the page. -![Baseline](../evaluation/static/select_baseline.png) +![Baseline](./static/select_baseline.png) ## Select feedback key You will also want to select the feedback key (evaluation metric) that you would like to focus on. This can be selected via another dropdown at the top. Again, one will be assigned by default, but you can adjust as needed. -![Feedback](../evaluation/static/select_feedback.png) +![Feedback](./static/select_feedback.png) ## Open a trace If tracing is enabled for the evaluation run, you can click on the trace icon in the hover state of any experiment cell to open the trace view for that run. This will open up a trace in the side panel. -![](../evaluation/static/open_trace_comparison.png) +![](./static/open_trace_comparison.png) ## Expand detailed view From any cell, you can click on the expand icon in the hover state to open up a detailed view of all experiment results on that particular example input, along with feedback keys and scores. -![](../evaluation/static/expanded_view.png) +![](./static/expanded_view.png) ## Update display settings @@ -69,4 +69,4 @@ You can adjust the display settings for comparison view by clicking on "Display" Here, you'll be able to toggle feedback, metrics, summary charts, and expand full text. -![](../evaluation/static/update_display.png) +![](./static/update_display.png) diff --git a/docs/evaluation/how_to_guides/evaluation/create_few_shot_evaluators.mdx b/docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx similarity index 92% rename from docs/evaluation/how_to_guides/evaluation/create_few_shot_evaluators.mdx rename to docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx index e50b3965..4bf8f696 100644 --- a/docs/evaluation/how_to_guides/evaluation/create_few_shot_evaluators.mdx +++ b/docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx @@ -34,7 +34,7 @@ as your output key. For example, if your main prompt has variables `question` an You may also specify the number of few-shot examples to use. The default is 5. If your examples will tend to be very long, you may want to set this number lower to save tokens - whereas if your examples tend to be short, you can set a higher number in order to give your evaluator more examples to learn from. If you have more examples in your dataset than this number, we will randomly choose them for you. -![Use corrections as few-shot examples](../evaluation/static/use_corrections_as_few_shot.png) +![Use corrections as few-shot examples](./static/use_corrections_as_few_shot.png) Note that few-shot examples are not currently supported in evaluators that use Hub prompts. @@ -51,7 +51,7 @@ begin seeing examples populated inside your corrections dataset.
As you make cor The inputs to the few-shot examples will be the relevant fields from the inputs, outputs, and reference (if this is an offline evaluator) of your chain/dataset. The outputs will be the corrected evaluator score and the explanations that you created when you left the corrections. Feel free to edit these to your liking. Here is an example of a few-shot example in a corrections dataset: -![Few-shot example](../evaluation/static/few_shot_example.png) +![Few-shot example](./static/few_shot_example.png) Note that the corrections may take a minute or two to be populated into your few-shot dataset. Once they are there, future runs of your evaluator will include them in the prompt! @@ -59,12 +59,12 @@ In order to view your corrections dataset, go to your rule and click "Edit Rule" (or "Edit Evaluator" from a dataset): -![Edit Evaluator](../evaluation/static/edit_evaluator.png) +![Edit Evaluator](./static/edit_evaluator.png) If this is an online evaluator (in a tracing project), you will need to click to edit your prompt: -![Edit Prompt](../evaluation/static/click_to_edit_prompt.png) +![Edit Prompt](./static/click_to_edit_prompt.png) From this screen, you will see a button that says "View few-shot dataset". Clicking this will bring you to your dataset of corrections, where you can view and update your few-shot examples: -![View few-shot dataset](../evaluation/static/view_few_shot_ds.png) +![View few-shot dataset](./static/view_few_shot_ds.png) diff --git a/docs/evaluation/how_to_guides/evaluation/custom_evaluator.mdx b/docs/evaluation/how_to_guides/custom_evaluator.mdx similarity index 92% rename from docs/evaluation/how_to_guides/evaluation/custom_evaluator.mdx rename to docs/evaluation/how_to_guides/custom_evaluator.mdx index bce7b66d..93ac7ef0 100644 --- a/docs/evaluation/how_to_guides/evaluation/custom_evaluator.mdx +++ b/docs/evaluation/how_to_guides/custom_evaluator.mdx @@ -8,7 +8,7 @@ import { :::info Key concepts -- [Evaluators](../../concepts#evaluators) +- [Evaluators](../concepts#evaluators) ::: @@ -138,5 +138,5 @@ answer is logically valid and consistent with question and the answer.""" ## Related -- [Evaluate aggregate experiment results](../../how_to_guides/evaluation/summary): Define summary evaluators, which compute metrics for an entire experiment. -- [Run an evaluation comparing two experiments](../../how_to_guides/evaluation/evaluate_pairwise): Define pairwise evaluators, which compute metrics by comparing two (or more) experiments against each other. +- [Evaluate aggregate experiment results](./summary): Define summary evaluators, which compute metrics for an entire experiment. +- [Run an evaluation comparing two experiments](./evaluate_pairwise): Define pairwise evaluators, which compute metrics by comparing two (or more) experiments against each other. diff --git a/docs/evaluation/how_to_guides/evaluation/dataset_subset.mdx b/docs/evaluation/how_to_guides/dataset_subset.mdx similarity index 85% rename from docs/evaluation/how_to_guides/evaluation/dataset_subset.mdx rename to docs/evaluation/how_to_guides/dataset_subset.mdx index ca51c10e..efc914c9 100644 --- a/docs/evaluation/how_to_guides/evaluation/dataset_subset.mdx +++ b/docs/evaluation/how_to_guides/dataset_subset.mdx @@ -10,8 +10,8 @@ import { Before diving into this content, it might be helpful to read: -- [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).
-- [guide on creating/managing dataset splits](../datasets/manage_datasets_in_application#create-and-manage-dataset-splits) +- [guide on fetching examples](./manage_datasets_programmatically#fetch-examples). +- [guide on creating/managing dataset splits](./manage_datasets_in_application#create-and-manage-dataset-splits) ::: @@ -49,7 +49,7 @@ One common workflow is to fetch examples that have a certain metadata key-value ]} /> -For more advanced filtering capabilities see this [how-to guide](../datasets/manage_datasets_programmatically#list-examples-by-structured-filter). +For more advanced filtering capabilities see this [how-to guide](./manage_datasets_programmatically#list-examples-by-structured-filter). ## Evaluate on a dataset split @@ -85,4 +85,4 @@ You can use the `list_examples` / `listExamples` method to evaluate on one or mu ## Related -- More on [how to filter datasets](../datasets/manage_datasets_programmatically#list-examples-by-structured-filter) +- More on [how to filter datasets](./manage_datasets_programmatically#list-examples-by-structured-filter) diff --git a/docs/evaluation/how_to_guides/evaluation/dataset_version.mdx b/docs/evaluation/how_to_guides/dataset_version.mdx similarity index 90% rename from docs/evaluation/how_to_guides/evaluation/dataset_version.mdx rename to docs/evaluation/how_to_guides/dataset_version.mdx index e592bcad..564c1295 100644 --- a/docs/evaluation/how_to_guides/evaluation/dataset_version.mdx +++ b/docs/evaluation/how_to_guides/dataset_version.mdx @@ -8,8 +8,8 @@ import { :::tip Recommended reading -Before diving into this content, it might be helpful to read the [guide on versioning datasets](../datasets/version_datasets). -Additionally, it might be helpful to read the [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples). +Before diving into this content, it might be helpful to read the [guide on versioning datasets](./version_datasets). +Additionally, it might be helpful to read the [guide on fetching examples](./manage_datasets_programmatically#fetch-examples). 
::: diff --git a/docs/evaluation/how_to_guides/datasets/_category_.json b/docs/evaluation/how_to_guides/datasets/_category_.json deleted file mode 100644 index 43379dda..00000000 --- a/docs/evaluation/how_to_guides/datasets/_category_.json +++ /dev/null @@ -1,5 +0,0 @@ -{ - "label": "Datasets", - "collapsed": true, - "collapsible": true -} diff --git a/docs/evaluation/how_to_guides/evaluation/evaluate_existing_experiment.mdx b/docs/evaluation/how_to_guides/evaluate_existing_experiment.mdx similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/evaluate_existing_experiment.mdx rename to docs/evaluation/how_to_guides/evaluate_existing_experiment.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/evaluate_llm_application.mdx b/docs/evaluation/how_to_guides/evaluate_llm_application.mdx similarity index 90% rename from docs/evaluation/how_to_guides/evaluation/evaluate_llm_application.mdx rename to docs/evaluation/how_to_guides/evaluate_llm_application.mdx index fdefed61..929e04bc 100644 --- a/docs/evaluation/how_to_guides/evaluation/evaluate_llm_application.mdx +++ b/docs/evaluation/how_to_guides/evaluate_llm_application.mdx @@ -12,7 +12,7 @@ import { :::info Key concepts -[Evaluations](../../concepts#applying-evaluations) | [Evaluators](../../concepts#evaluators) | [Datasets](../../concepts#datasets) +[Evaluations](../concepts#applying-evaluations) | [Evaluators](../concepts#evaluators) | [Datasets](../concepts#datasets) ::: @@ -22,7 +22,7 @@ In this guide we'll go over how to evaluate an application using the [evaluate() For larger evaluation jobs in Python we recommend using [aevaluate()](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html), the asynchronous version of `evaluate()`. It is still worthwhile to read this guide first, as the two have nearly identical interfaces, -and then read the how-to guide on [running an evaluation asynchronously](../../how_to_guides/evaluation/async). +and then read the how-to guide on [running an evaluation asynchronously](./async). ::: @@ -92,7 +92,7 @@ To understand how to annotate your code for tracing, please refer to [this guide ## Create or select a dataset -We need a [Dataset](../../concepts#datasets) to evaluate our application on. Our dataset will contain labeled [examples](../../concepts#examples) of toxic and non-toxic text. +We need a [Dataset](../concepts#datasets) to evaluate our application on. Our dataset will contain labeled [examples](../concepts#examples) of toxic and non-toxic text. -See [here](../../how_to_guides#dataset-management) for more on dataset management. +See [here](.#dataset-management) for more on dataset management. ## Define an evaluator -[Evaluators](../../concepts#evaluators) are functions for scoring your application's outputs. They take in the example inputs, actual outputs, and, when present, the reference outputs. +[Evaluators](../concepts#evaluators) are functions for scoring your application's outputs. They take in the example inputs, actual outputs, and, when present, the reference outputs. Since we have labels for this task, our evaluator can directly check if the actual outputs match the reference outputs. -See [here](../../how_to_guides#define-an-evaluator) for more on how to define evaluators. +See [here](.#define-an-evaluator) for more on how to define evaluators. 
## Run the evaluation @@ -214,16 +214,16 @@ The key arguments are: ]} /> -See [here](../../how_to_guides#run-an-evaluation) for other ways to kick off evaluations and [here](../../how_to_guides#configure-an-evaluation-job) for how to configure evaluation jobs. +See [here](.#run-an-evaluation) for other ways to kick off evaluations and [here](.#configure-an-evaluation-job) for how to configure evaluation jobs. ## Explore the results -Each invocation of `evaluate()` creates an [Experiment](../../concepts#experiments) which can be viewed in the LangSmith UI or queried via the SDK. +Each invocation of `evaluate()` creates an [Experiment](../concepts#experiments) which can be viewed in the LangSmith UI or queried via the SDK. Evaluation scores are stored against each actual output as feedback. _If you've annotated your code for tracing, you can open the trace of each row in a side panel view._ -![](../evaluation/static/view_experiment.gif) +![](./static/view_experiment.gif) ## Reference code @@ -364,6 +364,6 @@ _If you've annotated your code for tracing, you can open the trace of each row i ## Related -- [Run an evaluation asynchronously](../../how_to_guides/evaluation/async) -- [Run an evaluation via the REST API](../../how_to_guides/evaluation/run_evals_api_only) -- [Run an evaluation from the prompt playground](../../how_to_guides/evaluation/run_evaluation_from_prompt_playground) +- [Run an evaluation asynchronously](./async) +- [Run an evaluation via the REST API](./run_evals_api_only) +- [Run an evaluation from the prompt playground](./run_evaluation_from_prompt_playground) diff --git a/docs/evaluation/how_to_guides/evaluation/evaluate_on_intermediate_steps.mdx b/docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx similarity index 98% rename from docs/evaluation/how_to_guides/evaluation/evaluate_on_intermediate_steps.mdx rename to docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx index 39e1041a..a22f5df1 100644 --- a/docs/evaluation/how_to_guides/evaluation/evaluate_on_intermediate_steps.mdx +++ b/docs/evaluation/how_to_guides/evaluate_on_intermediate_steps.mdx @@ -167,7 +167,7 @@ def rag_pipeline(question): /> This pipeline will produce a trace that looks something like: -![](../evaluation/static/evaluation_intermediate_trace.png) +![](./static/evaluation_intermediate_trace.png) ## 2. Create a dataset and examples to evaluate the pipeline @@ -387,8 +387,8 @@ Finally, we'll run `evaluate` with the custom evaluators defined above. 
/> The experiment will contain the results of the evaluation, including the scores and comments from the evaluators: -![](../evaluation/static/evaluation_intermediate_experiment.png) +![](./static/evaluation_intermediate_experiment.png) ## Related -- [Evaluate a `langgraph` graph](../evaluation/langgraph) +- [Evaluate a `langgraph` graph](./langgraph) diff --git a/docs/evaluation/how_to_guides/evaluation/evaluate_pairwise.mdx b/docs/evaluation/how_to_guides/evaluate_pairwise.mdx similarity index 96% rename from docs/evaluation/how_to_guides/evaluation/evaluate_pairwise.mdx rename to docs/evaluation/how_to_guides/evaluate_pairwise.mdx index d68b48b7..f21ff146 100644 --- a/docs/evaluation/how_to_guides/evaluation/evaluate_pairwise.mdx +++ b/docs/evaluation/how_to_guides/evaluate_pairwise.mdx @@ -13,7 +13,7 @@ import { :::info Key concepts -- [Pairwise evaluations](../../concepts#pairwise) +- [Pairwise evaluations](../concepts#pairwise) ::: @@ -22,7 +22,7 @@ This allows you to score the outputs from multiple experiments against each othe Think [LMSYS Chatbot Arena](https://chat.lmsys.org/) - this is the same concept! To do this, use the [evaluate_comparative](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate_comparative.html) / `evaluateComparative` function with two existing experiments. -If you haven't already created experiments to compare, check out our [quick start](https://docs.smith.langchain.com/#5-run-your-first-evaluation) or oue [how-to guide](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application) to get started with evaluations. +If you haven't already created experiments to compare, check out our [quick start](../) or our [how-to guide](./evaluate_llm_application) to get started with evaluations. 
## `evaluate_comparative` args @@ -240,12 +240,12 @@ In the Python example below, we are pulling [this structured prompt](https://smi Navigate to the "Pairwise Experiments" tab from the dataset page: -![Pairwise Experiments Tab](../evaluation/static/pairwise_from_dataset.png) +![Pairwise Experiments Tab](./static/pairwise_from_dataset.png) Click on a pairwise experiment that you would like to inspect, and you will be brought to the Comparison View: -![Pairwise Comparison View](../evaluation/static/pairwise_comparison_view.png) +![Pairwise Comparison View](./static/pairwise_comparison_view.png) You may filter to runs where the first experiment was better or vice versa by clicking the thumbs up/thumbs down buttons in the table header: -![Pairwise Filtering](../evaluation/static/filter_pairwise.png) +![Pairwise Filtering](./static/filter_pairwise.png) diff --git a/docs/evaluation/how_to_guides/evaluation/_category_.json b/docs/evaluation/how_to_guides/evaluation/_category_.json deleted file mode 100644 index b933b5ac..00000000 --- a/docs/evaluation/how_to_guides/evaluation/_category_.json +++ /dev/null @@ -1,5 +0,0 @@ -{ - "label": "Evaluation", - "collapsed": true, - "collapsible": true -} diff --git a/docs/evaluation/how_to_guides/datasets/export_filtered_traces_to_dataset.mdx b/docs/evaluation/how_to_guides/export_filtered_traces_to_dataset.mdx similarity index 100% rename from docs/evaluation/how_to_guides/datasets/export_filtered_traces_to_dataset.mdx rename to docs/evaluation/how_to_guides/export_filtered_traces_to_dataset.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/fetch_perf_metrics_experiment.mdx b/docs/evaluation/how_to_guides/fetch_perf_metrics_experiment.mdx similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/fetch_perf_metrics_experiment.mdx rename to docs/evaluation/how_to_guides/fetch_perf_metrics_experiment.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/filter_experiments_ui.mdx b/docs/evaluation/how_to_guides/filter_experiments_ui.mdx similarity index 94% rename from docs/evaluation/how_to_guides/evaluation/filter_experiments_ui.mdx rename to docs/evaluation/how_to_guides/filter_experiments_ui.mdx index 6f32dfc2..eff983ef 100644 --- a/docs/evaluation/how_to_guides/evaluation/filter_experiments_ui.mdx +++ b/docs/evaluation/how_to_guides/filter_experiments_ui.mdx @@ -74,20 +74,20 @@ and a known ID of the prompt: In the UI, we see all experiments that have been run by default. -![](../evaluation/static/filter-all-experiments.png) +![](./static/filter-all-experiments.png) If we, say, have a preference for openai models, we can easily filter down and see scores within just openai models first: -![](../evaluation/static/filter-openai.png) +![](./static/filter-openai.png) We can stack filters, allowing us to filter out low scores on correctness to make sure we only compare relevant experiments: -![](../evaluation/static/filter-feedback.png) +![](./static/filter-feedback.png) Finally, we can clear and reset filters. 
For example, if we see there's a clear winner with the `singleminded` prompt, we can change filtering settings to see if any other model providers' models work as well with it: -![](../evaluation/static/filter-singleminded.png) +![](./static/filter-singleminded.png) diff --git a/docs/evaluation/how_to_guides/human_feedback/_category_.json b/docs/evaluation/how_to_guides/human_feedback/_category_.json deleted file mode 100644 index c98af163..00000000 --- a/docs/evaluation/how_to_guides/human_feedback/_category_.json +++ /dev/null @@ -1,5 +0,0 @@ -{ - "label": "Human feedback", - "collapsed": true, - "collapsible": true -} diff --git a/docs/evaluation/how_to_guides/human_feedback/static/annotate_trace_inline.png b/docs/evaluation/how_to_guides/human_feedback/static/annotate_trace_inline.png deleted file mode 100644 index 7b45fbf0..00000000 Binary files a/docs/evaluation/how_to_guides/human_feedback/static/annotate_trace_inline.png and /dev/null differ diff --git a/docs/evaluation/how_to_guides/index.md b/docs/evaluation/how_to_guides/index.md index 7c3226d6..230692b2 100644 --- a/docs/evaluation/how_to_guides/index.md +++ b/docs/evaluation/how_to_guides/index.md @@ -12,82 +12,82 @@ Evaluate and improve your application before deploying it. ### Run an evaluation -- [Run an evaluation](./how_to_guides/evaluation/evaluate_llm_application) -- [Run an evaluation asynchronously](./how_to_guides/evaluation/async) -- [Run an evaluation comparing two experiments](./how_to_guides/evaluation/evaluate_pairwise) -- [Evaluate a `langchain` runnable](./how_to_guides/evaluation/langchain_runnable) -- [Evaluate a `langgraph` graph](./how_to_guides/evaluation/langgraph) -- [Run an evaluation of an existing experiment](./how_to_guides/evaluation/evaluate_existing_experiment) -- [Run an evaluation via the REST API](./how_to_guides/evaluation/run_evals_api_only) -- [Run an evaluation from the UI](./how_to_guides/evaluation/run_evaluation_from_prompt_playground) +- [Run an evaluation](./how_to_guides/evaluate_llm_application) +- [Run an evaluation asynchronously](./how_to_guides/async) +- [Run an evaluation comparing two experiments](./how_to_guides/evaluate_pairwise) +- [Evaluate a `langchain` runnable](./how_to_guides/langchain_runnable) +- [Evaluate a `langgraph` graph](./how_to_guides/langgraph) +- [Run an evaluation of an existing experiment](./how_to_guides/evaluate_existing_experiment) +- [Run an evaluation via the REST API](./how_to_guides/run_evals_api_only) +- [Run an evaluation from the UI](./how_to_guides/run_evaluation_from_prompt_playground) ### Define an evaluator -- [Define a custom evaluator](./how_to_guides/evaluation/custom_evaluator) -- [Define an LLM-as-a-judge evaluator](./how_to_guides/evaluation/llm_as_judge) -- [Define a pairwise evaluator](./how_to_guides/evaluation/evaluate_pairwise) -- [Define a summary evaluator](./how_to_guides/evaluation/summary) -- [Use an off-the-shelf evaluator via the SDK (Python only)](./how_to_guides/evaluation/use_langchain_off_the_shelf_evaluators) -- [Evaluate intermediate steps](./how_to_guides/evaluation/evaluate_on_intermediate_steps) -- [Return multiple metrics in one evaluator](./how_to_guides/evaluation/multiple_scores) -- [Return categorical vs numerical metrics](./how_to_guides/evaluation/metric_type) +- [Define a custom evaluator](./how_to_guides/custom_evaluator) +- [Define an LLM-as-a-judge evaluator](./how_to_guides/llm_as_judge) +- [Define a pairwise evaluator](./how_to_guides/evaluate_pairwise) +- [Define a summary
evaluator](./how_to_guides/summary) +- [Use an off-the-shelf evaluator via the SDK (Python only)](./how_to_guides/use_langchain_off_the_shelf_evaluators) +- [Evaluate intermediate steps](./how_to_guides/evaluate_on_intermediate_steps) +- [Return multiple metrics in one evaluator](./how_to_guides/multiple_scores) +- [Return categorical vs numerical metrics](./how_to_guides/metric_type) ### Configure the evaluation data -- [Evaluate on a split / filtered view of a dataset](./how_to_guides/evaluation/dataset_subset) -- [Evaluate on a specific dataset version](./how_to_guides/evaluation/dataset_version) +- [Evaluate on a split / filtered view of a dataset](./how_to_guides/dataset_subset) +- [Evaluate on a specific dataset version](./how_to_guides/dataset_version) ### Configure an evaluation job -- [Evaluate with repetitions](./how_to_guides/evaluation/repetition) -- [Handle model rate limits](./how_to_guides/evaluation/rate_limiting) +- [Evaluate with repetitions](./how_to_guides/repetition) +- [Handle model rate limits](./how_to_guides/rate_limiting) ## Unit testing Unit test your system to identify bugs and regressions. -- [Unit test applications (Python only)](./how_to_guides/evaluation/unit_testing) +- [Unit test applications (Python only)](./how_to_guides/unit_testing) ## Online evaluation Evaluate and monitor your system's live performance on production data. - [Set up an online evaluator](../../observability/how_to_guides/monitoring/online_evaluations) -- [Create a few-shot evaluator](./how_to_guides/evaluation/create_few_shot_evaluators) +- [Create a few-shot evaluator](./how_to_guides/create_few_shot_evaluators) ## Automatic evaluation Set up evaluators that automatically run for all experiments against a dataset. -- [Set up an auto-evaluator](./how_to_guides/evaluation/bind_evaluator_to_dataset) -- [Create a few-shot evaluator](./how_to_guides/evaluation/create_few_shot_evaluators) +- [Set up an auto-evaluator](./how_to_guides/bind_evaluator_to_dataset) +- [Create a few-shot evaluator](./how_to_guides/create_few_shot_evaluators) ## Analyzing experiment results Use the UI & API to understand your experiment results. -- [Compare experiments with the comparison view](./how_to_guides/evaluation/compare_experiment_results) -- [Filter experiments](./how_to_guides/evaluation/filter_experiments_ui) -- [View pairwise experiments](./how_to_guides/evaluation/evaluate_pairwise#view-pairwise-experiments) -- [Fetch experiment results in the SDK](./how_to_guides/evaluation/fetch_perf_metrics_experiment) -- [Upload experiments run outside of LangSmith with the REST API](./how_to_guides/evaluation/upload_existing_experiments) +- [Compare experiments with the comparison view](./how_to_guides/compare_experiment_results) +- [Filter experiments](./how_to_guides/filter_experiments_ui) +- [View pairwise experiments](./how_to_guides/evaluate_pairwise#view-pairwise-experiments) +- [Fetch experiment results in the SDK](./how_to_guides/fetch_perf_metrics_experiment) +- [Upload experiments run outside of LangSmith with the REST API](./how_to_guides/upload_existing_experiments) ## Dataset management Manage datasets in LangSmith used by your evaluations. 
-- [Manage datasets from the UI](./how_to_guides/datasets/manage_datasets_in_application) -- [Manage datasets programmatically](./how_to_guides/datasets/manage_datasets_programmatically) -- [Version datasets](./how_to_guides/datasets/version_datasets) -- [Share or unshare a dataset publicly](./how_to_guides/datasets/share_dataset) -- [Export filtered traces from an experiment to a dataset](./how_to_guides/datasets/export_filtered_traces_to_dataset) +- [Manage datasets from the UI](./how_to_guides/manage_datasets_in_application) +- [Manage datasets programmatically](./how_to_guides/manage_datasets_programmatically) +- [Version datasets](./how_to_guides/version_datasets) +- [Share or unshare a dataset publicly](./how_to_guides/share_dataset) +- [Export filtered traces from an experiment to a dataset](./how_to_guides/export_filtered_traces_to_dataset) ## Annotation queues and human feedback Collect feedback from subject matter experts and users to improve your applications. -- [Use annotation queues](./how_to_guides/human_feedback/annotation_queues) -- [Capture user feedback from your application to traces](./how_to_guides/human_feedback/attach_user_feedback) -- [Set up a new feedback criteria](./how_to_guides/human_feedback/set_up_feedback_criteria) -- [Annotate traces inline](./how_to_guides/human_feedback/annotate_traces_inline) -- [Audit and correct evaluator scores](./how_to_guides/evaluation/audit_evaluator_scores) +- [Use annotation queues](./how_to_guides/annotation_queues) +- [Capture user feedback from your application to traces](./how_to_guides/attach_user_feedback) +- [Set up a new feedback criteria](./how_to_guides/set_up_feedback_criteria) +- [Annotate traces inline](./how_to_guides/annotate_traces_inline) +- [Audit and correct evaluator scores](./how_to_guides/audit_evaluator_scores) diff --git a/docs/evaluation/how_to_guides/datasets/index_datasets_for_dynamic_few_shot_example_selection.mdx b/docs/evaluation/how_to_guides/index_datasets_for_dynamic_few_shot_example_selection.mdx similarity index 100% rename from docs/evaluation/how_to_guides/datasets/index_datasets_for_dynamic_few_shot_example_selection.mdx rename to docs/evaluation/how_to_guides/index_datasets_for_dynamic_few_shot_example_selection.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/langchain_runnable.mdx b/docs/evaluation/how_to_guides/langchain_runnable.mdx similarity index 97% rename from docs/evaluation/how_to_guides/evaluation/langchain_runnable.mdx rename to docs/evaluation/how_to_guides/langchain_runnable.mdx index 3993abfa..faf2b216 100644 --- a/docs/evaluation/how_to_guides/evaluation/langchain_runnable.mdx +++ b/docs/evaluation/how_to_guides/langchain_runnable.mdx @@ -132,8 +132,8 @@ To evaluate our chain we can pass it directly to the `evaluate()` / `aevaluate() The runnable is traced appropriately for each output. 
-![](../evaluation/static/runnable_eval.png) +![](./static/runnable_eval.png) ## Related -- [How to evaluate a `langgraph` graph](../evaluation/langgraph) +- [How to evaluate a `langgraph` graph](./langgraph) diff --git a/docs/evaluation/how_to_guides/evaluation/langgraph.mdx b/docs/evaluation/how_to_guides/langgraph.mdx similarity index 99% rename from docs/evaluation/how_to_guides/evaluation/langgraph.mdx rename to docs/evaluation/how_to_guides/langgraph.mdx index ef3373cf..557c2a6c 100644 --- a/docs/evaluation/how_to_guides/evaluation/langgraph.mdx +++ b/docs/evaluation/how_to_guides/langgraph.mdx @@ -239,7 +239,7 @@ If we need access to information about intermediate steps that isn't in state, w :::tip Custom evaluators -See more about what arguments you can pass to custom evaluators in this [how-to guide](../evaluation/custom_evaluator). +See more about what arguments you can pass to custom evaluators in this [how-to guide](./custom_evaluator). ::: diff --git a/docs/evaluation/how_to_guides/evaluation/llm_as_judge.mdx b/docs/evaluation/how_to_guides/llm_as_judge.mdx similarity index 89% rename from docs/evaluation/how_to_guides/evaluation/llm_as_judge.mdx rename to docs/evaluation/how_to_guides/llm_as_judge.mdx index c8a0b8f7..b4d7ba8a 100644 --- a/docs/evaluation/how_to_guides/evaluation/llm_as_judge.mdx +++ b/docs/evaluation/how_to_guides/llm_as_judge.mdx @@ -8,7 +8,7 @@ import { :::info Key concepts -- [LLM-as-a-judge evaluator](../../concepts#llm-as-judge) +- [LLM-as-a-judge evaluator](../concepts#llm-as-judge) ::: @@ -72,8 +72,8 @@ for the answer is logically valid and consistent with question and the answer.\\ ]} /> -See [here](../../how_to_guides/evaluation/custom_evaluator) for more on how to write a custom evaluator. +See [here](./custom_evaluator) for more on how to write a custom evaluator. ## Prebuilt evaluator via `langchain` -See [here](../../how_to_guides/evaluation/use_langchain_off_the_shelf_evaluators) for how to use prebuilt evaluators from `langchain`. +See [here](./use_langchain_off_the_shelf_evaluators) for how to use prebuilt evaluators from `langchain`. 
diff --git a/docs/evaluation/how_to_guides/datasets/manage_datasets_in_application.mdx b/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx similarity index 99% rename from docs/evaluation/how_to_guides/datasets/manage_datasets_in_application.mdx rename to docs/evaluation/how_to_guides/manage_datasets_in_application.mdx index 9beabd03..c99bf18a 100644 --- a/docs/evaluation/how_to_guides/datasets/manage_datasets_in_application.mdx +++ b/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx @@ -7,7 +7,7 @@ sidebar_position: 1 :::tip Recommended Reading Before diving into this content, it might be helpful to read the following: -- [Concepts guide on evaluation and datasets](../../concepts#datasets-and-examples) +- [Concepts guide on evaluation and datasets](../concepts#datasets-and-examples) ::: diff --git a/docs/evaluation/how_to_guides/datasets/manage_datasets_programmatically.mdx b/docs/evaluation/how_to_guides/manage_datasets_programmatically.mdx similarity index 100% rename from docs/evaluation/how_to_guides/datasets/manage_datasets_programmatically.mdx rename to docs/evaluation/how_to_guides/manage_datasets_programmatically.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/metric_type.mdx b/docs/evaluation/how_to_guides/metric_type.mdx similarity index 90% rename from docs/evaluation/how_to_guides/evaluation/metric_type.mdx rename to docs/evaluation/how_to_guides/metric_type.mdx index a3aa401a..68610753 100644 --- a/docs/evaluation/how_to_guides/evaluation/metric_type.mdx +++ b/docs/evaluation/how_to_guides/metric_type.mdx @@ -6,7 +6,7 @@ import { # How to return categorical vs numerical metrics -LangSmith supports both categorical and numerical metrics, and you can return either when writing a [custom evaluator](../../how_to_guides/evaluation/custom_evaluator). +LangSmith supports both categorical and numerical metrics, and you can return either when writing a [custom evaluator](./custom_evaluator). For an evaluator result to be logged as a numerical metric, it must be returned as: @@ -68,4 +68,4 @@ Here are some examples: ## Related -- [Return multiple metrics in one evaluator](../../how_to_guides/evaluation/multiple_scores) +- [Return multiple metrics in one evaluator](./multiple_scores) diff --git a/docs/evaluation/how_to_guides/evaluation/multiple_scores.mdx b/docs/evaluation/how_to_guides/multiple_scores.mdx similarity index 86% rename from docs/evaluation/how_to_guides/evaluation/multiple_scores.mdx rename to docs/evaluation/how_to_guides/multiple_scores.mdx index 2a433002..17f3fb9d 100644 --- a/docs/evaluation/how_to_guides/evaluation/multiple_scores.mdx +++ b/docs/evaluation/how_to_guides/multiple_scores.mdx @@ -6,7 +6,7 @@ import { # How to return multiple scores in one evaluator -Sometimes it is useful for a [custom evaluator function](../../how_to_guides/evaluation/custom_evaluator) or [summary evaluator function](../../how_to_guides/evaluation/summary) to return multiple metrics. +Sometimes it is useful for a [custom evaluator function](./custom_evaluator) or [summary evaluator function](./summary) to return multiple metrics. For example, if you have multiple metrics being generated by an LLM judge, you can save time and money by making a single LLM call that generates multiple metrics instead of making multiple LLM calls. To return multiple scores using the Python SDK, simply return a list of dictionaries/objects of the following form: @@ -71,8 +71,8 @@ Example: Rows from the resulting experiment will display each of the scores.
-![](../evaluation/static/multiple_scores.png) +![](./static/multiple_scores.png) ## Related -- [Return categorical vs numerical metrics](../../how_to_guides/evaluation/metric_type) +- [Return categorical vs numerical metrics](./metric_type) diff --git a/docs/evaluation/how_to_guides/evaluation/rate_limiting.mdx b/docs/evaluation/how_to_guides/rate_limiting.mdx similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/rate_limiting.mdx rename to docs/evaluation/how_to_guides/rate_limiting.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/repetition.mdx b/docs/evaluation/how_to_guides/repetition.mdx similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/repetition.mdx rename to docs/evaluation/how_to_guides/repetition.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/run_evals_api_only.mdx b/docs/evaluation/how_to_guides/run_evals_api_only.mdx similarity index 98% rename from docs/evaluation/how_to_guides/evaluation/run_evals_api_only.mdx rename to docs/evaluation/how_to_guides/run_evals_api_only.mdx index d50578f1..40fc5fbd 100644 --- a/docs/evaluation/how_to_guides/evaluation/run_evals_api_only.mdx +++ b/docs/evaluation/how_to_guides/run_evals_api_only.mdx @@ -26,7 +26,7 @@ This guide will show you how to run evals using the REST API, using the `request ## Create a dataset -Here, we are using the python SDK for convenience. You can also use the API directly use the UI, see [this guide](../datasets/manage_datasets_in_application) for more information. +Here, we are using the python SDK for convenience. You can also use the API directly or use the UI; see [this guide](./manage_datasets_in_application) for more information. ```python import openai @@ -191,7 +191,7 @@ for model_name in model_names: ## Run a pairwise experiment Next, we'll demonstrate how to run a pairwise experiment. In a pairwise experiment, you compare two examples against each other. -For more information, check out [this guide](../evaluation/evaluate_pairwise). +For more information, check out [this guide](./evaluate_pairwise). ```python # A comparative experiment allows you to provide a preferential ranking on the outputs of two or more experiments diff --git a/docs/evaluation/how_to_guides/evaluation/run_evaluation_from_prompt_playground.mdx b/docs/evaluation/how_to_guides/run_evaluation_from_prompt_playground.mdx similarity index 91% rename from docs/evaluation/how_to_guides/evaluation/run_evaluation_from_prompt_playground.mdx rename to docs/evaluation/how_to_guides/run_evaluation_from_prompt_playground.mdx index b2dee48b..726b2935 100644 --- a/docs/evaluation/how_to_guides/evaluation/run_evaluation_from_prompt_playground.mdx +++ b/docs/evaluation/how_to_guides/run_evaluation_from_prompt_playground.mdx @@ -12,12 +12,12 @@ This allows you to test your prompt / model configuration over a series of input 1. **Navigate to the prompt playground** by clicking on "Prompts" in the sidebar, then selecting a prompt from the list of available prompts or creating a new one. 2. **Select the "Switch to dataset" button** to switch to the dataset you want to use for the experiment. Please note that the dataset keys of the dataset inputs must match the input variables of the prompt. In the below sections, note that the selected dataset has inputs with keys "text", which correctly match the input variable of the prompt. Also note that there is a max capacity of 15 inputs for the prompt playground.
- ![Switch to dataset](../evaluation/static/switch_to_dataset.png) + ![Switch to dataset](./static/switch_to_dataset.png) 3. **Click on the "Start" button** or CMD+Enter to start the experiment. This will run the prompt over all the examples in the dataset and create an entry for the experiment in the dataset details page. Note that you need to commit the prompt to the prompt hub before you can start the experiment to ensure it can be referenced in the experiment. The result for each input will be streamed and displayed inline for each input in the dataset. - ![Input variables](../evaluation/static/input_variables_playground.png) + ![Input variables](./static/input_variables_playground.png) 4. **View the results** by clicking on the "View Experiment" button at the bottom of the page. This will take you to the experiment details page where you can see the results of the experiment. 5. **Navigate back to the commit page** by clicking on the "View Commit" button. This will take you back to the prompt page where you can make changes to the prompt and run more experiments. The "View Commit" button is available to all experiments that were run from the prompt playground. The experiment is prefixed with the prompt repository name, a unique identifier, and the date and time the experiment was run. - ![Playground experiment results](../evaluation/static/playground_experiment_results.png) + ![Playground experiment results](./static/playground_experiment_results.png) ## Add evaluation scores to the experiment diff --git a/docs/evaluation/how_to_guides/human_feedback/set_up_feedback_criteria.mdx b/docs/evaluation/how_to_guides/set_up_feedback_criteria.mdx similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/set_up_feedback_criteria.mdx rename to docs/evaluation/how_to_guides/set_up_feedback_criteria.mdx diff --git a/docs/evaluation/how_to_guides/datasets/share_dataset.mdx b/docs/evaluation/how_to_guides/share_dataset.mdx similarity index 100% rename from docs/evaluation/how_to_guides/datasets/share_dataset.mdx rename to docs/evaluation/how_to_guides/share_dataset.mdx diff --git a/docs/evaluation/how_to_guides/evaluation/static/add-auto-evaluator-python.png b/docs/evaluation/how_to_guides/static/add-auto-evaluator-python.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/add-auto-evaluator-python.png rename to docs/evaluation/how_to_guides/static/add-auto-evaluator-python.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add-filtered-traces-to-dataset.png b/docs/evaluation/how_to_guides/static/add-filtered-traces-to-dataset.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add-filtered-traces-to-dataset.png rename to docs/evaluation/how_to_guides/static/add-filtered-traces-to-dataset.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_manual_example.png b/docs/evaluation/how_to_guides/static/add_manual_example.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_manual_example.png rename to docs/evaluation/how_to_guides/static/add_manual_example.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_metadata.png b/docs/evaluation/how_to_guides/static/add_metadata.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_metadata.png rename to docs/evaluation/how_to_guides/static/add_metadata.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_to..dataset.png 
b/docs/evaluation/how_to_guides/static/add_to..dataset.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_to..dataset.png rename to docs/evaluation/how_to_guides/static/add_to..dataset.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/add_to_annotation_queue.png b/docs/evaluation/how_to_guides/static/add_to_annotation_queue.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/add_to_annotation_queue.png rename to docs/evaluation/how_to_guides/static/add_to_annotation_queue.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_to_dataset.png b/docs/evaluation/how_to_guides/static/add_to_dataset.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_to_dataset.png rename to docs/evaluation/how_to_guides/static/add_to_dataset.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_to_dataset_from_aq.png b/docs/evaluation/how_to_guides/static/add_to_dataset_from_aq.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_to_dataset_from_aq.png rename to docs/evaluation/how_to_guides/static/add_to_dataset_from_aq.png diff --git a/docs/evaluation/how_to_guides/datasets/static/add_to_split2.png b/docs/evaluation/how_to_guides/static/add_to_split2.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/add_to_split2.png rename to docs/evaluation/how_to_guides/static/add_to_split2.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/annotate_trace_inline.png b/docs/evaluation/how_to_guides/static/annotate_trace_inline.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/annotate_trace_inline.png rename to docs/evaluation/how_to_guides/static/annotate_trace_inline.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/annotation_queue_edit.png b/docs/evaluation/how_to_guides/static/annotation_queue_edit.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/annotation_queue_edit.png rename to docs/evaluation/how_to_guides/static/annotation_queue_edit.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/annotation_queue_form.png b/docs/evaluation/how_to_guides/static/annotation_queue_form.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/annotation_queue_form.png rename to docs/evaluation/how_to_guides/static/annotation_queue_form.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/annotation_sidebar.png b/docs/evaluation/how_to_guides/static/annotation_sidebar.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/annotation_sidebar.png rename to docs/evaluation/how_to_guides/static/annotation_sidebar.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/cat_feedback.png b/docs/evaluation/how_to_guides/static/cat_feedback.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/cat_feedback.png rename to docs/evaluation/how_to_guides/static/cat_feedback.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/click_to_edit_prompt.png b/docs/evaluation/how_to_guides/static/click_to_edit_prompt.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/click_to_edit_prompt.png rename to docs/evaluation/how_to_guides/static/click_to_edit_prompt.png diff --git 
a/docs/evaluation/how_to_guides/evaluation/static/code-autoeval-popup.png b/docs/evaluation/how_to_guides/static/code-autoeval-popup.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/code-autoeval-popup.png rename to docs/evaluation/how_to_guides/static/code-autoeval-popup.png diff --git a/docs/evaluation/how_to_guides/datasets/static/confirmation.png b/docs/evaluation/how_to_guides/static/confirmation.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/confirmation.png rename to docs/evaluation/how_to_guides/static/confirmation.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/cont_feedback.png b/docs/evaluation/how_to_guides/static/cont_feedback.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/cont_feedback.png rename to docs/evaluation/how_to_guides/static/cont_feedback.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/corrections_comparison_view.png b/docs/evaluation/how_to_guides/static/corrections_comparison_view.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/corrections_comparison_view.png rename to docs/evaluation/how_to_guides/static/corrections_comparison_view.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/corrections_runs_table.png b/docs/evaluation/how_to_guides/static/corrections_runs_table.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/corrections_runs_table.png rename to docs/evaluation/how_to_guides/static/corrections_runs_table.png diff --git a/docs/evaluation/how_to_guides/human_feedback/static/create_annotation_queue.png b/docs/evaluation/how_to_guides/static/create_annotation_queue.png similarity index 100% rename from docs/evaluation/how_to_guides/human_feedback/static/create_annotation_queue.png rename to docs/evaluation/how_to_guides/static/create_annotation_queue.png diff --git a/docs/evaluation/how_to_guides/datasets/static/create_dataset_csv.png b/docs/evaluation/how_to_guides/static/create_dataset_csv.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/create_dataset_csv.png rename to docs/evaluation/how_to_guides/static/create_dataset_csv.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/create_evaluator.png b/docs/evaluation/how_to_guides/static/create_evaluator.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/create_evaluator.png rename to docs/evaluation/how_to_guides/static/create_evaluator.png diff --git a/docs/evaluation/how_to_guides/evaluation/static/create_few_shot_evaluator.png b/docs/evaluation/how_to_guides/static/create_few_shot_evaluator.png similarity index 100% rename from docs/evaluation/how_to_guides/evaluation/static/create_few_shot_evaluator.png rename to docs/evaluation/how_to_guides/static/create_few_shot_evaluator.png diff --git a/docs/evaluation/how_to_guides/datasets/static/custom_json_schema.png b/docs/evaluation/how_to_guides/static/custom_json_schema.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/custom_json_schema.png rename to docs/evaluation/how_to_guides/static/custom_json_schema.png diff --git a/docs/evaluation/how_to_guides/datasets/static/dataset_schema_definition.png b/docs/evaluation/how_to_guides/static/dataset_schema_definition.png similarity index 100% rename from docs/evaluation/how_to_guides/datasets/static/dataset_schema_definition.png rename to 
docs/evaluation/how_to_guides/static/dataset_schema_definition.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/edit_evaluator.png b/docs/evaluation/how_to_guides/static/edit_evaluator.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/edit_evaluator.png
rename to docs/evaluation/how_to_guides/static/edit_evaluator.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/enter_dataset_details.png b/docs/evaluation/how_to_guides/static/enter_dataset_details.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/enter_dataset_details.png
rename to docs/evaluation/how_to_guides/static/enter_dataset_details.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/evaluation_intermediate_experiment.png b/docs/evaluation/how_to_guides/static/evaluation_intermediate_experiment.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/evaluation_intermediate_experiment.png
rename to docs/evaluation/how_to_guides/static/evaluation_intermediate_experiment.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/evaluation_intermediate_trace.png b/docs/evaluation/how_to_guides/static/evaluation_intermediate_trace.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/evaluation_intermediate_trace.png
rename to docs/evaluation/how_to_guides/static/evaluation_intermediate_trace.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/evaluator_prompt.png b/docs/evaluation/how_to_guides/static/evaluator_prompt.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/evaluator_prompt.png
rename to docs/evaluation/how_to_guides/static/evaluator_prompt.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/expanded_view.png b/docs/evaluation/how_to_guides/static/expanded_view.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/expanded_view.png
rename to docs/evaluation/how_to_guides/static/expanded_view.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/experiment-tracing-project.png b/docs/evaluation/how_to_guides/static/experiment-tracing-project.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/experiment-tracing-project.png
rename to docs/evaluation/how_to_guides/static/experiment-tracing-project.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/experiments-tab-code-results.png b/docs/evaluation/how_to_guides/static/experiments-tab-code-results.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/experiments-tab-code-results.png
rename to docs/evaluation/how_to_guides/static/experiments-tab-code-results.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/export-dataset-button.png b/docs/evaluation/how_to_guides/static/export-dataset-button.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/export-dataset-button.png
rename to docs/evaluation/how_to_guides/static/export-dataset-button.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/export-dataset-modal.png b/docs/evaluation/how_to_guides/static/export-dataset-modal.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/export-dataset-modal.png
rename to docs/evaluation/how_to_guides/static/export-dataset-modal.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/export-filtered-trace-to-dataset.png b/docs/evaluation/how_to_guides/static/export-filtered-trace-to-dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/export-filtered-trace-to-dataset.png
rename to docs/evaluation/how_to_guides/static/export-filtered-trace-to-dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/few_shot_code_snippet.png b/docs/evaluation/how_to_guides/static/few_shot_code_snippet.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/few_shot_code_snippet.png
rename to docs/evaluation/how_to_guides/static/few_shot_code_snippet.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/few_shot_example.png b/docs/evaluation/how_to_guides/static/few_shot_example.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/few_shot_example.png
rename to docs/evaluation/how_to_guides/static/few_shot_example.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/few_shot_search_results.png b/docs/evaluation/how_to_guides/static/few_shot_search_results.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/few_shot_search_results.png
rename to docs/evaluation/how_to_guides/static/few_shot_search_results.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/few_shot_synced_empty_state.png b/docs/evaluation/how_to_guides/static/few_shot_synced_empty_state.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/few_shot_synced_empty_state.png
rename to docs/evaluation/how_to_guides/static/few_shot_synced_empty_state.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/few_shot_tab_unsynced.png b/docs/evaluation/how_to_guides/static/few_shot_tab_unsynced.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/few_shot_tab_unsynced.png
rename to docs/evaluation/how_to_guides/static/few_shot_tab_unsynced.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter-all-experiments.png b/docs/evaluation/how_to_guides/static/filter-all-experiments.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter-all-experiments.png
rename to docs/evaluation/how_to_guides/static/filter-all-experiments.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter-feedback.png b/docs/evaluation/how_to_guides/static/filter-feedback.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter-feedback.png
rename to docs/evaluation/how_to_guides/static/filter-feedback.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter-openai.png b/docs/evaluation/how_to_guides/static/filter-openai.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter-openai.png
rename to docs/evaluation/how_to_guides/static/filter-openai.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter-singleminded.png b/docs/evaluation/how_to_guides/static/filter-singleminded.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter-singleminded.png
rename to docs/evaluation/how_to_guides/static/filter-singleminded.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/filter_examples.png b/docs/evaluation/how_to_guides/static/filter_examples.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/filter_examples.png
rename to docs/evaluation/how_to_guides/static/filter_examples.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter_pairwise.png b/docs/evaluation/how_to_guides/static/filter_pairwise.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter_pairwise.png
rename to docs/evaluation/how_to_guides/static/filter_pairwise.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/filter_to_regressions.png b/docs/evaluation/how_to_guides/static/filter_to_regressions.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/filter_to_regressions.png
rename to docs/evaluation/how_to_guides/static/filter_to_regressions.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/filtered-traces-from-experiment.png b/docs/evaluation/how_to_guides/static/filtered-traces-from-experiment.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/filtered-traces-from-experiment.png
rename to docs/evaluation/how_to_guides/static/filtered-traces-from-experiment.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/filters_applied.png b/docs/evaluation/how_to_guides/static/filters_applied.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/filters_applied.png
rename to docs/evaluation/how_to_guides/static/filters_applied.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/generate_synthetic_examples_create.png b/docs/evaluation/how_to_guides/static/generate_synthetic_examples_create.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/generate_synthetic_examples_create.png
rename to docs/evaluation/how_to_guides/static/generate_synthetic_examples_create.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/generate_synthetic_examples_pane.png b/docs/evaluation/how_to_guides/static/generate_synthetic_examples_pane.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/generate_synthetic_examples_pane.png
rename to docs/evaluation/how_to_guides/static/generate_synthetic_examples_pane.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/input_variables_playground.png b/docs/evaluation/how_to_guides/static/input_variables_playground.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/input_variables_playground.png
rename to docs/evaluation/how_to_guides/static/input_variables_playground.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/modify_example.png b/docs/evaluation/how_to_guides/static/modify_example.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/modify_example.png
rename to docs/evaluation/how_to_guides/static/modify_example.png
diff --git a/docs/evaluation/how_to_guides/human_feedback/static/multi_select_annotation_queue.png b/docs/evaluation/how_to_guides/static/multi_select_annotation_queue.png
similarity index 100%
rename from docs/evaluation/how_to_guides/human_feedback/static/multi_select_annotation_queue.png
rename to docs/evaluation/how_to_guides/static/multi_select_annotation_queue.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/multiple_scores.png b/docs/evaluation/how_to_guides/static/multiple_scores.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/multiple_scores.png
rename to docs/evaluation/how_to_guides/static/multiple_scores.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/multiselect_add_to_dataset.png b/docs/evaluation/how_to_guides/static/multiselect_add_to_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/multiselect_add_to_dataset.png
rename to docs/evaluation/how_to_guides/static/multiselect_add_to_dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/new_dataset.png b/docs/evaluation/how_to_guides/static/new_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/new_dataset.png
rename to docs/evaluation/how_to_guides/static/new_dataset.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/open_comparison_view.png b/docs/evaluation/how_to_guides/static/open_comparison_view.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/open_comparison_view.png
rename to docs/evaluation/how_to_guides/static/open_comparison_view.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/open_trace_comparison.png b/docs/evaluation/how_to_guides/static/open_trace_comparison.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/open_trace_comparison.png
rename to docs/evaluation/how_to_guides/static/open_trace_comparison.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/pairwise_comparison_view.png b/docs/evaluation/how_to_guides/static/pairwise_comparison_view.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/pairwise_comparison_view.png
rename to docs/evaluation/how_to_guides/static/pairwise_comparison_view.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/pairwise_from_dataset.png b/docs/evaluation/how_to_guides/static/pairwise_from_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/pairwise_from_dataset.png
rename to docs/evaluation/how_to_guides/static/pairwise_from_dataset.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/playground_evaluator_results.png b/docs/evaluation/how_to_guides/static/playground_evaluator_results.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/playground_evaluator_results.png
rename to docs/evaluation/how_to_guides/static/playground_evaluator_results.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/playground_experiment_results.png b/docs/evaluation/how_to_guides/static/playground_experiment_results.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/playground_experiment_results.png
rename to docs/evaluation/how_to_guides/static/playground_experiment_results.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/regression_test.gif b/docs/evaluation/how_to_guides/static/regression_test.gif
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/regression_test.gif
rename to docs/evaluation/how_to_guides/static/regression_test.gif
diff --git a/docs/evaluation/how_to_guides/evaluation/static/regression_view.png b/docs/evaluation/how_to_guides/static/regression_view.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/regression_view.png
rename to docs/evaluation/how_to_guides/static/regression_view.png
diff --git a/docs/evaluation/how_to_guides/human_feedback/static/review_runs.png b/docs/evaluation/how_to_guides/static/review_runs.png
similarity index 100%
rename from docs/evaluation/how_to_guides/human_feedback/static/review_runs.png
rename to docs/evaluation/how_to_guides/static/review_runs.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/runnable_eval.png b/docs/evaluation/how_to_guides/static/runnable_eval.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/runnable_eval.png
rename to docs/evaluation/how_to_guides/static/runnable_eval.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/schema_validation.png b/docs/evaluation/how_to_guides/static/schema_validation.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/schema_validation.png
rename to docs/evaluation/how_to_guides/static/schema_validation.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/select_baseline.png b/docs/evaluation/how_to_guides/static/select_baseline.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/select_baseline.png
rename to docs/evaluation/how_to_guides/static/select_baseline.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/select_columns.png b/docs/evaluation/how_to_guides/static/select_columns.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/select_columns.png
rename to docs/evaluation/how_to_guides/static/select_columns.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/select_feedback.png b/docs/evaluation/how_to_guides/static/select_feedback.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/select_feedback.png
rename to docs/evaluation/how_to_guides/static/select_feedback.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/share_dataset.png b/docs/evaluation/how_to_guides/static/share_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/share_dataset.png
rename to docs/evaluation/how_to_guides/static/share_dataset.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/show-feedback-from-autoeval-code.png b/docs/evaluation/how_to_guides/static/show-feedback-from-autoeval-code.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/show-feedback-from-autoeval-code.png
rename to docs/evaluation/how_to_guides/static/show-feedback-from-autoeval-code.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/summary_eval.png b/docs/evaluation/how_to_guides/static/summary_eval.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/summary_eval.png
rename to docs/evaluation/how_to_guides/static/summary_eval.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/switch_to_dataset.png b/docs/evaluation/how_to_guides/static/switch_to_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/switch_to_dataset.png
rename to docs/evaluation/how_to_guides/static/switch_to_dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/tag_this_version.png b/docs/evaluation/how_to_guides/static/tag_this_version.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/tag_this_version.png
rename to docs/evaluation/how_to_guides/static/tag_this_version.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/toggle_views.png b/docs/evaluation/how_to_guides/static/toggle_views.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/toggle_views.png
rename to docs/evaluation/how_to_guides/static/toggle_views.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/unit-test-suite.png b/docs/evaluation/how_to_guides/static/unit-test-suite.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/unit-test-suite.png
rename to docs/evaluation/how_to_guides/static/unit-test-suite.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/unshare_dataset.png b/docs/evaluation/how_to_guides/static/unshare_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/unshare_dataset.png
rename to docs/evaluation/how_to_guides/static/unshare_dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/unshare_trace_list.png b/docs/evaluation/how_to_guides/static/unshare_trace_list.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/unshare_trace_list.png
rename to docs/evaluation/how_to_guides/static/unshare_trace_list.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/update_display.png b/docs/evaluation/how_to_guides/static/update_display.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/update_display.png
rename to docs/evaluation/how_to_guides/static/update_display.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/uploaded_dataset.png b/docs/evaluation/how_to_guides/static/uploaded_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/uploaded_dataset.png
rename to docs/evaluation/how_to_guides/static/uploaded_dataset.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/uploaded_dataset_examples.png b/docs/evaluation/how_to_guides/static/uploaded_dataset_examples.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/uploaded_dataset_examples.png
rename to docs/evaluation/how_to_guides/static/uploaded_dataset_examples.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/uploaded_experiment.png b/docs/evaluation/how_to_guides/static/uploaded_experiment.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/uploaded_experiment.png
rename to docs/evaluation/how_to_guides/static/uploaded_experiment.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/use_corrections_as_few_shot.png b/docs/evaluation/how_to_guides/static/use_corrections_as_few_shot.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/use_corrections_as_few_shot.png
rename to docs/evaluation/how_to_guides/static/use_corrections_as_few_shot.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/version_dataset.png b/docs/evaluation/how_to_guides/static/version_dataset.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/version_dataset.png
rename to docs/evaluation/how_to_guides/static/version_dataset.png
diff --git a/docs/evaluation/how_to_guides/datasets/static/version_dataset_tests.png b/docs/evaluation/how_to_guides/static/version_dataset_tests.png
similarity index 100%
rename from docs/evaluation/how_to_guides/datasets/static/version_dataset_tests.png
rename to docs/evaluation/how_to_guides/static/version_dataset_tests.png
diff --git a/docs/evaluation/how_to_guides/evaluation/static/view_experiment.gif b/docs/evaluation/how_to_guides/static/view_experiment.gif
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/view_experiment.gif
rename to docs/evaluation/how_to_guides/static/view_experiment.gif
diff --git a/docs/evaluation/how_to_guides/evaluation/static/view_few_shot_ds.png b/docs/evaluation/how_to_guides/static/view_few_shot_ds.png
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/static/view_few_shot_ds.png
rename to docs/evaluation/how_to_guides/static/view_few_shot_ds.png
diff --git a/docs/evaluation/how_to_guides/evaluation/summary.mdx b/docs/evaluation/how_to_guides/summary.mdx
similarity index 98%
rename from docs/evaluation/how_to_guides/evaluation/summary.mdx
rename to docs/evaluation/how_to_guides/summary.mdx
index 97fd68bf..761043eb 100644
--- a/docs/evaluation/how_to_guides/evaluation/summary.mdx
+++ b/docs/evaluation/how_to_guides/summary.mdx
@@ -73,4 +73,4 @@ You can then pass this evaluator to the `evaluate` method as follows:

 In the LangSmith UI, you'll the summary evaluator's score displayed with the corresponding key.

-![](../evaluation/static/summary_eval.png)
+![](./static/summary_eval.png)
diff --git a/docs/evaluation/how_to_guides/evaluation/unit_testing.mdx b/docs/evaluation/how_to_guides/unit_testing.mdx
similarity index 99%
rename from docs/evaluation/how_to_guides/evaluation/unit_testing.mdx
rename to docs/evaluation/how_to_guides/unit_testing.mdx
index b43eab1b..a6ce4b06 100644
--- a/docs/evaluation/how_to_guides/evaluation/unit_testing.mdx
+++ b/docs/evaluation/how_to_guides/unit_testing.mdx
@@ -57,7 +57,7 @@ Each time you run this test suite, LangSmith collects the pass/fail rate and oth

 The test suite syncs to a corresponding dataset named after your package or github repository.

-![Test Example](../evaluation/static/unit-test-suite.png)
+![Test Example](./static/unit-test-suite.png)

 ## Going further
diff --git a/docs/evaluation/how_to_guides/evaluation/upload_existing_experiments.mdx b/docs/evaluation/how_to_guides/upload_existing_experiments.mdx
similarity index 97%
rename from docs/evaluation/how_to_guides/evaluation/upload_existing_experiments.mdx
rename to docs/evaluation/how_to_guides/upload_existing_experiments.mdx
index c9c8551d..caa2901a 100644
--- a/docs/evaluation/how_to_guides/evaluation/upload_existing_experiments.mdx
+++ b/docs/evaluation/how_to_guides/upload_existing_experiments.mdx
@@ -260,12 +260,12 @@ information in the request body).
 ## View the experiment in the UI

 Now, login to the UI and click on your newly-created dataset! You should see a single experiment:
-![Uploaded experiments table](../evaluation/static/uploaded_dataset.png)
+![Uploaded experiments table](./static/uploaded_dataset.png)

 Your examples will have been uploaded:
-![Uploaded examples](../evaluation/static/uploaded_dataset_examples.png)
+![Uploaded examples](./static/uploaded_dataset_examples.png)

 Clicking on your experiment will bring you to the comparison view:
-![Uploaded experiment comparison view](../evaluation/static/uploaded_experiment.png)
+![Uploaded experiment comparison view](./static/uploaded_experiment.png)

 As you upload more experiments to your dataset, you will be able to compare the results and easily identify regressions in the comparison view.
diff --git a/docs/evaluation/how_to_guides/evaluation/use_langchain_off_the_shelf_evaluators.mdx b/docs/evaluation/how_to_guides/use_langchain_off_the_shelf_evaluators.mdx
similarity index 100%
rename from docs/evaluation/how_to_guides/evaluation/use_langchain_off_the_shelf_evaluators.mdx
rename to docs/evaluation/how_to_guides/use_langchain_off_the_shelf_evaluators.mdx
diff --git a/docs/evaluation/how_to_guides/datasets/version_datasets.mdx b/docs/evaluation/how_to_guides/version_datasets.mdx
similarity index 97%
rename from docs/evaluation/how_to_guides/datasets/version_datasets.mdx
rename to docs/evaluation/how_to_guides/version_datasets.mdx
index 0f15f123..df7e2418 100644
--- a/docs/evaluation/how_to_guides/datasets/version_datasets.mdx
+++ b/docs/evaluation/how_to_guides/version_datasets.mdx
@@ -46,4 +46,4 @@ client.update_dataset_tag(
 )
 ```

-To run an evaluation on a particular tagged version of a dataset, you can follow [this guide](../evaluation/dataset_version).
+To run an evaluation on a particular tagged version of a dataset, you can follow [this guide](./dataset_version).
diff --git a/docs/evaluation/index.mdx b/docs/evaluation/index.mdx
index a88c782f..f8d368a3 100644
--- a/docs/evaluation/index.mdx
+++ b/docs/evaluation/index.mdx
@@ -116,7 +116,7 @@ groupId="client-language"

 Click the link printed out by your evaluation run to access the LangSmith Experiments UI, and explore the results of your evaluation.

-![](./how_to_guides/evaluation/static/view_experiment.gif)
+![](./how_to_guides/static/view_experiment.gif)

 ## Next steps
diff --git a/docs/evaluation/tutorials/agents.mdx b/docs/evaluation/tutorials/agents.mdx
index 9efd0f73..c66aacdb 100644
--- a/docs/evaluation/tutorials/agents.mdx
+++ b/docs/evaluation/tutorials/agents.mdx
@@ -6,7 +6,7 @@ import { RegionalUrl } from "@site/src/components/RegionalUrls";

 # Evaluate an agent

-In this tutorial, we will walk through 3 evaluation strategies LLM agents, building on the conceptual points shared in our [evaluation guide](https://docs.smith.langchain.com/evaluation/concepts#agents).
+In this tutorial, we will walk through 3 evaluation strategies for LLM agents, building on the conceptual points shared in our [evaluation guide](../concepts#agents).

 - `Final Response`: Evaluate the agent's final response.
 - `Single step`: Evaluate any agent step in isolation (e.g., whether it selects the appropriate tool).
@@ -348,7 +348,7 @@ Agent evaluation can focus on at least 3 things:

 :::tip

-See our [evaluation guide](https://docs.smith.langchain.com/evaluation/concepts#agents) for more details on Agent evaluation.
+See our [evaluation guide](../concepts#agents) for more details on Agent evaluation.

 :::
@@ -358,7 +358,7 @@ We can evaluate how well an agent does overall on a task. This basically involve

 :::tip

-See the full overview of agent response evaluation in our [conceptual guide](https://docs.smith.langchain.com/evaluation/concepts#evaluating-an-agents-final-response).
+See the full overview of agent response evaluation in our [conceptual guide](../concepts#evaluating-an-agents-final-response).

 :::
@@ -401,7 +401,7 @@ def predict_sql_agent_answer(example: dict):

 `Evaluator`

-This can [follow what we do for RAG](https://docs.smith.langchain.com/tutorials/Developers/rag) where we compare the generated answer with the reference answer.
+This can [follow what we do for RAG](./rag) where we compare the generated answer with the reference answer.

 ```python
 from langchain import hub
@@ -456,11 +456,11 @@ Agents generally make multiple actions. While it is useful to evaluate them end-

 :::tip

-See the full overview of single step evaluation in our [conceptual guide](https://docs.smith.langchain.com/evaluation/concepts#evaluating-a-single-step-of-an-agent).
+See the full overview of single step evaluation in our [conceptual guide](../concepts#evaluating-a-single-step-of-an-agent).

 :::

-We can check a specific tool call using [a custom evaluator](https://docs.smith.langchain.com/how_to_guides/evaluation/custom_evaluator):
+We can check a specific tool call using [a custom evaluator](../how_to_guides/custom_evaluator):

 - Here, we just invoke the assistant, `assistant_runnable`, with a prompt and check if the resulting tool call is as expected.
 - Here, we are using a specialized agent where the tools are hard-coded (rather than passed with the dataset input).
@@ -507,7 +507,7 @@ experiment_results = evaluate(

 ### Trajectory

-We can check a trajectory of tool calls using [custom evaluators](https://docs.smith.langchain.com/how_to_guides/evaluation/custom_evaluator):
+We can check a trajectory of tool calls using [custom evaluators](../how_to_guides/custom_evaluator):

 - Here, we just invoke the agent, `graph.invoke`, with a prompt.
 - Here, we are using a specialized agent where the tools are hard-coded (rather than passed with the dataset input).
@@ -519,7 +519,7 @@ We can check a trajectory of tool calls using [custom evaluators](https://docs.s

 :::tip

-See the full overview of single step evaluation in our [conceptual guide](https://docs.smith.langchain.com/evaluation/concepts#evaluating-an-agents-trajectory).
+See the full overview of single step evaluation in our [conceptual guide](../concepts#evaluating-an-agents-trajectory).

 :::
diff --git a/docs/evaluation/tutorials/rag.mdx b/docs/evaluation/tutorials/rag.mdx
index 3ff6eddf..7dec0ed5 100644
--- a/docs/evaluation/tutorials/rag.mdx
+++ b/docs/evaluation/tutorials/rag.mdx
@@ -406,7 +406,7 @@ However, we will show that this is not required.

 We can isolate them as intermediate chain steps.

-See detail on isolating intermediate chain steps [here](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_on_intermediate_steps).
+See detail on isolating intermediate chain steps [here](../how_to_guides/evaluate_on_intermediate_steps).

 Here is the a video from our LangSmith evaluation series for reference:
diff --git a/docs/evaluation/tutorials/swe-benchmark.mdx b/docs/evaluation/tutorials/swe-benchmark.mdx
index aa7ee4b0..c1f9b00b 100644
--- a/docs/evaluation/tutorials/swe-benchmark.mdx
+++ b/docs/evaluation/tutorials/swe-benchmark.mdx
@@ -72,7 +72,7 @@ dataset = client.upload_csv(

 ### Create dataset split for quicker testing

-Since running the SWE-bench evaluator takes a long time when run on all examples, you can create a "test" split for quickly testing the evaluator and your code. Read [this guide](../../evaluation/how_to_guides/datasets/manage_datasets_in_application#create-and-manage-dataset-splits) to learn more about managing dataset splits, or watch this short video that shows how to do it (to get to the starting page of the video, just click on your dataset created above and go to the `Examples` tab):
+Since running the SWE-bench evaluator takes a long time when run on all examples, you can create a "test" split for quickly testing the evaluator and your code. Read [this guide](../../evaluation/how_to_guides/manage_datasets_in_application#create-and-manage-dataset-splits) to learn more about managing dataset splits, or watch this short video that shows how to do it (to get to the starting page of the video, just click on your dataset created above and go to the `Examples` tab):

 import creating_split from "./static/creating_split.mp4";
diff --git a/docs/observability/concepts/index.mdx b/docs/observability/concepts/index.mdx
index b4acdff7..f007e0fc 100644
--- a/docs/observability/concepts/index.mdx
+++ b/docs/observability/concepts/index.mdx
@@ -50,9 +50,9 @@ Feedback can currently be continuous or discrete (categorical), and you can reus

 Collecting feedback on runs can be done in a number of ways:

-1. [Sent up along with a trace](/evaluation/how_to_guides/human_feedback/attach_user_feedback) from the LLM application
-2. Generated by a user in the app [inline](/evaluation/how_to_guides/human_feedback/annotate_traces_inline) or in an [annotation queue](../evaluation/how_to_guides/human_feedback/annotation_queues)
-3. Generated by an automatic evaluator during [offline evaluation](/evaluation/how_to_guides/evaluation/evaluate_llm_application)
+1. [Sent up along with a trace](/evaluation/how_to_guides/attach_user_feedback) from the LLM application
+2. Generated by a user in the app [inline](/evaluation/how_to_guides/annotate_traces_inline) or in an [annotation queue](../evaluation/how_to_guides/annotation_queues)
+3. Generated by an automatic evaluator during [offline evaluation](/evaluation/how_to_guides/evaluate_llm_application)
 4. Generated by an [online evaluator](./how_to_guides/monitoring/online_evaluations)

 To learn more about how feedback is stored in the application, see [this reference guide](../reference/data_formats/feedback_data_format).
diff --git a/docs/observability/how_to_guides/monitoring/rules.mdx b/docs/observability/how_to_guides/monitoring/rules.mdx
index bedaa787..898cdb37 100644
--- a/docs/observability/how_to_guides/monitoring/rules.mdx
+++ b/docs/observability/how_to_guides/monitoring/rules.mdx
@@ -31,7 +31,7 @@ _Alternatively_, you can access rules in settings by navigating to