From 8432670f3f15b12cb1defaef0828d6b7a8621179 Mon Sep 17 00:00:00 2001
From: thedmail
Date: Wed, 22 Jan 2025 14:25:47 -0800
Subject: [PATCH] Docs: Update evaluation.md

Fixes some rendering issues with code blocks and numbered (ordered) lists.
---
 docs/evaluation.md | 89 +++++++++++++++++++++++++++++-----------------
 1 file changed, 56 insertions(+), 33 deletions(-)

diff --git a/docs/evaluation.md b/docs/evaluation.md
index efac99152..6aa601796 100644
--- a/docs/evaluation.md
+++ b/docs/evaluation.md
@@ -41,11 +41,13 @@ This section explains how to perform inference-based evaluation using Genkit.
 ## Quick start
 
 ### Setup
-
-Use an existing Genkit app or create a new one by following our [Getting
+
+1. Use an existing Genkit app or create a new one by following our [Getting
 started](get-started) guide.
-Add the following code to define a simple RAG application to evaluate. For
+2. Add the following code to define a simple RAG application to evaluate. For
 this guide, we use a dummy retriever that always returns the same documents.
+
+
 ```js
 import { genkit, z, Document } from "genkit";
 
@@ -99,10 +101,12 @@ export const qaFlow = ai.defineFlow({
   }
 );
 ```
-
-(Optional) Add evaluation metrics to your application to use while
+
+
+3. (Optional) Add evaluation metrics to your application to use while
 evaluating. This guide uses the `MALICIOUSNESS` metric from the `genkitEval`
 plugin.
+
 ```js
 import { genkitEval, GenkitMetric } from "@genkit-ai/evaluator";
 
@@ -127,51 +131,66 @@ package.
 ```posix-terminal
 npm install @genkit-ai/evaluator
 ```
-
-Start your Genkit application
+
+
+4. Start your Genkit application
+
 ```posix-terminal
 genkit start --
 ```
-
-
 ### Create a dataset
 
 Create a dataset to define the examples we want to use for evaluating our flow.
 
-1. Go to the Dev UI at `http://localhost:4000` and click the **Datasets** button
+
+
-```
 "Can I give milk to my cats?"
+"From which animals did dogs evolve?"
-```
+
 ### Run evaluation and view results
 
 To start evaluating the flow, click the `Evaluations` tab in the Dev UI and
@@ -278,11 +297,15 @@ The `eval:flow` command runs inference-based evaluation on an input dataset.
 This dataset may be provided either as a JSON file or by referencing an existing
 dataset in your Genkit runtime.
 
+To reference an existing dataset:
+
 ```posix-terminal
-# Referencing an existing dataset
 genkit eval:flow qaFlow --input myFactsQaDataset
+```
 
-# or, using a dataset from a file
+To use a dataset from a file:
+
+```posix-terminal
 genkit eval:flow qaFlow --input testInputs.json
 ```