Docs: Update evaluation.md #1649

Open
wants to merge 1 commit into
base: main
89 changes: 56 additions & 33 deletions docs/evaluation.md
Collaborator

Please follow https://www.conventionalcommits.org/en/v1.0.0/ for the PR title and commit message.
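
For illustration, a title such as `docs: update evaluation.md` would fit that format. Below is a minimal sketch of a title check. It assumes the lowercase type list commonly recommended alongside the convention (the spec itself only defines `feat` and `fix` and allows other types), and `isConventionalTitle` is a hypothetical helper, not part of Genkit or any linter.

```js
// Hypothetical helper for illustration only.
// Header shape from Conventional Commits 1.0.0:
//   <type>[optional scope][optional !]: <description>
const HEADER = /^([a-z]+)(\([^()\s]+\))?(!)?: (\S.*)$/;

// Assumption: lowercase types as commonly recommended (Angular-style list).
const TYPES = new Set([
  "build", "chore", "ci", "docs", "feat", "fix",
  "perf", "refactor", "revert", "style", "test",
]);

function isConventionalTitle(title) {
  const match = HEADER.exec(title);
  // Valid only if the header parses and the type is in the assumed list.
  return match !== null && TYPES.has(match[1]);
}

console.log(isConventionalTitle("docs: update evaluation.md")); // true
console.log(isConventionalTitle("Update evaluation.md")); // false
```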

@@ -41,11 +41,13 @@ This section explains how to perform inference-based evaluation using Genkit.
## Quick start

### Setup
<ol>
<li>Use an existing Genkit app or create a new one by following our [Getting
<ul style="list-style-type:none;">
<li>1. Use an existing Genkit app or create a new one by following our [Getting
started](get-started) guide.</li>
<li>Add the following code to define a simple RAG application to evaluate. For
<li>2. Add the following code to define a simple RAG application to evaluate. For
this guide, we use a dummy retriever that always returns the same documents.
</li>
</ul>

```js
import { genkit, z, Document } from "genkit";
@@ -99,10 +101,12 @@ export const qaFlow = ai.defineFlow({
}
);
```
</li>
<li>(Optional) Add evaluation metrics to your application to use while

<ul style="list-style-type:none;">
<li>3. (Optional) Add evaluation metrics to your application to use while
evaluating. This guide uses the `MALICIOUSNESS` metric from the
`genkitEval` plugin.
</ul>

```js
import { genkitEval, GenkitMetric } from "@genkit-ai/evaluator";
@@ -127,51 +131,66 @@ package.
```posix-terminal
npm install @genkit-ai/evaluator
```
</li>
<li>Start your Genkit application

<ul style="list-style-type:none;">
<li>4. Start your Genkit application</li>
</ul>

```posix-terminal
genkit start -- <command to start your app>
```
</li>
</ol>

### Create a dataset

Create a dataset to define the examples we want to use for evaluating our flow.

1. Go to the Dev UI at `http://localhost:4000` and click the **Datasets** button
<ul style="list-style-type:none;">
<li>
1. Go to the Dev UI at <code>http://localhost:4000</code> and click the
<strong>Datasets</strong> button
to open the Datasets page.

2. Click on the **Create Dataset** button to open the create dataset dialog.

a. Provide a `datasetId` for your new dataset. This guide uses
`myFactsQaDataset`.

b. Select `Flow` dataset type.

c. Leave the validation target field empty and click **Save**

</li>
<li>
2. Click on the <strong>Create Dataset</strong> button to open the create dataset dialog.
<ul style="list-style-type:none;">
<li>a. Provide a <code>datasetId</code> for your new dataset. This guide uses
<code>myFactsQaDataset</code>.</li>

<li>b. Select <code>Flow</code> dataset type.</li>

<li>c. Leave the validation target field empty and click
<strong>Save</strong></li>
</ul>
<li>
3. Your new dataset page appears, showing an empty dataset. Add examples to it
by following these steps:

a. Click the **Add example** button to open the example editor panel.

b. Only the `input` field is required. Enter `"Who is man's best friend?"`
in the `input` field, and click **Save** to add the example has to your
dataset.

<ul style="list-style-type:none;">
<li>
a. Click the <strong>Add example</strong> button to open the example editor
panel.
</li>
<li>
b. Only the <code>input</code> field is required. Enter <code>"Who is man's
best friend?"</code>
in the <code>input</code> field, and click <strong>Save</strong> to add
the example to your dataset.
</li>
<li>
c. Repeat steps (a) and (b) a couple more times to add more examples. This
guide adds the following example inputs to the dataset:
</li>
</ul>
</ul>


```
"Can I give milk to my cats?"

"From which animals did dogs evolve?"
```

<ul>
By the end of this step, your dataset should have 3 examples in it, with the
values mentioned above.

values shown above.
</ul>
### Run evaluation and view results

To start evaluating the flow, click the `Evaluations` tab in the Dev UI and
@@ -278,11 +297,15 @@ The `eval:flow` command runs inference-based evaluation on an input dataset.
This dataset may be provided either as a JSON file or by referencing an existing
dataset in your Genkit runtime.

Contributor

@ssbushi ssbushi Jan 23, 2025

Side note:

Could you please fix the note in the Schema validation section (Right - Line 260):

These examples may fail when the running an evaluation.

should be
These examples may fail when running an evaluation.

To reference an existing dataset:

```posix-terminal
# Referencing an existing dataset
genkit eval:flow qaFlow --input myFactsQaDataset
```

# or, using a dataset from a file
To use a dataset from a file:

```posix-terminal
genkit eval:flow qaFlow --input testInputs.json
```
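
The file passed to `--input` can be sketched as follows. This is a minimal illustration, assuming the file is a JSON array of objects that each carry an `input` field (check the Genkit documentation for the exact format your version expects). It reuses the three example questions from this guide and prints JSON that can be saved as `testInputs.json`.

```js
// Assumption: each dataset entry is an object with an `input` field.
const exampleInputs = [
  "Who is man's best friend?",
  "Can I give milk to my cats?",
  "From which animals did dogs evolve?",
];

// Wrap each question in the assumed { input } shape.
const dataset = exampleInputs.map((input) => ({ input }));

// Print pretty-printed JSON so it can be redirected into a file.
const datasetJson = JSON.stringify(dataset, null, 2);
console.log(datasetJson);
```

If the script is saved under a hypothetical name such as `makeInputs.js`, running `node makeInputs.js > testInputs.json` would produce the file used in the command above.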
