
Deployed 06fa7fb with MkDocs version: 1.6.1
nitya committed Dec 17, 2024
1 parent 8e86242 commit ae82a5c
Showing 4 changed files with 499 additions and 387 deletions.
22 changes: 11 additions & 11 deletions 0-Workshop/4-Evaluate/01/index.html
@@ -1514,18 +1514,8 @@ &lt;h2 id="4-review-evaluation-dataset"&gt;4. Review Evaluation Dataset&lt;/h2&gt;
<li><strong>Query/Response</strong> - each result has the query, response, and <em>ground truth</em>.</li>
<li><strong>Conversation (single/multi-turn)</strong> - messages (with content, role, and optional context); a sketch of this format follows the list.</li>
</ol>
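<p>For contrast, a record in the second (conversation) format might look like the sketch below. This is an illustrative example only, not a record from the workshop dataset; the field names simply follow the description above (role, content, optional context):</p>
<pre>
{
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?", "context": "retrieved product details"},
        {"role": "assistant", "content": "The Alpine Explorer Tent has the highest rainfly waterproof rating."}
    ]
}
</pre>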
<p>Our dataset reflects the first format, where the <em>test prompts</em> contain a query along with the ground truth for evaluating responses. The chat AI then generates a response (based on the query) that gets added to this record, creating the <em>evaluation dataset</em> that is sent to the "judge" AI.</p>
<pre>
{
    "query": "Which tent is the most waterproof?",
    "truth": "The Alpine Explorer Tent has the highest rainfly waterproof rating at 3000m"
}
</pre>
<p><strong>Let's look at the evaluation script that orchestrates this workflow, next.</strong></p>
<details class="info">
<summary>Click to expand and view the evalation dataset</summary>
<summary>Click to expand and view the evaluation dataset</summary>
<div class="highlight"><table class="highlighttable"><tr><th class="filename" colspan="2"><span class="filename">src/api/assets/chat_eval_data.jsonl</span></th></tr><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
Expand Down Expand Up @@ -1554,6 +1544,16 @@ <h2 id="4-review-evaluation-dataset">4. Review Evaluation Dataset<a class="heade
</code></pre></div></td></tr></table></div>
<hr/>
</details>
<p>Our dataset reflects the first format, where the <em>test prompts</em> contain a query along with the ground truth for evaluating responses. The chat AI then generates a response (based on the query) that gets added to this record, creating the <em>evaluation dataset</em> that is sent to the "judge" AI.</p>
<pre>
{
    "query": "Which tent is the most waterproof?",
    "truth": "The Alpine Explorer Tent has the highest rainfly waterproof rating at 3000m"
}
</pre>
<p><strong>Let's look at the evaluation script that orchestrates this workflow, next.</strong></p>
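<p>To make the workflow concrete, here is a sketch of one record after the chat AI's answer has been added. The shape is assumed for illustration; the exact field name used for the generated answer may differ in the actual evaluation script:</p>
<pre>
{
    "query": "Which tent is the most waterproof?",
    "response": "The Alpine Explorer Tent is the most waterproof, with the highest rainfly waterproof rating at 3000m.",
    "truth": "The Alpine Explorer Tent has the highest rainfly waterproof rating at 3000m"
}
</pre>
<p>The "judge" AI can then score each generated response against its ground truth.</p>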


