
Logprob based multiple-choice question evals (callback) #18

Merged 7 commits into main on Mar 5, 2025

Conversation

nielsrolf (Collaborator)
No description provided.

…le Choice Evaluation

 #### Log Probability Calculation (`logprobs.py`):
 - Refactored the results-appending logic to include the entire dataset entry along with the `messages` key, so that the output dictionary contains all necessary information from the dataset, not just the `messages` (sketched below).
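The refactor amounts to merging each dataset entry into the result record instead of keeping only its `messages`. A minimal sketch of that idea, where `compute_logprobs` and the field names are illustrative assumptions rather than the repo's actual API:

```python
# Hypothetical sketch of the refactored appending logic in logprobs.py.
# `compute_logprobs` and the field names are assumptions for illustration.
def collect_results(dataset, compute_logprobs):
    results = []
    for entry in dataset:
        scored_messages = compute_logprobs(entry["messages"])
        # Copy every field of the dataset entry, then attach the scored
        # messages, so the output carries all information from the dataset.
        results.append({**entry, "messages": scored_messages})
    return results
```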

 #### Multiple Choice Question Evaluation (`mc_question.py`):
 - Added support for custom templates and context in the `Question` class, allowing more flexibility when preparing questions for evaluation.
 - Improved the log probability summation to account for all messages and blocks, ensuring that the calculation is comprehensive and accurate.
 - Fixed the `logp_correct` value to correctly reflect the log probability of the correct answer.
 - Introduced default values for `choice_template`, `question_template`, and `answer_template` in the `MultipleChoiceEvalFreeform` class to streamline the creation of freeform multiple-choice evaluations (see the sketch after this list).
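A minimal sketch of how these pieces could fit together. The class and template names mirror the ones mentioned above, but the bodies, the default template strings, and the message/block structure are assumptions for illustration, not the actual implementation:

```python
from dataclasses import dataclass

# Assumed defaults; the real strings in mc_question.py may differ.
DEFAULT_CHOICE_TEMPLATE = "{letter}) {choice}"
DEFAULT_QUESTION_TEMPLATE = "{context}{question}\n{choices}\nAnswer:"
DEFAULT_ANSWER_TEMPLATE = " {letter}"


@dataclass
class Question:
    """Multiple-choice question with custom templates and optional context."""
    question: str
    choices: list[str]
    correct_index: int
    context: str = ""
    choice_template: str = DEFAULT_CHOICE_TEMPLATE
    question_template: str = DEFAULT_QUESTION_TEMPLATE
    answer_template: str = DEFAULT_ANSWER_TEMPLATE

    def render(self) -> tuple[str, list[str]]:
        """Return the rendered prompt and one candidate answer per choice."""
        letters = [chr(ord("A") + i) for i in range(len(self.choices))]
        choices_text = "\n".join(
            self.choice_template.format(letter=letter, choice=choice)
            for letter, choice in zip(letters, self.choices)
        )
        prompt = self.question_template.format(
            context=self.context, question=self.question, choices=choices_text
        )
        answers = [self.answer_template.format(letter=letter) for letter in letters]
        return prompt, answers


def sum_logprobs(messages: list[dict]) -> float:
    """Sum token log probabilities over all messages and all blocks,
    rather than over only a single message or block."""
    return sum(
        token["logprob"]
        for message in messages
        for block in message.get("blocks", [])
        for token in block.get("tokens", [])
    )
```

Under these assumptions, `logp_correct` would simply be the summed log probability of the candidate answer at `correct_index`, which is the value the fix above makes sure gets reported.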

 These changes enhance the functionality and reliability of the log probability calculations and multiple-choice question evaluations.
@nielsrolf merged commit 410931f into main on Mar 5, 2025
3 checks passed