Actions: openai/evals

Run new evals

112 workflow runs

Track the Stat Eval
Run new evals #2236: Pull request #1489 synchronize by thesofakillers
March 19, 2024 09:33 2m 31s thesofakillers:tts
Already Said That Eval
Run new evals #2235: Pull request #1490 synchronize by thesofakillers
March 19, 2024 09:32 3m 18s thesofakillers:ast
Add Human-Relative MLAgentBench
Run new evals #2234: Pull request #1496 synchronize by danesherbs
March 19, 2024 09:20 3m 36s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2233: Pull request #1496 synchronize by danesherbs
March 19, 2024 09:12 3m 47s danesherbs:dane/add-mlab-v2
Add skill acquisition eval
Run new evals #2232: Pull request #1497 opened by inwaves
March 19, 2024 08:25 2m 21s inwaves:andrei/updates-20240319
Error Recovery Eval
Run new evals #2231: Pull request #1485 synchronize by ojaffe
March 19, 2024 08:15 6m 35s ojaffe:ollie/error_recovery
Add Human-Relative MLAgentBench
Run new evals #2230: Pull request #1496 synchronize by danesherbs
March 19, 2024 07:31 3m 43s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2229: Pull request #1496 synchronize by danesherbs
March 19, 2024 07:13 3m 55s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2228: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:32 6m 24s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2227: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:25 3m 29s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2226: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:02 3m 26s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2225: Pull request #1496 opened by danesherbs
March 19, 2024 05:57 2m 6s danesherbs:dane/add-mlab-v2
Add Function Deduction eval
Run new evals #2223: Pull request #1492 opened by james-aung
March 15, 2024 18:25 2m 17s james-aung:function-deduction
Add In-Context RL eval
Run new evals #2222: Pull request #1491 opened by james-aung
March 15, 2024 18:24 2m 5s james-aung:incontext-rl
Already Said That Eval
Run new evals #2221: Pull request #1490 synchronize by thesofakillers
March 15, 2024 14:22 2m 29s thesofakillers:ast
Track the Stat Eval
Run new evals #2220: Pull request #1489 opened by thesofakillers
March 15, 2024 14:06 3m 36s thesofakillers:tts
Identifying Variables Eval
Run new evals #2219: Pull request #1488 synchronize by thesofakillers
March 15, 2024 13:46 3m 33s thesofakillers:idvars
Identifying Variables Eval
Run new evals #2218: Pull request #1488 synchronize by thesofakillers
March 15, 2024 13:45 4m 5s thesofakillers:idvars
Identifying Variables Eval
Run new evals #2217: Pull request #1488 opened by thesofakillers
March 15, 2024 13:38 2m 33s thesofakillers:idvars
Can't Do That Anymore Eval
Run new evals #2216: Pull request #1487 opened by ojaffe
March 15, 2024 10:54 2m 7s ojaffe:ollie/cant_do_that_anymore
Bugged Tools Eval
Run new evals #2215: Pull request #1486 opened by ojaffe
March 15, 2024 10:37 2m 5s ojaffe:ollie/bugged_tools
Error Recovery Eval
Run new evals #2214: Pull request #1485 synchronize by ojaffe
March 15, 2024 10:32 2m 10s ojaffe:ollie/error_recovery
Error Recovery Eval
Run new evals #2213: Pull request #1485 opened by ojaffe
March 15, 2024 10:25 2m 48s ojaffe:ollie/error_recovery
Updates on existing evals; readmes; solvers
Run new evals #2212: Pull request #1483 opened by ojaffe
March 13, 2024 09:45 2m 16s ojaffe:ollie/updates-20240313
Drop two datasets from steganography
Run new evals #2211: Pull request #1481 opened by thesofakillers
March 12, 2024 07:54 2m 0s thesofakillers:steg-data