Martin/sandbox #23

Open
wants to merge 47 commits into
base: main
44f16ee
add annotation tool
niansong1996 Oct 10, 2022
bde0474
10 annotations for squall for stephen
niansong1996 Oct 12, 2022
87c8385
new ui enhancements and testing
Nov 2, 2022
b4ebe45
fixed bug where cancel still saved result, fixed bug where exec_info …
StephenYin01 Nov 3, 2022
5451568
fixed annotation missing an annotation
StephenYin01 Nov 3, 2022
653af0b
new annotations
StephenYin01 Nov 10, 2022
63c2ed6
new annotations again
StephenYin01 Nov 10, 2022
93074da
50 squall instances
MartinRiddell Nov 11, 2022
f54d0c9
finished 50 examples
StephenYin01 Nov 16, 2022
56a7664
fix bug in initing spider executor
niansong1996 Nov 23, 2022
a0d22b6
fix bug in executor
niansong1996 Nov 25, 2022
b09c70a
Merge pull request #16 from Yale-LILY/martin/squall_annotation
niansong1996 Nov 25, 2022
5e6b374
Merge pull request #15 from Yale-LILY/stephen/annotation-dev
niansong1996 Nov 25, 2022
141e214
new annotations for spider (need to revise some with new execution); …
StephenYin01 Dec 1, 2022
7657ae2
fixed annotation examples
StephenYin01 Dec 2, 2022
3bfdc9f
Merge pull request #17 from Yale-LILY/stephen/spider-annotation-dev
niansong1996 Dec 2, 2022
834dfd7
new examples; fixed spider execution criterion bug
StephenYin01 Dec 2, 2022
0457619
debug spider_execution_py if table name is python reserved keyword, m…
StephenYin01 Dec 3, 2022
400b98f
Merge pull request #18 from Yale-LILY/stephen/spider-annotation-dev
niansong1996 Dec 7, 2022
1c35e33
first few annotations for spider
MartinRiddell Dec 8, 2022
05f3437
Finished 50 annotations for spider dataset
MartinRiddell Feb 8, 2023
4aee7bd
100 annotations by chatgpt on successful and unsuccessful spider data…
MartinRiddell Feb 15, 2023
2780b01
100 annotations by chatgpt on unsuccessful spider dataset
MartinRiddell Feb 15, 2023
faeef72
added dataset of codex's failures on spider
MartinRiddell Feb 15, 2023
c912865
creating a new branch for this notebook
MartinRiddell Feb 23, 2023
6702aff
playing with result processing
MartinRiddell Mar 2, 2023
c3fe0d3
cleaned up a bit
MartinRiddell Mar 28, 2023
a98e6f8
implemented rudimentary human evaluation for model's results on squa…
MartinRiddell Apr 13, 2023
5ca8911
added doc string and a few comments in the code
MartinRiddell Apr 13, 2023
e10f6ab
start_eval can now handle datasets with the same keys as the gsmath d…
MartinRiddell Apr 14, 2023
cbecc01
commented out print statement, and added a check for ERROR messages i…
MartinRiddell Apr 14, 2023
cad6fff
better error handling
MartinRiddell Apr 15, 2023
df59973
added 'big difference' and 'unclear instructions' to the reasons for …
MartinRiddell Apr 15, 2023
29c7552
better error handling. Forgot to add this in previous commit
MartinRiddell Apr 15, 2023
7576f80
began human evaluation of codex on gsmath dataset
MartinRiddell Apr 15, 2023
5270c76
added an extra error to human eval files, and finished evaluating 100…
MartinRiddell Apr 24, 2023
ac8a222
small changes to evaluation script, and eval_report can be used to su…
MartinRiddell May 11, 2023
2a76777
finished evaluating 100 mbpp and gsmath, 50 wtq problems answered by …
MartinRiddell May 11, 2023
e82a2fc
added rekey script. It's not great, but it gives the indices of the q…
MartinRiddell May 12, 2023
17b4556
stephen codex cushman on gsm8k evals
StephenYin01 May 14, 2023
d924d73
finished gpt4 evaluation on GSM8k
StephenYin01 May 16, 2023
577d275
finished human eval of starcoder on gsm8k
StephenYin01 May 16, 2023
8970746
added a few evaluations that are done. Spider_gpt4 isn't done, but pu…
MartinRiddell May 21, 2023
77cc573
resolving merge
MartinRiddell May 21, 2023
6870fcb
finish spider starcoder evals (2 shot)
StephenYin01 May 23, 2023
17a28c1
finished codex cushman eval on spider
StephenYin01 May 24, 2023
afcc894
added evals for gpt4 and davinci models on spider dataset
MartinRiddell May 24, 2023
new annotations again
StephenYin01 committed Nov 10, 2022
commit 63c2ed6bedf96edb21eedd85967ff5bf9a5840e7