Is IRNet_pretrained.model supposed to achieve 50%+ dev accuracy? #18

Open
oney opened this issue Jan 24, 2020 · 2 comments

Comments

@oney
oney commented Jan 24, 2020

I ran eval.sh with IRNet_pretrained.model and then ran the official Spider evaluation script on the output, but I got a strange result.

                     easy                 medium               hard                 extra                all
count                249                  438                  171                  170                  1028

====================== EXACT MATCHING ACCURACY =====================
exact match          0.084                0.078                0.088                0.065                0.079

---------------------PARTIAL MATCHING ACCURACY----------------------
select               0.207                0.204                0.340                0.253                0.234
select(no AGG)       0.224                0.209                0.346                0.259                0.243
where                0.175                0.168                0.111                0.176                0.162
where(no OP)         0.200                0.168                0.125                0.284                0.189
group(no Having)     0.091                0.318                0.286                0.333                0.285
group                0.000                0.217                0.122                0.286                0.182
order                0.000                0.105                0.275                0.298                0.164
and/or               1.000                0.912                0.898                0.890                0.927
IUEN                 0.000                0.000                0.105                0.148                0.071
keywords             0.369                0.313                0.246                0.224                0.298
---------------------- PARTIAL MATCHING RECALL ----------------------
select               0.201                0.192                0.304                0.235                0.220
select(no AGG)       0.217                0.196                0.310                0.241                0.228
where                0.194                0.158                0.090                0.133                0.148
where(no OP)         0.222                0.158                0.101                0.214                0.174
group(no Having)     0.150                0.315                0.359                0.177                0.269
group                0.000                0.215                0.154                0.152                0.172
order                0.000                0.133                0.186                0.173                0.148
and/or               0.936                0.960                0.961                0.942                0.951
IUEN                 0.000                0.000                0.051                0.111                0.080
keywords             0.433                0.303                0.205                0.188                0.283
---------------------- PARTIAL MATCHING F1 --------------------------
select               0.204                0.198                0.321                0.244                0.227
select(no AGG)       0.220                0.202                0.327                0.250                0.235
where                0.184                0.163                0.099                0.151                0.155
where(no OP)         0.211                0.163                0.112                0.244                0.181
group(no Having)     0.113                0.317                0.318                0.231                0.276
group                1.000                0.216                0.136                0.198                0.177
order                1.000                0.118                0.222                0.219                0.155
and/or               0.967                0.936                0.928                0.915                0.939
IUEN                 1.000                1.000                0.069                0.127                0.075
keywords             0.399                0.308                0.224                0.204                0.290

Did I do something wrong?
Thanks!

BTW, the IRNet prediction file contains 1028 entries, while the official dev_gold.sql contains 1034.

@SivilTaram

It is worth noting that if the prediction and gold files differ in length (1028 != 1034), the evaluation is not meaningful: the script pairs entries by position, so predictions get compared against the wrong ground-truth queries.
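
A quick sanity check before running the evaluator can catch this. The snippet below is only a minimal sketch, assuming both files hold one example per non-empty line; the function names `count_examples`, `check_alignment`, and `check_files` are made up for illustration and are not part of IRNet or the Spider scripts.

```python
def count_examples(lines):
    """Count entries, assuming one example per non-empty line."""
    return sum(1 for line in lines if line.strip())


def check_alignment(pred_lines, gold_lines):
    """Return the shared example count, or raise if the counts differ."""
    n_pred = count_examples(pred_lines)
    n_gold = count_examples(gold_lines)
    if n_pred != n_gold:
        raise ValueError(
            f"prediction has {n_pred} examples but gold has {n_gold}; "
            "position-based scoring would compare mismatched pairs"
        )
    return n_pred


def check_files(pred_path, gold_path):
    """Convenience wrapper that reads both files from disk."""
    with open(pred_path, encoding="utf-8") as pred, \
            open(gold_path, encoding="utf-8") as gold:
        return check_alignment(pred, gold)
```

For example, `check_files("predict.sql", "dev_gold.sql")` (the prediction file name here is a placeholder) would raise a `ValueError` for the 1028 vs. 1034 case above instead of silently producing misleading scores.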

@hanrelan

Hi, I'm having the same issue. By default, the eval.sh script generates an output file with 1028 samples. Any advice on how to make it output all 1034 samples, so that the Spider evaluator can be used to replicate the leaderboard result?
