Is IRNet_pretrained.model supposed to achieve 50%+ dev accuracy? #18

oney · 2020-01-24T08:59:07Z

I ran eval.sh with IRNet_pretrained.model and then scored the output with the official Spider evaluation script, but I got strange results:

                     easy                 medium               hard                 extra                all
count                249                  438                  171                  170                  1028

====================== EXACT MATCHING ACCURACY =====================
exact match          0.084                0.078                0.088                0.065                0.079

---------------------PARTIAL MATCHING ACCURACY----------------------
select               0.207                0.204                0.340                0.253                0.234
select(no AGG)       0.224                0.209                0.346                0.259                0.243
where                0.175                0.168                0.111                0.176                0.162
where(no OP)         0.200                0.168                0.125                0.284                0.189
group(no Having)     0.091                0.318                0.286                0.333                0.285
group                0.000                0.217                0.122                0.286                0.182
order                0.000                0.105                0.275                0.298                0.164
and/or               1.000                0.912                0.898                0.890                0.927
IUEN                 0.000                0.000                0.105                0.148                0.071
keywords             0.369                0.313                0.246                0.224                0.298
---------------------- PARTIAL MATCHING RECALL ----------------------
select               0.201                0.192                0.304                0.235                0.220
select(no AGG)       0.217                0.196                0.310                0.241                0.228
where                0.194                0.158                0.090                0.133                0.148
where(no OP)         0.222                0.158                0.101                0.214                0.174
group(no Having)     0.150                0.315                0.359                0.177                0.269
group                0.000                0.215                0.154                0.152                0.172
order                0.000                0.133                0.186                0.173                0.148
and/or               0.936                0.960                0.961                0.942                0.951
IUEN                 0.000                0.000                0.051                0.111                0.080
keywords             0.433                0.303                0.205                0.188                0.283
---------------------- PARTIAL MATCHING F1 --------------------------
select               0.204                0.198                0.321                0.244                0.227
select(no AGG)       0.220                0.202                0.327                0.250                0.235
where                0.184                0.163                0.099                0.151                0.155
where(no OP)         0.211                0.163                0.112                0.244                0.181
group(no Having)     0.113                0.317                0.318                0.231                0.276
group                1.000                0.216                0.136                0.198                0.177
order                1.000                0.118                0.222                0.219                0.155
and/or               0.967                0.936                0.928                0.915                0.939
IUEN                 1.000                1.000                0.069                0.127                0.075
keywords             0.399                0.308                0.224                0.204                0.290

Did I do something wrong?
Thanks!

BTW, the IRNet prediction file contains 1028 predictions, while the official dev_gold.sql contains 1034 entries.


SivilTaram · 2020-02-15T03:03:53Z

Note that if the number of predictions does not match the number of gold queries (1028 != 1034), the evaluation is not meaningful, because the predictions and the ground truth are misaligned.
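A quick way to catch this before scoring is to compare the two files line by line. The snippet below is only a minimal sketch: it assumes both files hold one example per non-empty line, and the helper names `count_nonempty_lines` and `check_alignment` are made up for illustration, not part of IRNet or the Spider tooling.

```python
def count_nonempty_lines(path):
    """Count lines that contain something other than whitespace."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_alignment(pred_path, gold_path):
    """Return the shared example count, or raise if the files differ in length.

    Assumption: one example per non-empty line in both files, so a count
    mismatch means predictions and gold queries cannot be paired by position.
    """
    n_pred = count_nonempty_lines(pred_path)
    n_gold = count_nonempty_lines(gold_path)
    if n_pred != n_gold:
        raise ValueError(
            f"{pred_path} has {n_pred} examples but {gold_path} has {n_gold}; "
            "scores computed on these files would not be meaningful"
        )
    return n_pred
```

Calling `check_alignment("predict.sql", "dev_gold.sql")` (substitute your own file names) before the evaluator would raise an error for the 1028 vs. 1034 case reported above instead of producing misleading scores.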

hanrelan · 2020-03-12T00:02:46Z

Hi, I'm having the same issue. The eval.sh script by default generates an output file of 1028 samples. Any advice on how to have it output 1034 samples so the spider evaluator can be used to replicate the leaderboard result?
