{
"ERROR": {
"type": "IndexError",
"value": "list index out of range",
"traceback": [
"Traceback (most recent call last):",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 223, in error_catcher",
" g(*pargs, **kwargs)",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 213, in f",
" for response in generator(args, *pargs, **kwargs):",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 1569, in count",
" if group_by[i][0] in split:",
"IndexError: list index out of range"
]
},
"time": 26.713754177093506
}
The issue seems to be structural attribute values containing tabs. The statistics query is using CWB's tabulate command, and when grouping by more than one attribute the values are separated by tabs. If the values also contain tabs, the result can't be parsed. I'm not sure if this can be solved while still using tabulate, so maybe a note about tabs in the readme will have to do for now, and some better error handling in the code of course.
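To illustrate the failure mode described above, here is a minimal sketch of a defensive parser for tab-separated output lines. It is not the code in korp.py; the function name `parse_tabulate_line` and the sample rows are hypothetical, and the only assumption carried over from the discussion is that one output line holds one value per requested attribute, separated by tabs.

```python
def parse_tabulate_line(line, n_fields):
    """Split one tab-separated output line into exactly n_fields values.

    Raises ValueError with a readable message when the field count is off,
    which is what happens when an attribute value itself contains a tab.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != n_fields:
        raise ValueError(
            f"expected {n_fields} tab-separated fields, got {len(fields)}: {line!r}"
        )
    return fields


# Two attributes requested, e.g. deprel and thread_title (sample data is made up).
good_row = "DT\tSome thread title"
bad_row = "DT\tSome thread\ttitle"  # the title itself contains a tab

print(parse_tabulate_line(good_row, 2))
try:
    parse_tabulate_line(bad_row, 2)
except ValueError as err:
    print("unparseable row:", err)
```

Checking the field count up front turns an internal-looking IndexError into an error that names the offending row, which a caller could then skip or report.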
Ok, thanks for the explanation. Apparently, we have avoided the issue by disallowing tabs in the values of structural attributes as well as positional ones. I didn’t notice any option in cqp to change the value separator of tabulate, so I suppose you can’t do more than what you suggested.
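Disallowing tabs at corpus preparation time could look roughly like the sketch below. This is an illustration only, not the procedure either project uses: the function name `sanitize_attribute_value` is hypothetical, and replacing line breaks as well as tabs is an extra precaution assumed here rather than something stated in the thread.

```python
import re


def sanitize_attribute_value(value, replacement=" "):
    """Replace runs of tabs (and, as an assumed extra, line breaks) in an
    attribute value with a single replacement string before encoding."""
    return re.sub(r"[\t\r\n]+", replacement, value)


# A made-up thread title containing a tab.
print(repr(sanitize_attribute_value("Some thread\ttitle")))
```

Applying this to every positional and structural attribute value before encoding would keep tab-separated output unambiguous, at the cost of not preserving the original whitespace exactly.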
The /count endpoint returns an IndexError: list index out of range when trying to search certain Flashback or Familjeliv subcorpora with (certain) group_by and group_by_struct parameters. For example, https://ws.spraakbanken.gu.se/ws/korp/v8/count?group_by=deprel&group_by_struct=thread_title&cqp=%3Cthread%3E+%5Bpos%20%3D%20%22DT%22%5D&corpus=FLASHBACK-DATOR&default_within=sentence&debug=true results in the error response shown above.
Does the corpus data perhaps contain something unexpected by /count? Anyway, I think it would be better if the code were able to handle that without such an internal-looking error.
I got the error with a number of different parameters, though I haven’t tried all combinations:
group_by: pos, deprel, msd, word
group_by_struct: thread_title, text_username; but not forum_title
cqp: [], [pos="VB"], [pos="DT"], [msd=".*+.*"], but not [pos="RO"]; with or without anchoring to <text> or <thread>, but not when anchoring to <forum>
corpus: FLASHBACK-DATOR, FLASHBACK-HEM, FLASHBACK-POLITIK, FLASHBACK-SAMHALLE, FAMILJELIV-FORALDER, FAMILJELIV-KANSLIGA; but not FLASHBACK-LIVSSTIL, FLASHBACK-EKONOMI, FLASHBACK-FORDON, FLASHBACK-DROGER, FLASHBACK-KULTUR, FAMILJELIV-ALLMANNA-KROPP, FAMILJELIV-GRAVID, TWITTER, TWITTER-2015 (with group_by_struct=user_username), WIKIPEDIA-SV (with group_by_struct=text_title)
It would seem that larger corpora are more likely to cause the error, but that’s not completely consistent, at least if you only take token count into account. And I couldn’t get the error from any corpora other than the Flashback and Familjeliv subcorpora.
(I came across this issue by accident when testing different combinations of statistics attributes in the frontend.)