I'm currently running the download script for XTREME. I'm running into some issues with the downloading and preprocessing of the UD data, and wanted to check if some of these are an issue with my setup or an issue with the provided code.
The script uses the third-party ud-conversion-tools file $REPO/third_party/ud-conversion-tools/conllu_to_conll.py. However, that file contains the line
from lib.conll import CoNLLReader
whereas the lib folder from ud-conversion-tools has not been included in the $REPO/third_party/ud-conversion-tools folder. I was able to get around this by separately cloning https://github.com/coastalcph/ud-conversion-tools and adding that clone to my PYTHONPATH.
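To check whether the workaround has taken effect, the import can be tested directly. This is only a minimal sketch: `can_import_conll_reader` is a hypothetical helper name, and it assumes the separately cloned repository contains `lib/conll.py` at its top level, as the paths in the tracebacks below suggest.

```python
import importlib
import sys
from pathlib import Path


def can_import_conll_reader(tools_dir: str) -> bool:
    """Return True if `from lib.conll import CoNLLReader` would succeed
    once tools_dir (the cloned ud-conversion-tools folder) is importable."""
    root = Path(tools_dir).expanduser().resolve()
    # The import can only work if lib/conll.py exists under the clone.
    if not (root / "lib" / "conll.py").is_file():
        return False
    # Same effect as adding the clone to PYTHONPATH for this process.
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("lib.conll")
    except ImportError:
        # Also covers missing dependencies of lib.conll itself.
        return False
    return hasattr(module, "CoNLLReader")
```

For example, `can_import_conll_reader("/mypath/xtreme/ud-conversion-tools")` should return True if the clone is in place and its own dependencies are installed.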
After correcting for the above, most of the UD preprocessing commands work, but a few still produce errors or warnings. Are these expected? (These are just messages I grabbed during my run.)
Case 1.
python /mypath/xtreme/third_party/ud-conversion-tools/conllu_to_conll.py /mypath/xtreme/download//udpos-tmp/ud-treebanks-v2.5/UD_Dutch-Alpino/nl_alpino-ud-train.conllu /mypath/xtreme/download//udpos-tmp/conll//nl//nl_alpino-ud-train.conll --lang nl --replace_subtokens_with_fused_forms --print_fused_forms
Traceback (most recent call last):
  File "/mypath/xtreme/third_party/ud-conversion-tools/conllu_to_conll.py", line 53, in <module>
    main()
  File "/mypath/xtreme/third_party/ud-conversion-tools/conllu_to_conll.py", line 41, in main
    orig_treebank = cio.read_conll_u(args.input)#, args.keep_fused_forms, args.lang, POSRANKPRECEDENCEDICT)
  File "/mypath/xtreme/ud-conversion-tools/lib/conll.py", line 350, in read_conll_u
    token_dict = {key: conv_fn(val) for (key, conv_fn), val in zip(self.CONLL_U_COLUMNS, parts)}
  File "/mypath/xtreme/ud-conversion-tools/lib/conll.py", line 350, in <dictcomp>
    token_dict = {key: conv_fn(val) for (key, conv_fn), val in zip(self.CONLL_U_COLUMNS, parts)}
  File "/mypath/xtreme/ud-conversion-tools/lib/conll.py", line 26, in parse_deps
    return [(int(pair[0]), pair[1]) for pair in dep_pairs]
  File "/mypath/xtreme/ud-conversion-tools/lib/conll.py", line 26, in <listcomp>
    return [(int(pair[0]), pair[1]) for pair in dep_pairs]
ValueError: invalid literal for int() with base 10: '5.1'
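My guess is that the '5.1' in Case 1 is an empty-node ID: the CoNLL-U format allows decimal IDs such as 5.1 for empty nodes, and the DEPS column may refer to them as heads, so a plain int() on the head fails. The following is a minimal sketch of a more tolerant parser, not the actual ud-conversion-tools code. `parse_deps_tolerant` is a hypothetical name, and it assumes the DEPS field is a list of head:deprel pairs separated by "|", with "_" meaning empty.

```python
from typing import List, Tuple, Union

Head = Union[int, str]


def parse_deps_tolerant(field: str) -> List[Tuple[Head, str]]:
    """Parse a CoNLL-U DEPS field into (head, deprel) pairs.

    Integer heads become int; anything else (e.g. the empty-node
    ID '5.1') is kept as a string instead of raising ValueError.
    """
    if field in ("_", ""):
        return []
    pairs = []
    for item in field.split("|"):
        # Split on the first colon only: the relation itself may
        # contain colons, as in 'conj:and'.
        head, _, deprel = item.partition(":")
        pairs.append((int(head) if head.isdigit() else head, deprel))
    return pairs
```

Whether the rest of lib/conll.py can cope with a string head is something I have not checked, so this only illustrates where the ValueError comes from.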
Case 2.
Not a tree after fused-form heuristics: غزة 15 - 8 ( اف ب ) - حذرت الجبهة الشعبية لتحرير فلسطين وحزب الخلاص الوطني ، الاسلامي القريب من حركة حماس ، من اية محاولات او اف منه الى وكالة فرانس برس الى " ضرورة الحفاظ على المصداقية في هذا الخصوص والا فان الدولة ستتحول الى ورقة استهلاكية تستخدم في المناسبات " .
Case 3.
Traceback (most recent call last):
  File "/mypath/xtreme/third_party/ud-conversion-tools/conllu_to_conll.py", line 53, in <module>
    main()
  File "/mypath/xtreme/third_party/ud-conversion-tools/conllu_to_conll.py", line 48, in main
    s.filter_sentence_content(args.replace_subtokens_with_fused_forms, args.lang, current_pos_precedence_list,args.remove_node_properties,args.remove_deprel_suffixes,args.remove_arabic_diacritics)
  File "/mypath/xtreme/ud-conversion-tools/lib/conll.py", line 219, in filter_sentence_content
    self._keep_fused_form(posPreferenceDict)
  File "/mypath/xtreme/ud-conversion-tools/lib/conll.py", line 179, in _keep_fused_form
    deprel = self[localhead][ext_dep]["deprel"]
KeyError: 3
Thanks!