Feature/UPOS combinations that should be ruled out across English treebanks #549
Comments
Fixed the GUM PronType and Degree issues upstream, thanks
In the validator I've disabled PronType for NUM/SCONJ, and limited Case=Nom to PRON. I think all the corpora are up to date with these changes (GUM changes are upstream).
The validator for English does not allow the negative two-part conjunction "neither ... nor" to be annotated as negative, either with PronType or with Polarity, because both features are disallowed for UPOS CCONJ. What is the reasoning behind that decision?
It's a good question. I don't know if it has been discussed. https://universaldependencies.org/en/feat/Polarity.html doesn't mention CCONJ one way or the other, and I don't see it on the universal documentation pages either. A CCONJ is a function word, but is it "pronoun-like"?
Opened an issue for wider discussion: UniversalDependencies/docs#1056
AFAICT the only remaining issues now are
Validator configuration now updated for all these features. There is a pending decision about negative CCONJ (UniversalDependencies/docs#1056). Beyond that, LinES still has some validation issues. Can't tell for sure about GUM* because changes are upstream.
OK, I pushed a preview version to UD dev (just GUM, no reddit/GENTLE) so you can take a look, I think everything we discussed is implemented.
Still some errors for GENTLE: http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/validation-report.pl?UD_English-GENTLE
We should tighten up the allowed combinations in the validator settings. Several that stand out as problematic but will require data fixes:
- Case=Nom in LinES: Case=Nom is applied widely beyond pronouns UD_English-LinES#17
- Degree for non-ADJ/ADV, e.g. this query (first identified in Feature documentation tools/data/feats.json docs#1055) (GUM*, LinES, PUD: fixed)
- Gender for non-PRON words in ParTUT: Gender feature applied too widely UD_English-ParTUT#2
- Number, e.g. this query: Spurious Number feature on ADJs UD_English-ParTUT#3
- PronType for NUM, SCONJ, e.g. this query (GUM*, LinES, PUD: fixed)

(Manually fixed some Definite, Tense, VerbForm issues in LinES and PUD)
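For anyone who wants to spot-check a treebank locally before the validator settings change, the combinations listed above can be approximated with a small script. This is only a rough sketch: the rule tables `ALLOWED_ONLY` and `DISALLOWED` and the helper names are hypothetical, distilled from this issue's list rather than taken from the validator's actual configuration, and the script assumes standard 10-column CoNLL-U input.

```python
# Hypothetical rule tables distilled from the list in this issue;
# they are NOT the UD validator's real configuration.
ALLOWED_ONLY = {   # feature or feature=value -> UPOS tags where it may appear
    "Case=Nom": {"PRON"},
    "Degree": {"ADJ", "ADV"},
    "Gender": {"PRON"},
}
DISALLOWED = {     # feature -> UPOS tags where it should not appear
    "Number": {"ADJ"},
    "PronType": {"NUM", "SCONJ"},
}


def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def flag_token(upos, feats):
    """Return the rule keys violated by one (UPOS, FEATS) pair."""
    parsed = parse_feats(feats)
    hits = []
    for key, allowed in ALLOWED_ONLY.items():
        name, _, value = key.partition("=")
        if name not in parsed or upos in allowed:
            continue
        # A bare feature name matches any value; otherwise match one of
        # the comma-separated values (e.g. Case=Acc,Nom).
        if not value or value in parsed[name].split(","):
            hits.append(key)
    for name, banned in DISALLOWED.items():
        if name in parsed and upos in banned:
            hits.append(name)
    return hits


def scan_conllu(lines):
    """Yield (line_number, form, upos, violations) for flagged word lines."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        hits = flag_token(cols[3], cols[5])
        if hits:
            yield lineno, cols[1], cols[3], hits
```

To use it, pass an open file object (or any iterable of lines) to `scan_conllu` and print what it yields. Splitting the rules into an allow-list and a deny-list mirrors how the items above are phrased ("for non-ADJ/ADV" versus "for NUM, SCONJ"), and keeps each rule easy to adjust as the discussion settles.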