Validator support for ExtPos in checking external deprels #1062

Open
nschneid opened this issue Nov 2, 2024 · 23 comments

@nschneid
Contributor

nschneid commented Nov 2, 2024

In English-EWT sentence answers-20111108103930AA7FPhc_ans-0007 there is a connective that is clearly supposed to be prepositional "due to" but the "to" is omitted.

The way "due to" is normally handled is as an ADJ+ADP fixed expression, functioning holistically like a preposition, which we indicate with ExtPos=ADP on the first word (#1037).

At present, the validator ignores external deprel checks on fixed heads. But in this sentence, the "to" is missing, so there is no overt fixed relation, and the validator is throwing an error that an ADJ cannot attach as case.

I think the correct validator behavior is to use the ExtPos if present for checking the deprel. I will temporarily change "due" from case to amod but hope to change it back in the future.
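
The proposed behavior can be sketched roughly as follows. This is a minimal illustration, not the validator's actual code: the function names and the `CASE_FORBIDDEN_UPOS` set are hypothetical, and the only rule taken from this thread is that ADJ is rejected as `case`.

```python
# Illustrative only: the thread confirms that ADJ is rejected as `case`;
# the real validator's list may differ.
CASE_FORBIDDEN_UPOS = {"ADJ"}


def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'Degree=Pos|ExtPos=ADP'."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def effective_upos(upos: str, feats: str) -> str:
    """Return ExtPos when the word has one, otherwise its own UPOS."""
    return parse_feats(feats).get("ExtPos", upos)


def check_case(upos: str, feats: str, deprel: str) -> list:
    """Return error messages for a word attached with the given deprel."""
    if deprel.split(":")[0] != "case":
        return []
    pos = effective_upos(upos, feats)
    if pos in CASE_FORBIDDEN_UPOS:
        return [f"'{pos}' should not attach as 'case'"]
    return []
```

With this sketch, "due" tagged ADJ and attached as `case` is reported when it has no ExtPos, and accepted once it carries `ExtPos=ADP`, whether or not a `fixed` dependent is present.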

nschneid added a commit to UniversalDependencies/UD_English-EWT that referenced this issue Nov 2, 2024
@amir-zeldes
Contributor

+1 - the correct deprel is case and not amod. An alternative is to directly tag "due" as a preposition in this context, but I like this suggestion better, since it's really just due an error ;)

@dan-zeman dan-zeman added a:enhancement b:UPOS Universal part-of-speech tags: definitions and examples b:features b:universal labels Nov 4, 2024
@dan-zeman dan-zeman self-assigned this Nov 4, 2024
@dan-zeman dan-zeman added this to the v2.16 milestone Nov 4, 2024
dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 18, 2024
@dan-zeman
Member

Implemented. In consequence, some treebanks have errors that were not reported before (because the treebanks use ExtPos and its value does not match the deprel):

  • French-GSD ... 1
  • French-Sequoia ... 2
  • Portuguese-Bosque ... 12
  • Portuguese-GSD ... 6

@arademaker
Contributor

Ok, I can fix the Portuguese GSD and Bosque.

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 18, 2024
@dan-zeman
Member

I am now going to gradually remove the exception for fixed expressions from the rel-upos-* tests, because these cases can be resolved with ExtPos in a more targeted manner. There will thus be more errors in more treebanks. All such treebanks will be put in the LEGACY status, giving their maintainers four years to fix the data.

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 18, 2024
nschneid added a commit to UniversalDependencies/UD_English-EWT that referenced this issue Nov 19, 2024
@nschneid
Contributor Author

Thanks. Would it be worth adding a warning for ANY fixed head without ExtPos? Currently it doesn't flag that "according/VERB to/ADP" (fixed expression attaching as case) should have ExtPos but I think it's better with ExtPos=ADP.
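
A check of the kind suggested here might look like the sketch below. The `Token` type, the function name, and the hand-built example fragment are all assumptions made for illustration; none of them come from the validator or from a treebank.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    feats: str
    head: int
    deprel: str


def fixed_heads_without_extpos(tokens):
    """Return the forms of words that head a `fixed` relation but lack ExtPos."""
    by_id = {t.id: t for t in tokens}
    # Every word that has at least one `fixed` dependent.
    head_ids = {t.head for t in tokens if t.deprel == "fixed"}
    flagged = []
    for i in sorted(head_ids):
        feature_names = {pair.split("=", 1)[0] for pair in by_id[i].feats.split("|")}
        if "ExtPos" not in feature_names:
            flagged.append(by_id[i].form)
    return flagged


# Hand-built fragment for "according to the report" (illustrative annotation).
fragment = [
    Token(1, "according", "VERB", "_", 4, "case"),
    Token(2, "to", "ADP", "_", 1, "fixed"),
    Token(3, "the", "DET", "_", 4, "det"),
    Token(4, "report", "NOUN", "_", 0, "root"),
]
```

On this fragment the function returns `["according"]`; adding `ExtPos=ADP` to the first token makes the list empty.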

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 19, 2024
@dan-zeman
Member

> Thanks. Would it be worth adding a warning for ANY fixed head without ExtPos? Currently it doesn't flag that "according/VERB to/ADP" (fixed expression attaching as case) should have ExtPos but I think it's better with ExtPos=ADP.

I don't know. But "according to" will probably be flagged in the next round. I am modifying the tests one by one; rel-upos-case had not yet been modified when you asked, but it has been now.

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 19, 2024
dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 19, 2024
@nschneid
Contributor Author

nschneid commented Nov 19, 2024

"According" is tagged VERB so it can attach as case or mark per the deverbal connectives policy.

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 19, 2024
@dan-zeman
Member

dan-zeman commented Nov 19, 2024

> "According" is tagged VERB so it can attach as case or mark per the deverbal connectives policy.

Shouldn't it now use ExtPos=ADP?

@nschneid
Contributor Author

nschneid commented Nov 19, 2024

There are two issues here: the general policy on VERBs as case/mark and the treatment of fixed expressions.

It looks like the validator change UniversalDependencies/tools@5d0d028 prohibits words like "regarding" and "given", tagged VERB, from attaching as case/mark. But the guidelines explicitly say this is OK, and we never discussed repealing that in favor of ExtPos.

Assuming the single-word verbal connectives are allowed, my question was whether there should be a WARNING for any fixed expressions lacking ExtPos. I think that was the conclusion of the Core Group discussion.

@amir-zeldes
Contributor

My recollection matches Nathan's - using ExtPos for single word 'case' would be a new policy.

@dan-zeman
Member

Well, using ExtPos for single word case was the request with which Nathan started this thread – although that was a bit different because there the second word was omitted by mistake.

The change regarding VERBs can be reverted in the validator if desired. But the note that I had left there when we discussed it in the core group said:

###!!! February 2022: Temporarily allow mark+VERB ("regarding"). In the future, it should be banned again
###!!! by default (and case+VERB too), but there should be a language-specific list of exceptions.

So now I thought that instead of implementing a language-specific list of exceptions, one could simply put ExtPos in the data.
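
The two options can be contrasted with a small sketch. Everything in it is hypothetical: the exception list holds only forms named in this thread, and the set of ExtPos values assumed to license a VERB as case or mark is an assumption, not validator behavior.

```python
# Hypothetical exception list, keyed by language code.
# The forms are ones named in this thread.
LEXICAL_EXCEPTIONS = {"en": {"regarding", "given"}}

# Assumption: ExtPos values that would license a VERB as case or mark.
LICENSING_EXTPOS = {"ADP", "SCONJ"}


def allowed_by_list(lang: str, form: str) -> bool:
    """Option 1: the validator keeps a per-language list of exceptions."""
    return form.lower() in LEXICAL_EXCEPTIONS.get(lang, set())


def allowed_by_extpos(feats: str) -> bool:
    """Option 2: the treebank data itself carries the exception as ExtPos."""
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "ExtPos":
            return value in LICENSING_EXTPOS
    return False
```

Under the first option the knowledge lives in the validator and must be maintained per language; under the second it lives in the annotation, so each exceptional token has to be marked in the data.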

@nschneid
Contributor Author

I found a note from Dec. 9, 2021: "Remove the categorical prohibition [on VERB/mark]; Dan will add a lexical list of exceptions (but it may take time)"

Perhaps ExtPos is a more economical solution than adding a lexical list. We should discuss in our next meeting. A concern is that we may be moving too fast in making ExtPos mandatory in some circumstances where it wasn't previously.

@MagaliDuran
Contributor

Hello everyone

I'm having problems validating cases annotated with ExtPos=PRON, in fixed expressions composed of two PRON (in some cases, two nominative forms). These are fixed expressions common to Portuguese UD corpora ("o qual", "os quais", "o que", etc).

In the ExtPos table (https://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_feature.pl?lcode=pt&feature=ExtPos) there is no possibility of PRON and PRON, because there is no line for PRON.

I've seen that this possibility has been included for French. How can I do the same for Portuguese?

@nschneid
Contributor Author

I believe this will be solved by adding PRON as an option to the documentation page: https://universaldependencies.org/pt/feat/ExtPos.html

@MagaliDuran
Contributor

Thank you Nathan!
However, I'm not authorized to change this page. I think somebody has to add another line to the table I mentioned, for PRON, as a possibility. The table is generic and you have to check the combinations that apply to your language.

@dan-zeman
Member

dan-zeman commented Feb 26, 2025

In fact, PRON is a legitimate value of ExtPos at the level of universal guidelines. It is allowed by default in all languages and it would be allowed also in Portuguese if the Portuguese-specific page for ExtPos did not exist (or if it listed the value, as Nathan has noted).

Even if you are not authorized to edit the page directly, you can propose changes as a pull request; someone with the rights can then merge it. Once the documentation is (correctly) updated, the clickable form will offer the new value, too.
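
The fallback described above can be pictured as a simple lookup. This is a sketch under stated assumptions: the names are invented, `UNIVERSAL_EXTPOS` is only an illustrative subset (the thread confirms PRON, and ADP is used earlier in it), and the Portuguese entry is a placeholder rather than the real content of the Portuguese page.

```python
# Illustrative subset of the universally permitted ExtPos values.
UNIVERSAL_EXTPOS = {"ADP", "PRON"}

# Placeholder: values listed on language-specific ExtPos pages.
# A language appears here only if such a page exists.
LANGUAGE_SPECIFIC_EXTPOS = {"pt": {"ADP"}}


def permitted_extpos(lang: str) -> set:
    """A language-specific page, when present, replaces the universal default."""
    return LANGUAGE_SPECIFIC_EXTPOS.get(lang, UNIVERSAL_EXTPOS)


def register_value(lang: str, value: str) -> None:
    """Model the documentation update that adds a value for one language."""
    LANGUAGE_SPECIFIC_EXTPOS.setdefault(lang, set()).add(value)
```

In this model, PRON is rejected for `pt` until `register_value("pt", "PRON")` is called, while a language with no page of its own accepts it by default.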

@MagaliDuran
Contributor

Thank you, Dan. I will do this.

@dan-zeman
Member

dan-zeman commented Feb 26, 2025

Sorry, in the above link I gave the URL of the universal ExtPos page. It should be the Portuguese one.
https://universaldependencies.org/pt/feat/ExtPos.html
https://github.com/universaldependencies/docs/edit/pages-source/_pt/feat/ExtPos.md

@MagaliDuran
Contributor

It is done (I hope I did it right). Please check it.

@dan-zeman
Member

> It is done (I hope I did it right). Please check it.

You mean a pull request? I am afraid you did not do it right because I do not see any open pull request here.

Alternatively, just send me the suggested text + example that should be there and I can edit it myself.

@MagaliDuran
Contributor

I did it here: e3029b0

And now I tried again: pages-source...MagaliDuran:docs:patch-2

The text to be inserted is:

PRON: compound pronoun

Examples

  • O que você quer?

  • Esse é o número para o qual eu telefonei.

@dan-zeman
Member

This time the PR worked and I just merged it. You can now go to the ExtPos registration page for Portuguese and you will see the new checkboxes for PRON.

@MagaliDuran
Contributor

Thanks a lot! It worked!
