ADP revision project #15

Open
AngledLuffa opened this issue Dec 7, 2024 · 14 comments

@AngledLuffa

A mini-project: revise the features for ADP to better fit UD standards, then update the current dataset and the annotators.

Currently available AdpType values in UD: Circ, Prep, Post, Voc

This doesn't mean others can't exist, but currently the feature scheme used doesn't quite mesh with other UD datasets.

    "ADP":   ['Case=Acc', 'Case=Gen', 'Case=Nom', 'Gender=Fem', 'Gender=Masc', 'Number=Plur', 'Number=Sing', 'Person=3', 'Type=Gen', 'Type=Loc'],

One question would be why there is both a Case=Gen and a Type=Gen. Another question would be what Type=Loc represents and how to update it, if needed.

Recent comment from @muteeurahman

> If we map this table onto Sindhi ADP types, we can use Post (postpositions, which are very common in Sindhi) and Prep (prepositions, which occur only rarely, a few in millions). So, to comply with the UD standard, we can add the feature AdpType=Post to all ADP occurrences (as I don't think we have encountered any AdpType=Prep yet). Where an ADP is used as a genitive case marker, it can additionally carry Case, Number, and Gender features as needed. So all ADPs will now have AdpType=Post, and specific ADPs (the genitive case markers) will have additional features.
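A minimal sketch of the proposed update, assuming the treebank is stored as standard 10-column CoNLL-U. The helper name `add_adp_type_post` is hypothetical, and it deliberately leaves the existing Type=Gen / Type=Loc values untouched, since how to map them is still an open question in this thread.

```python
def add_adp_type_post(conllu_text: str) -> str:
    """Add AdpType=Post to the FEATS column of every ADP token (hypothetical helper)."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Comments, blank lines and non-ADP tokens are passed through unchanged.
        if line.startswith("#") or len(cols) != 10 or cols[3] != "ADP":
            out.append(line)
            continue
        # FEATS (column 6) is "_" when empty, else Name=Value pairs joined by "|".
        feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
        # Keep any AdpType already present (e.g. a rare AdpType=Prep).
        feats.setdefault("AdpType", "Post")
        # CoNLL-U orders features alphabetically by name, ignoring case.
        ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
        cols[5] = "|".join(f"{name}={value}" for name, value in ordered)
        out.append("\t".join(cols))
    return "\n".join(out)


# Placeholder form/lemma "w"; FEATS as on a genitive postposition.
row = "4\tw\tw\tADP\t_\tCase=Nom|Gender=Masc|Number=Sing\t5\tcase\t_\t_"
print(add_adp_type_post(row).split("\t")[5])
# AdpType=Post|Case=Nom|Gender=Masc|Number=Sing
```

Genitive markers keep their Case/Gender/Number features, and every other ADP simply gains AdpType=Post, which matches the two-tier scheme described above.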

@rueter

rueter commented Dec 9, 2024

Hi @AngledLuffa, @jonorthwash, @dan-zeman, @garanes, @jasiewert, @Stormur
If we are looking at AdpType, it should definitely refer to orientation, i.e., Circ, Post, and Prep. Do we have any examples of languages with “circumpositions”?

Initially, “Voc” strikes me as odd, because it does not address the orientation of the ADP (by the way, do we have orientation types for SCONJ with the “mark” dependency?). “Voc”, if accepted, would indicate function. Thus, we would want to introduce Case=Gen, Case=Ine, ... equivalents as well, correct? At present, I would simply add the “Case” feature to the ADP. Since UD_Sindhi-Isra is not visible to me, I would like an example of what Type=Gen refers to.

@Stormur

Stormur commented Dec 9, 2024

Hi!

I think that AdpType needs to be ignored. It is a contextual tag which totally depends on the linear order: it does not add any information, and the distribution of an adposition (which can be both pre- and post- in different circumstances) is easily retrievable from data. A possible example of a circumposition is um Gottes willen 'for the will of God' in German, according to some interpretations (willen is actually a noun).
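To illustrate the point that orientation is retrievable from the trees, here is a sketch that infers Prep/Post/Circ for case-marking ADPs purely from token order relative to their head. The `Token` tuple and `adp_orientation` function are hypothetical, and labelling ADPs on both sides of the same nominal as Circ is an assumption made for illustration, not a UD rule.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    id: int       # 1-based position in the sentence
    upos: str
    head: int     # id of the governing token (0 = root)
    deprel: str


def adp_orientation(sentence: list[Token]) -> dict[int, str]:
    """Infer Prep/Post/Circ for case-marking ADPs from linear order alone."""
    # Subtypes such as case:prep still count as case.
    adps = [t for t in sentence if t.upos == "ADP" and t.deprel.split(":")[0] == "case"]
    sides = defaultdict(set)  # head id -> {"before", "after"}
    for t in adps:
        sides[t.head].add("before" if t.id < t.head else "after")
    result = {}
    for t in adps:
        if len(sides[t.head]) == 2:
            # The same nominal has case ADPs on both sides.
            result[t.id] = "Circ"
        else:
            result[t.id] = "Prep" if t.id < t.head else "Post"
    return result


# um Gottes willen, under the circumposition reading
sent = [Token(1, "ADP", 2, "case"), Token(2, "NOUN", 0, "root"), Token(3, "ADP", 2, "case")]
print(adp_orientation(sent))  # {1: 'Circ', 3: 'Circ'}
```

A lone postposition following its noun comes out as Post and a lone preposition as Prep, so the information a stored AdpType value would add is already recoverable from IDs and HEADs.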

For Voc, if you are referring to some element more or less obligatorily introducing a vocative element (like o in Latin), I think it is best treated as a PART, with a possible PartType. But yes, one cannot really speak of case there (unless there is a really specific form for vocatives).

@dan-zeman
Member

dan-zeman commented Dec 9, 2024

I agree with @Stormur that the distinction between Prep, Post, and Circ is redundant. The feature exists merely because the distinction was present in some tagsets that were converted to UD.

AdpType=Voc is different (and it has nothing to do with the vocative case). It distinguishes special vocalized forms of Czech prepositions from their base forms (the short base form is the lemma for both). For example, "in/at/on" is normally v, as in v pondělí "on Monday", but in some contexts (conditioned phonologically), the form must be ve, as in ve středu "on Wednesday".
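A sketch of the pattern described above, as a heuristic only: Czech vocalized prepositions such as ve (v), se (s), ke (k), ze (z), or ode (od) typically add a final -e to the base form that serves as the lemma. The function name is hypothetical, and vocalized forms that do not follow this shape (e.g. ku for k) would be missed.

```python
def is_vocalized(form: str, lemma: str, upos: str) -> bool:
    """Heuristic: flag a Czech ADP whose form is its lemma plus a final -e (ve < v)."""
    # Case-insensitive so a sentence-initial "Ve" still matches the lemma "v".
    return upos == "ADP" and form.lower() == lemma.lower() + "e"


# ve středu "on Wednesday" vs. v pondělí "on Monday"
for form in ["ve", "v"]:
    print(form, is_vocalized(form, "v", "ADP"))
```

Because the lemma is shared, the form/lemma pair alone distinguishes the vocalized variant, which is the distinction AdpType=Voc records.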

@muteeurahman

muteeurahman commented Dec 9, 2024

@rueter At least in Sindhi and Urdu, postpositions are common and prepositions are rare. There are also examples where a preposition and a postposition appear together with a nominal.
For example, in the Urdu and Sindhi sentences shown below, which have identical structure, a preposition and a postposition both appear around a pronoun:
[image: Urdu and Sindhi example sentences with a preposition and a postposition around a pronoun]

In Sindhi (and in Urdu as well), postpositions are used as case markers. Genitive/possessive case markers (postpositions) are further inflected for number, gender, and case; i.e., besides marking the genitive, they themselves carry nominative or oblique (in UD terms, accusative) case. See the following snapshots from the Sindhi dataset.
[image: snapshot from the Sindhi dataset]
Postpositions appear at positions 2, 4, and 9. Position 4 is a genitive postposition with nominative case, masculine gender, and singular number features. For locative postpositions, these features are not used.
[image: snapshot from the Sindhi dataset]

Here, the postposition at position 13 has oblique/accusative case.
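One way to keep the revised data consistent with this description is a small validation pass. The sketch below assumes, as a hypothetical rule inferred from the comment above, that an ADP carrying any of Case, Gender, or Number should carry all three (genitive markers), while other postpositions carry none; it does not identify genitive markers by lemma, and the helper names are hypothetical.

```python
AGREEMENT = ("Case", "Gender", "Number")


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing' ('_' means none)."""
    return {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))


def incomplete_agreement(tokens: list[tuple[int, str, str]]) -> list[int]:
    """Return ids of ADPs that carry some, but not all, of Case, Gender and Number.

    tokens: (id, upos, feats) triples for one sentence.
    """
    flagged = []
    for tok_id, upos, feats in tokens:
        if upos != "ADP":
            continue
        present = [name for name in AGREEMENT if name in parse_feats(feats)]
        # Assumed rule: genitive markers have all three, other postpositions none.
        if 0 < len(present) < len(AGREEMENT):
            flagged.append(tok_id)
    return flagged
```

A partially annotated ADP (for example one with only Case=Acc) would be flagged for manual review rather than silently accepted.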

@rueter

rueter commented Dec 9, 2024

Hi!

> I think that AdpType needs to be ignored. It is a contextual tag which totally depends on the linear order: it does not add any information, and the distribution of an adposition (which can be both pre- and post- in different circumstances) is easily retrievable from data. A possible example of a circumposition is um Gottes willen 'for the will of God' in German, according to some interpretations (willen is actually a noun).

Thank you, @Stormur, for the Circ example. I am still thinking about the significant difference between (1) the Finnish postposition “metsän [keskellä]” ‘[in the middle of] the forest’, with a genitive complement, and (2) its counterpart “[keskellä] metsää” ‘[surrounded by] forest’, with a partitive complement.
(1) seems to show a definiteness associated with ‘forest’.
(2) lacks this definiteness.

> For Voc, if you are referring to some element more or less obligatorily introducing a vocative element (like o in Latin), I think it is best treated as a PART, with a possible PartType. But yes, one cannot really speak of case there (unless there is a really specific form for vocatives).

Is Latin o PART equivalent to the English hey INTJ?

@rueter

rueter commented Dec 9, 2024

> AdpType=Voc is different (and it has nothing to do with the vocative case). It distinguishes special vocalized forms of Czech prepositions from their base forms (the short base form is the lemma for both). For example, "in/at/on" is normally v, as in v pondělí "on Monday", but in some contexts (conditioned phonologically), the form must be ve, as in ve středu "on Wednesday".

Thanks, @dan-zeman, this looks approximately the same as a possible “DetType=Voc” for distinguishing a in a cow and a horse from an in an apple and an horse.
The question here would be: should the value Voc have a common denominator for both ADP and DET?

@rueter

rueter commented Dec 10, 2024

> I agree with @Stormur that the distinction between Prep, Post, and Circ is redundant. The feature exists merely because the distinction was present in some tagsets that were converted to UD.

Thanks for giving me a shake. I thought about redundancy, and it occurred to me that Circ might require a parallel to cc:preconj vs. cc, since otherwise one might have trouble identifying which element is which in the German um Gottes willen: case:prep vs. case:post.
Writing um Gottes Willen, of course, would underscore that Willen is a NOUN.

We might need to alter the description of case:
(1) Circumpositions should also be mentioned alongside prepositions, postpositions, and clitics.
(2) The examples might be problematic. Is up or down an ADP? *case(lookout, up), *case(lookout, down).

> However, if various combinations of prepositions can be used to express different meaning combinations or nuances, then each preposition is independently analyzed as a case dependent. Examples of this in English include up beside (which can alternate with down beside or up near) or except during which can alternate with as during or except after.

It would seem that the ADVs up and down provide us with deictic information.
Should these be fixed(up, beside) and attach as case(lookout, up)? What do you think, @amir-zeldes and @nschneid?

@Stormur

Stormur commented Dec 10, 2024

> > For Voc, if you are referring to some element more or less obligatorily introducing a vocative element (like o in Latin), I think it is best treated as a PART, with a possible PartType. But yes, one cannot really speak of case there (unless there is a really specific form for vocatives).
>
> Is Latin o PART equivalent to the English hey INTJ?

Hmmm, now that you mention it... I am not sure about hey, because personally I tend to interpret it as in hey, Jude, i.e., as two separate elements: first an exclamation and then a vocative, more or less like hi (regardless of whether there is a comma). By contrast, o really does not appear elsewhere and seems more bound to the vocative, like Arabic (obligatory there, though, as far as I know).

@Stormur

Stormur commented Dec 10, 2024

> (2) examples might be problematic. Is up or down an ADP? *case(lookout, up), *case(lookout, down).

This opens a Pandora's box, for which I tried to offer some solutions in my paper on ADVs, but we are very far from any consensus, or even a coherent treatment.

@nschneid

@rueter The excerpt you included is from https://universaldependencies.org/u/dep/case.html. I don't think this statement about "up beside" actually reflects what is in the treebanks (or perhaps there's gray area). A number of issues with "prepositiony" constructions in English lack a clear and consistent resolution, e.g. https://github.com/UniversalDependencies/docs#795. So I would not rely too heavily on English in developing criteria for adpositions in Sindhi.

> Should these be fixed(up, beside) and attach as case(lookout, up)?

"up beside": My gut feeling is that "up" should attach to the noun as advmod not case. Definitely not fixed because the combination is productive: {up, down, over, out} + {by, past, along, beside, between, under, ...}

"case(lookout, up)": Do you mean case(look, up)? In "look up", "up" attaches to a verb not a noun so definitely not case.

@rueter

rueter commented Dec 10, 2024

Hi, @nschneid and @Stormur!
Thanks for your feedback!
Looking at the terminology offered at https://glossary.sil.org/term/boundedness, it appears that spatial deixis might offer one possible solution: Boundedness=Yes/No.

The approach would probably be to only use Boundedness=Yes where there is a distinction.
Definition: Bounded deixis is place deixis that has a component of meaning indicative of a border.
The examples given are out there and in here vs. there and here.

Should this discussion be moved elsewhere? As @nschneid said, the definition in u/dep/case.html is not necessarily representative of the English treebanks.

@nschneid

A general proposal for spatial deixis should be discussed at https://github.com/UniversalDependencies/docs/issues/

@dan-zeman
Member

> Should this discussion be moved elsewhere? As @nschneid said, the definition in u/dep/case.html is not necessarily representative of the English treebanks.

It should be moved also because this thread is not about English but Sindhi :-)

@AngledLuffa
Author

@dan-zeman thank you; the intent of this issue was to document needed changes in the Sindhi treebank, hopefully in time for the May release, not to relitigate all of UD's treatment of ADP.
