problem with annotation of "sadece" in UD_Turkish-BOUN #1016

Open
jonorthwash opened this issue Feb 28, 2024 · 10 comments

Comments

@jonorthwash
Contributor

In UD_Turkish-BOUN, the word sadece "only" is often treated as a dependent of a verb, even when it's serving to restrict something else.

For example:

# sent_id = news_1874
# text = Başka bir krize neden olan MGK toplantısı ise sadece 30 dakika sürmüştü.
1	Başka	başka	ADJ	Adj	_	3	amod	_	_
2	bir	bir	DET	Indef	_	3	det	_	_
3	krize	kriz	PRON	Pers	Case=Dat|Number=Sing|Person=3	4	obl	_	_
4	neden	neden	NOUN	_	Case=Nom|Number=Sing|Person=3	7	acl	_	_
5	olan	ol	VERB	Ptcp	Polarity=Pos|Tense=Pres|VerbForm=Part	4	compound:lvc	_	_
6	MGK	MGK	PROPN	Abr	Case=Nom|Number=Sing|Person=3	7	nmod:poss	_	_
7	toplantısı	toplantı	NOUN	_	Case=Nom|Number=Sing|Number[psor]=Sing|Person=3|Person[psor]=3	12	nsubj	_	_
8	ise	i	AUX	Conj	Mood=Cnd|Number=Sing|Person=3|Polarity=Pos	7	cop	_	_
9	sadece	sadece	ADV	_	_	12	advmod	_	_
10	30	30	NUM	NNum	Case=Nom|Number=Sing|NumType=Card|Person=3	11	nummod	_	_
11	dakika	dakika	NOUN	_	Case=Nom|Number=Sing|Person=3	12	obl:tmod	_	_
12-13	sürmüştü	_	_	_	_	_	_	_	SpaceAfter=No
12	sürmüş	sür	VERB	Ptcp	Aspect=Imp|Number=Sing|Person=3|Polarity=Pos|VerbForm=Part	0	root	_	_
13		y	AUX	Zero	Aspect=Perf|Evident=Fh|Number=Sing|Person=3|Tense=Past	12	cop	_	_
14	.	.	PUNCT	Stop	_	12	punct	_	SpacesAfter=\n

Here sadece 30 dakika sürmüştü means something like "continued for only 30 minutes". Surely sadece should be dependent on 30, and not the verb?

But that raises the question of what kind of relation it should be. It's not quite an advmod because it can be a dependent of pretty much any part of speech (like only and just in English). This seems to me to be what advmod:emph is for?
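To see how often this pattern occurs, one could scan the treebank file for it. The following is a minimal sketch, assuming plain tab-separated CoNLL-U input; the helper names `read_sentences` and `focus_on_verb` are made up for illustration, and the sample rows are abridged from news_1874 (tokens 9 to 12 only, with the FEATS column blanked).

```python
# Minimal CoNLL-U reader: no external library, only the 10 tab-separated columns.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]

def read_sentences(text):
    """Yield each sentence as a list of token dicts, skipping comment lines."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
                sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 10:
                sent.append(dict(zip(FIELDS, cols)))
    if sent:
        yield sent

def focus_on_verb(sent, lemma="sadece"):
    """Return (token, head) pairs where `lemma` is an advmod dependent of a VERB."""
    by_id = {t["id"]: t for t in sent}
    hits = []
    for t in sent:
        if t["lemma"] == lemma and t["deprel"].split(":")[0] == "advmod":
            head = by_id.get(t["head"])
            if head is not None and head["upos"] == "VERB":
                hits.append((t, head))
    return hits

# Abridged rows from news_1874 (assumption: FEATS replaced by "_" for brevity).
rows = [
    ("9", "sadece", "sadece", "ADV", "_", "_", "12", "advmod", "_", "_"),
    ("10", "30", "30", "NUM", "NNum", "_", "11", "nummod", "_", "_"),
    ("11", "dakika", "dakika", "NOUN", "_", "_", "12", "obl:tmod", "_", "_"),
    ("12", "sürmüş", "sür", "VERB", "Ptcp", "_", "0", "root", "_", "_"),
]
sample = "# sent_id = news_1874\n" + "\n".join("\t".join(r) for r in rows) + "\n"

for sent in read_sentences(sample):
    for tok, head in focus_on_verb(sent):
        print(tok["id"], tok["form"], "->", head["id"], head["form"])
```

A hit only says that sadece hangs off a verb; whether it actually restricts the verb or some other constituent still has to be judged by hand.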

@jonorthwash jonorthwash changed the title problem with annotation of _sadece_ in UD_Turkish-BOUN problem with annotation of "sadece" in UD_Turkish-BOUN Feb 28, 2024
@nschneid
Contributor

FTR, only and just are advmod in English, which doesn't use advmod:emph.

There are language-specific documentation pages for advmod:emph in Turkish and Tatar. I don't know if that reflects all treebanks in Turkish or just some of them.

@jonorthwash
Contributor Author

@iambusra, do you have thoughts on this?

@jonorthwash
Contributor Author

> FTR, only and just are advmod in English, which doesn't use advmod:emph.

Does this pass validation, when you have an advmod dependent on a noun or a number or an adposition?

@nschneid
Contributor

nschneid commented Mar 1, 2024

Yep, the validator does not look at subtypes except a couple of the semi-mandatory ones, and advmod dependents of non-predicate non-modifier words are acknowledged in the guidelines: https://universaldependencies.org/u/dep/advmod.html

@jonorthwash
Contributor Author

I guess my question was more about advmod modifying anything. So, for example, what about modifier words with advmod dependencies? E.g., in only this time, would only be an advmod dependency of this (a determiner), or is it dealt with differently?

@nschneid
Contributor

I would make it a dependent of time. There is a policy (that I only noticed recently) distinguishing "pure function words" that do not accept modifiers other than negation: these include case markers/adpositions and articles, and potentially other words. For English I would assume that demonstrative determiners should be considered pure function words. (As opposed to demonstrative pronouns, which can take relative clause modifiers, for instance.)

Looking at treebanks there are some quantifier determiners with modifiers like "yet another", "nearly all", "almost every". But I don't see any demonstrative determiners with modifiers.
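A search of this kind can be sketched roughly as follows. This is only an illustration under stated assumptions: `tokens` and `modified_determiners` are hypothetical helper names, and the three rows are a made-up annotation of "Nearly all books", not copied from any treebank.

```python
from collections import Counter

def tokens(conllu_sentence):
    """Parse one CoNLL-U sentence into {id: (form, upos, head, deprel)}."""
    toks = {}
    for line in conllu_sentence.splitlines():
        cols = line.split("\t")
        # Keep plain word lines only; this skips comments, ranges and empty nodes.
        if len(cols) == 10 and cols[0].isdigit():
            toks[cols[0]] = (cols[1], cols[3], cols[6], cols[7])
    return toks

def modified_determiners(toks):
    """Count (modifier form, determiner form) pairs for advmod dependents of a DET."""
    pairs = Counter()
    for form, upos, head, deprel in toks.values():
        if deprel.split(":")[0] == "advmod" and head in toks and toks[head][1] == "DET":
            pairs[(form.lower(), toks[head][0].lower())] += 1
    return pairs

# Hypothetical annotation of "Nearly all books" (an assumption, not treebank data).
rows = [
    ("1", "Nearly", "nearly", "ADV", "_", "_", "2", "advmod", "_", "_"),
    ("2", "all", "all", "DET", "_", "_", "3", "det", "_", "_"),
    ("3", "books", "book", "NOUN", "_", "_", "0", "root", "_", "_"),
]
sentence = "\n".join("\t".join(r) for r in rows)
print(modified_determiners(tokens(sentence)))
```

Run over a whole treebank, the resulting counter would list pairs such as ("nearly", "all"), and the absence of demonstratives among the determiners could then be checked directly.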

@Stormur
Contributor

Stormur commented Mar 14, 2024

> I guess my question was more about advmod modifying anything. So, for example, what about modifier words with advmod dependencies? E.g., in only this time, would only be an advmod dependency of this (a determiner), or is it dealt with differently?

If something is an ADV, it is required to depend as advmod, in this case too. I think that using advmod:emph is a useful distinction, which also points to the fact that this is not really a "usual adverb". In general, if we annotate sadece or the like as an ADV, there are no problems in making it depend as advmod on whichever other part of speech, and this is probably the best solution (instead of, e.g., identifying a DET sadece for noun phrases vs. an ADV sadece for predicates).

Since it is annotated as a function word, I also agree that its relation is with the head of the noun phrase, so dakika or time in the current examples. But it might also be otherwise: in something like nearly all persons, I would lean towards advmod:emph(all,nearly). I do not think that determiners are "pure function words", which seems to refer to the multifarious class represented in UD by CCONJ/SCONJ/ADP/PART.

@jonorthwash
Contributor Author

jonorthwash commented Mar 16, 2024

> , there are no problems in making it depend as advmod

The question wasn't about the deprel but the head.

> in something like nearly all persons, I would lean towards advmod:emph(all,nearly).

This is my inclination as well, but @nschneid suggests this isn't attested in (English?) treebanks.

Is there at least consensus that this is a reasonable annotation of such structures?

> Since it is annotated as a function word, I also agree that its relation is with the head of the noun phrase, so dakika or time in the current examples.

I don't fully get why this should be different from nearly all persons.

@nschneid
Contributor

The pattern I am seeing in English treebanks is that prenominal "nearly all" is treated as a unit, but not prenominal "only this". This may have to do with the fact that "only" is a focusing modifier that applies to nominals in general, whereas "nearly" is a degree modifier of a quantity.

  • Only this book needs to be returned.
  • Only books that are overdue need to be returned.
  • Only my aunt's casserole was not finished in its entirety.
  • Nearly all books need to be returned.
  • Nearly 30% of student borrowers will see some relief with this policy.
  • *Nearly books that are overdue need to be returned.

But then we have to deal with the less common case where the modified quantity is not a det or nummod:

  • Nearly the entire population of student borrowers will be helped by this policy.
  • "nearly everyone"
  • "nearly a quarter century"

The quoted ones are both treebank examples, and "nearly" modifies the head of the nominal ("everyone", "century"). So it is not a categorical rule that "nearly" never modifies the head nominal.

At the end of the day I don't think English UD has all the answers on determiners and determiner-related dependents in nominals, especially when it comes to various constructions for expressing quantity and measurement. So, take this precedent with a grain of salt.

@Stormur
Contributor

Stormur commented Mar 21, 2024

> , there are no problems in making it depend as advmod

> The question wasn't about the deprel but the head.

I was referring to this in the original post:

> But that raises the question of what kind of relation it should be. It's not quite an advmod because it can be a dependent of pretty much any part of speech (like only and just in English).

With regard to the head:

> Since it is annotated as a function word, I also agree that its relation is with the head of the noun phrase, so dakika or time in the current examples.

> I don't fully get why this should be different from nearly all persons.

I would also give an explanation like the one by @nschneid, but the more I think about it, the less obvious it is 🤔
Probably they interact with the other modifiers: degree modifiers can modify other modifiers, and focalisers can focus anything.

But it might as well be that I was making a blunder. Looking at these examples

  • Nearly the entire population of student borrowers will be helped by this policy.
  • "nearly everyone"
  • "nearly a quarter century"

the position of nearly "outside" the determiner seems to point to it modifying the whole phrase, and so attaching to the head. Now we might discuss how lexical or how functional terms like entire (maybe PronType=Tot?) or quarter (a numeral) are, but the whole phrase is to be considered "scaled". This also happens with focalisers, as in only this book...

The reason we cannot say nearly books is that this expression is not quantified in any way. But I would no longer say that this means that, if a quantifier is present, nearly attaches to it.

So personally I would correct my previous statement and lean towards always attaching elements like nearly, only, etc. to the head. In some cases, this might indirectly lead to reconsidering the annotation of some elements (entire, quarter...)
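As a rough illustration of what such a rule would do to the original example, here is a minimal sketch. The function name `reattach_to_nominal_head`, the lemma list, and the starting tree (sadece attached to 30 as advmod:emph, the alternative raised in the first post) are all assumptions made for the example, not an agreed UD policy.

```python
def reattach_to_nominal_head(toks, lemmas=("sadece", "only", "nearly")):
    """Move listed advmod modifiers from a det/nummod word up to that word's head."""
    for tok in toks.values():
        if tok["lemma"] in lemmas and tok["deprel"].split(":")[0] == "advmod":
            head = toks.get(tok["head"])
            # Only lift the modifier when its current head is itself a det or nummod.
            if head is not None and head["deprel"].split(":")[0] in ("det", "nummod"):
                tok["head"] = head["head"]
    return toks

# Hypothetical starting point: sadece attached to "30" (an assumption for illustration).
toks = {
    "9": {"form": "sadece", "lemma": "sadece", "head": "10", "deprel": "advmod:emph"},
    "10": {"form": "30", "lemma": "30", "head": "11", "deprel": "nummod"},
    "11": {"form": "dakika", "lemma": "dakika", "head": "12", "deprel": "obl:tmod"},
    "12": {"form": "sürmüş", "lemma": "sür", "head": "0", "deprel": "root"},
}

reattach_to_nominal_head(toks)
print(toks["9"]["head"])  # 11, i.e. sadece now depends on dakika
```

The sketch leaves the relation label untouched, since the choice between advmod and advmod:emph is a separate question from the choice of head.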

@dan-zeman dan-zeman modified the milestones: v2.14, v2.15 May 15, 2024
@dan-zeman dan-zeman modified the milestones: v2.15, v2.16 Nov 16, 2024