Are we comfortable with the guidelines for modification of function words? #991
There was a long and dynamic discussion about this when UD v1 was being drafted (end of September 2014). Only nine years have passed and we have it back :-) In those ancient times we still used e-mail to discuss the guidelines. I'm not sure it would help to copy all the e-mails here, but I find at least this contribution from Stanford (@ngiordani) interesting:

Below is the conclusion Chris and I arrived at today, after discussing what has come up in this thread, as well as additional English Web Treebank (EWT) data that we've talked about within our group. Here are two crucial examples that I think represent the class of constructions we're focused on:

right on time

Historically: in the EWT annotation, we made "right" and "hours" dependents of the respective prepositions. However, as Joakim pointed out elsewhere, in dependency syntax there is always an ambiguity between head modification and phrase modification, and the annotation we produced is ambiguous in that respect. While I agree that there's an intuition that "right" modifies "on", it seems perfectly plausible to say that it modifies "on time"; note that it's also possible to say "right then", to extend the argument that Joakim has made before. (It's interesting that this works with "right", which can't modify other time adverbs.) I also could not come up with a single diagnostic that would distinguish modifying the preposition in these cases from modifying the prepositional phrase. (If anyone has an idea, please share!) So there doesn't seem to be linguistic evidence (in English at least) for this P-attachment analysis.

Additionally, allowing prepositions to take dependents hurts the parallel we're trying to draw with case markers. And finally, this is going to create a problem (in fact, it already creates a problem) for the collapsed representation, which a lot of people use. In that representation, any modifiers of a preposition will have to be moved to depend on its complement anyway.

For these reasons, both Chris and I feel that case-typed prepositions should not have adverbial modifiers, and that modifiers such as "right" and "two hours" in the example above should attach to the nominal head, representing phrase-level modification. This is consistent with attachment decisions in the rest of the scheme.

HOWEVER, we think there's a class of examples that should be treated differently. Consider:

two hours after they left

In cases like this, we're worried about usability; attaching the adverb to the verbal head would be an analysis that's very difficult to interpret. Again, it's difficult to argue P^0-attachment vs. PP-attachment. But the problem of keeping this parallel to case markers isn't an issue, because in English we would annotate this "after" as mark, not case (since it takes a verbal complement). So basically we'd like to allow

A unified treatment would of course be desirable, but at the end of the day, it might not even be possible. It's very difficult to propose an analysis in which English prepositions, which can take verbal complements, also share properties with case markers from other languages. We think this solution is a good compromise.

Natalia
Here are the modified SCONJ cases. Honestly, it seems like trying to have it both ways: if

In English, it is hard to draw a sharp distinction between ADP and SCONJ, but we do so in UD based on the function of case vs. mark, which in turn is based on the category of the head. But parallels like "two hours after NP" / "two hours after VP" show how similar they are. It seems like this guideline is imposing yet another awkward structural distinction (and one that is too rare for most annotators to learn as a special case). I wonder if there is a universal claim to be made, which is that "true" case markers (such as clitics attaching as
The strong similarities between ADP and SCONJ may be peculiar to English (they do not hold for Swedish, for example, which is otherwise similar to English in many respects), but I nevertheless agree that it is awkward to treat "case" and "mark" differently in this respect. If we are going to change this, I would definitely prefer to ban dependents of "mark". Trying to draw a distinction between two types of adpositions is likely to open a big can of worms.
I don't know that I would need to see dependents of mark banned cross-linguistically; there are many languages out there and I don't think we have thought it through. But for English, I don't see any substantial difference between "two hours after the concert" and "two hours after they left". I would attach "two hours" to the lexical head, not to "after", in both cases, following UD's general lexicocentric framework. Looking at GUM, it looks like this is already the case.
I won't discuss the analysis in UD. My concern is the conversion UD => SUD. In the surface-syntactic analysis of these examples, "after" has two dependents: see the analysis in a native SUD French treebank (I am not sure that it is a good idea to treat the phrase before 'after' as a modifier, but it is our current analysis) and the SUD conversion of a UD native English treebank.
I'm not sure I ever internalized this policy, which says that certain classes of function words only allow negation as a possible kind of modifier, but other classes of function words can have a broader range of modifiers. Based on the examples, it seems "just before" should be advmod(before, just) if "before" is an SCONJ, but analyzed with the advmod and case attachments as sisters if it is an ADP. Is this really a good line to draw, and are treebanks actually adhering to it? EWT appears to avoid the function-word-modifying analysis for all but a couple of tokens.

Another question is whether the claim that negation can modify any function word should mean that that is the default interpretation in "not every" and similar. In UniversalDependencies/UD_English-EWT#452, which concerned "not only", @amir-zeldes and I had concluded that the shallower structure was safer.
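
A rough way to check how far a treebank adheres to this line is to count which relations are attached to tokens that are themselves case or mark dependents. The following is only a minimal sketch in plain Python over CoNLL-U text: the function names, the list of ignored relations (fixed, conj, cc, punct, goeswith), and the toy sentence are assumptions made for illustration and are not taken from EWT, GUM, or any UD tool. The toy sentence encodes the advmod(before, just) analysis mentioned above.

```python
from collections import Counter

FUNCTION_DEPRELS = {"case", "mark"}
# Assumption: relations we do not want to count as "modification" of a function word.
IGNORED = {"fixed", "conj", "cc", "punct", "goeswith"}


def parse_conllu(text):
    """Yield sentences as lists of (id, form, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges (1-2), empty nodes (1.1), and malformed rows.
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        sentence.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def function_word_dependents(sentences):
    """Count (head relation, dependent relation) pairs where the head is case/mark."""
    counts = Counter()
    for sent in sentences:
        # Universal part of each token's relation, e.g. "nmod:tmod" -> "nmod".
        rel_of = {tid: deprel.split(":")[0] for tid, _, _, deprel in sent}
        for _, _, head, deprel in sent:
            head_rel = rel_of.get(head)  # None for the root (head == 0)
            if head_rel in FUNCTION_DEPRELS and deprel.split(":")[0] not in IGNORED:
                counts[(head_rel, deprel)] += 1
    return counts


# Hand-made toy sentence (not from any treebank): "just before they left",
# with "just" attached to the SCONJ "before".
ROWS = [
    ["1", "just", "just", "ADV", "_", "_", "2", "advmod", "_", "_"],
    ["2", "before", "before", "SCONJ", "_", "_", "4", "mark", "_", "_"],
    ["3", "they", "they", "PRON", "_", "_", "4", "nsubj", "_", "_"],
    ["4", "left", "leave", "VERB", "_", "_", "0", "root", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n"

counts = function_word_dependents(parse_conllu(SAMPLE))
print(counts)
```

On the toy sentence this reports a single (mark, advmod) pair. Running the same two functions over the text of a real .conllu file would show how often case-attached and mark-attached tokens carry dependents in that treebank, and with which relations.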