How to annotate Sabrina Carpenter's "Espresso" #1070
Comments
We could ExtPos=VERB it. :) |
"brand-newed" is the harder case as the morphological derivation is complex—[[brand - new]ed]. The inflection of a complex expression breaks UD's lexicalist assumption. Probably the easiest solution is just to treat it as a single-word VERB. |
That could work! It does break the latest tokenization standard to single-word that particular phrase, though. One final example from the text:
Single token again for the analysis? |
Yeah I suppose. It is not too different from current morphological uses of hyphens, e.g. "non-human" and "over-eater", where the hyphen is not tokenized separately. |
I would tokenize it apart, I think that matches English tokenization standards better. Then I would analyze it as a compound verb, so compound(newed, brand) and the head as a regular verb. |
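A minimal sketch of what this split-token proposal might look like as CoNLL-U rows, limited to the clause "I brand-newed it for ya". Only compound(newed, brand) and the verbal head come from the comment above; the UPOS for "brand", the attachment of the hyphen, and the remaining relations are assumptions made for illustration, and the helper names are hypothetical.

```python
# Hypothetical sketch of the "tokenize apart + compound" analysis.
# Only compound(newed, brand) with a VERB head is taken from the thread;
# every other tag and attachment here is an illustrative assumption.
TOKENS = [
    # (id, form, upos, head, deprel, misc)
    (1, "I",     "PRON",  4, "nsubj",    "_"),
    (2, "brand", "NOUN",  4, "compound", "SpaceAfter=No"),
    (3, "-",     "PUNCT", 4, "punct",    "SpaceAfter=No"),
    (4, "newed", "VERB",  0, "root",     "_"),
    (5, "it",    "PRON",  4, "obj",      "_"),
    (6, "for",   "ADP",   7, "case",     "_"),
    (7, "ya",    "PRON",  4, "obl",      "_"),
]


def to_conllu(tokens):
    """Render the tuples as 10-column, tab-separated CoNLL-U lines."""
    lines = []
    for tid, form, upos, head, deprel, misc in tokens:
        # LEMMA, XPOS, FEATS and DEPS are left unspecified ("_") in this sketch.
        cols = [str(tid), form, "_", upos, "_", "_", str(head), deprel, "_", misc]
        lines.append("\t".join(cols))
    return "\n".join(lines)


def dependents(tokens, head_id, deprel):
    """Return the forms attached to head_id with the given relation."""
    return [form for _, form, _, head, rel, _ in tokens
            if head == head_id and rel == deprel]


print(to_conllu(TOKENS))
```

Under this sketch "newed" carries both the compound dependent and the object, so the verbal behaviour sits on a regular VERB token and no single-token exception to hyphen tokenization is needed.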
Yeah that's another option (I forgot we used plain |
Yeah, we have a few of these in GUM also without hyphens (tape record, guest star) |
Although it could be argued that |
May I ask again why this breaks the lexicalist assumption? I fear I am constantly confused by it at this point. For similar cases I wonder if the "overarching" annotation of such complex phrases could not be put in what are now empty "range tokens", while keeping the internal structure of the elements. I do not see other ways to treat such cases in a fully satisfying way, though this apparently breaks some tenets of UD annotation and probably introduces some kinds of constituent nodes and messes with the tree structure... So: I know I Mountain Dew it for ya
The PROPN would actually be some intermediate annotation level. One touch and I brand-newed it for ya
Not only English does this of course. I can think of a Latin adjective like aequinoctialis, with a further level of nesting... Like, UD v4 or 5, not even 3 👀 Or maybe just "early" morning ramblings, sorry. |
This goes against the idea of MWT. A multi-word token is a single orthographic token written without spaces.* Here we have clearly two orthographic tokens "Mountain" and "Dew". MWTs are not general "range tokens" for any purpose when we would need to annotate phrase-level phenomena. *) The only exception may be languages like Vietnamese where spaces inside words are allowed. There I could imagine an MWT with spaces, but only if some of the words within the MWT are not separated by a space. (There are no MWTs in current Vietnamese treebanks in UD.) As for the main topic of this issue, I don't have strong opinions. I like the In general, I don't think UD should have guidelines for phenomena that are less frequent than, say, 0.1%. Someone can spend a second on inventing a new pun (e.g. "Mount ain't Dew") and we would spend hours and months with GitHub discussions about it. |
As I stated, it is clear to me that something like this at the moment breaks lots of definitions and standards of the current format. But I still also think that something like this should deserve serious discussion because I do not really see how to solve some issues. By the way, I understand why Vietnamese is an "exception", but as all exceptions, when you look closer at it, you notice that very similar things are happening also in other languages, maybe not so systematically. So there is probably no reason to confine this "exception" to Vietnamese only.
I am a little skeptical about seeing
Here I strongly disagree, since the example at hand might have a humorous connotation, but it represents something very general that we observe not so infrequently (i.e. whole phrases molded into a unitary element), more like 10% rather than 0.1%. It is just more overt in this case than it usually is. Anyway, puns also happen on a linguistic basis, so we have to be able to address them. |
Yes, the order of operations in terms of morphosyntactic bracketing is
But while we can split an adverbial modifier and tack it onto the "do so", we can't do that with compound verbs:
Of course, "brand" fails in the same way, but that's not surprising given its derivation. |
The point I was trying to make about lexicalism was just that UD doesn't really account for morphological compounding and derivation. There are attempts within UniDive to add more structure for this. For brand-newed, overloading the syntactic In the case of Mountain Dew used as a verb, I think ExtPos=VERB is the most obvious solution for now, though ideally there would also be verbal features. Perhaps UDv3 will offer more flexibility for annotating phrasal morphology. |
OK. To put it briefly, I think we have enough evidence now to say that this distinction does not exist. Or at least, since UD covers the annotation from morphology to syntax, we should be able to integrate these cases properly. As far as I know the efforts in UniDive go more in the direction of lexicon, so a "higher" or "more external" layer. I still do not understand the usefulness of |
ExtPos=VERB explains why it can take an object. (1) |
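As a rough illustration of this point, the sketch below reads a hand-written CoNLL-U fragment for "I Mountain Dew it for ya" and looks up the external part of speech of the token that governs the object. The choice of "Dew" as head, the compound relation inside the name, and the other attachments are assumptions rather than an agreed analysis; parse and external_pos are hypothetical helpers, not part of any UD tool.

```python
# Hypothetical CoNLL-U fragment: the head choice ("Dew"), the internal
# compound relation and the obl attachment are assumptions for illustration.
# Lemmas are left as "_" on purpose.
CONLLU = "\n".join([
    "1\tI\t_\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tMountain\t_\tPROPN\t_\t_\t3\tcompound\t_\t_",
    "3\tDew\t_\tPROPN\t_\tExtPos=VERB\t0\troot\t_\t_",
    "4\tit\t_\tPRON\t_\t_\t3\tobj\t_\t_",
    "5\tfor\t_\tADP\t_\t_\t6\tcase\t_\t_",
    "6\tya\t_\tPRON\t_\t_\t3\tobl\t_\t_",
])


def parse(conllu):
    """Parse plain token lines (no comments, ranges or empty nodes) into dicts."""
    tokens = {}
    for line in conllu.splitlines():
        cols = line.split("\t")
        feats = {}
        if cols[5] != "_":
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens[int(cols[0])] = {
            "form": cols[1],
            "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]),
            "deprel": cols[7],
        }
    return tokens


def external_pos(token):
    """Use ExtPos when present, otherwise fall back to the token's own UPOS."""
    return token["feats"].get("ExtPos", token["upos"])


tokens = parse(CONLLU)
obj = next(t for t in tokens.values() if t["deprel"] == "obj")
print(external_pos(tokens[obj["head"]]))  # VERB
```

The internal tags stay PROPN, so the name is still recognisable as a name, while a consumer that checks ExtPos first sees a verb governing "it".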
TBH I'd be more inclined to just tag it as a verb to begin with - that's what we do with conversion in general, both with established cases ("mail/VERB something to someone") and with neologisms ("I can Google/VERB it"). Why treat this case differently? |
Because it is a multiword expression acting as a verb. Unless there's an interpretation I'm not considering, "Dew" is not a verb without "Mountain". In the same way that "Kiss Me, Kate" is internally a clause but acts as a proper name, hence ExtPos=PROPN. |
Oh, I see! I thought because it's word-play we were interpreting "Dew" as standing in for "Do". I guess if you use compound in the nouny sense and say first it's a noun compound (regular Mountain Dew), and only then converted to a verb, then yes, ExtPos makes sense. |
I'm pretty sure it's not meant to be wordplay on "do", but just a reference to the energy drink soda and making things more exciting / bubbly / whatever else positive people associate with Mountain Dew |
I think it's both, no? I mean, the line is:
Isn't this a play on "I know I do it for ya"? |
Time to write a paper on UD annotation of syntactic puns. :) |
It is well known that English is still understandable if you verb nouns or adjectives. Presumably in a sentence such as the previous one, a correct tag analysis would be verb_VERB with all of the associated dependencies matching that analysis.
What would be the analysis of text such as in Sabrina Carpenter's "Espresso"? In this song, she verbs two or more words at a time. For example:
My understanding is that brand-newed would be tokenized as three words anyway under the current punctuation guidelines, meaning we can't get away with a single token analysis of the second sentence.
The PROPN used as a verb also adds complexity to the situation. Would both words be tagged as VERB or just one, with the other tagged as PROPN?