Basic lyrics support #354

Closed
adrianholovaty opened this issue Sep 11, 2024 · 14 comments

Comments

@adrianholovaty
Contributor

Just to get a discussion going, here's a rough first draft of what lyrics support in MNX might look like.

Obviously there's a lot to lyrics — formatting, language encoding, extender (melisma) lines, non-word sounds such as MusicXML's <laughing> and <humming>. I'd like to design this in a way that gets us a solid foundation that we can elegantly extend later, rather than getting forever bogged down in obscure features.

For prior art, see MusicXML's approach to lyrics:
https://w3c.github.io/musicxml/musicxml-reference/elements/lyric/

My approach here is to sketch out a simple example, then provide some thoughts on how it could be extended to support more obscure aspects of lyrics.

Proposed simple example

Screenshot 2024-09-11 at 8 36 48 PM
[
  {
    "type": "event",
    "duration": {"base": "quarter"},
    "lyrics": {
      "lines": [
        {"text": "Are"}
      ]
    },
    "notes": [
       {"pitch": {"octave": 5, "step": "C"}}
    ]
  },
  {
    "type": "event",
    "duration": {"base": "quarter"},
    "lyrics": {
      "lines": [
        {"text": "you"}
      ]
    },
    "notes": [
       {"pitch": {"octave": 5, "step": "D"}}
    ]
  },
  {
    "type": "event",
    "duration": {"base": "quarter"},
    "lyrics": {
      "lines": [
        {"text": "sleep", "hyphen": "start"}
      ]
    },
    "notes": [
       {"pitch": {"octave": 5, "step": "E"}}
    ]
  },
  {
    "type": "event",
    "duration": {"base": "quarter"},
    "lyrics": {
      "lines": [
        {"text": "ing?", "hyphen": "end"}
      ]
    },
    "notes": [
       {"pitch": {"octave": 5, "step": "C"}}
    ]
  }
]

A pretty basic starting point. Lyrics would live on events. Each event would have an optional "lyrics" key, which would be a lyrics object. Its "lines" key would be an array of lyric syllable objects. Each lyric syllable is required to have a "text". (Eventually I could envision loosening this: each lyric syllable would be required to have a "text" or something like "laugh": true.)

Hyphens

The hyphen between "sleep" and "ing?" in that example is encoded as "hyphen": "start" and "hyphen": "end" on the two lyrics. This is inspired by MusicXML's syllabic element and syllabic data type. The default (if not provided) would be "single".
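To make the reading side of this concrete, here is a minimal Python sketch of how a consumer might rebuild plain words from one lyric line. The function name `words_from_events` is hypothetical, and treating a missing "hyphen" as "single" follows the default proposed above; none of this is part of the spec. A word of three or more syllables would need a value the draft does not define yet (MusicXML uses "middle"), so the sketch does not handle that case.

```python
def words_from_events(events, line_index=0):
    """Rebuild words from one lyric line, using the draft "hyphen" values."""
    words = []
    partial = ""  # syllables collected so far for a hyphenated word
    for event in events:
        lines = event.get("lyrics", {}).get("lines", [])
        if line_index >= len(lines):
            continue  # this event has no syllable on the requested line
        syllable = lines[line_index]
        partial += syllable["text"]
        # Only "start" keeps the word open; "end" and the default "single" close it.
        if syllable.get("hyphen", "single") != "start":
            words.append(partial)
            partial = ""
    if partial:
        words.append(partial)  # unterminated word: keep what was collected
    return words


events = [
    {"lyrics": {"lines": [{"text": "Are"}]}},
    {"lyrics": {"lines": [{"text": "you"}]}},
    {"lyrics": {"lines": [{"text": "sleep", "hyphen": "start"}]}},
    {"lyrics": {"lines": [{"text": "ing?", "hyphen": "end"}]}},
]
print(words_from_events(events))
```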

Multi-line lyrics

Multi-line lyrics would be supported via multiple objects in a single event's "lines" array.

Screenshot 2024-09-11 at 9 10 01 PM
[
  {
    "type": "event",
    "duration": {"base": "quarter"},
    "lyrics": {
      "lines": [
        {"text": "Are"},
        {"text": "Am"}
      ]
    },
    "notes": [
       {"pitch": {"octave": 5, "step": "C"}}
    ]
  },
  {
    "type": "event",
    "duration": {"base": "quarter"},
    "lyrics": {
      "lines": [
        {"text": "you"},
        {"text": "I"}
      ]
    },
    "notes": [
       {"pitch": {"octave": 5, "step": "D"}}
    ]
  }
]
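A small Python sketch of how a consumer might regroup these syllables into one list per verse line. The helper name `syllables_by_line` is made up for illustration, and it assumes the index in "lines" identifies the verse line, which only holds as long as no event skips a line.

```python
def syllables_by_line(events):
    """Group syllable texts by their position in each event's "lines" array."""
    lines = {}
    for event in events:
        event_lines = event.get("lyrics", {}).get("lines", [])
        for index, syllable in enumerate(event_lines):
            lines.setdefault(index, []).append(syllable["text"])
    # Return the lines in order: first verse line, second verse line, ...
    return [lines[index] for index in sorted(lines)]


events = [
    {"lyrics": {"lines": [{"text": "Are"}, {"text": "Am"}]}},
    {"lyrics": {"lines": [{"text": "you"}, {"text": "I"}]}},
]
print(syllables_by_line(events))
```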

There would need to be a way for a given event's lyric lines to "skip" a position. For example, see this old engraving of "My Way," in which there are three lines, then only the top line, then only lines 2/3:

Screenshot 2024-09-11 at 9 19 52 PM

Pretext for lyric syllables

A common practice is to display a verse number directly within the lyrics, such as the "2." here:

Screenshot 2024-09-11 at 9 28 34 PM

This could be encoded on the lyric line object as "pretext" (better name likely needed!).

{"text": "I", "pretext": "2."}

Obviously it's possible to just encode this as {"text": "2. I"}, but it's much nicer to split out the pretext. That gives the consuming application more possibilities in lyric display. Plus the engraving is subtly different: the text should be centered under the note, as if the pretext doesn't exist.
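The engraving difference can be sketched in a few lines of Python. Everything here is hypothetical (the function name, the `gap` parameter, the unitless widths); the only point is that the text is centered on the note by itself, and the pretext is then hung to its left.

```python
def place_syllable(note_x, text_width, pretext_width=0.0, gap=2.0):
    """Return (pretext_left, text_left) x-coordinates for one syllable."""
    # Center the text alone under the note, ignoring any pretext.
    text_left = note_x - text_width / 2
    if pretext_width == 0:
        return None, text_left
    # The pretext sits to the left of the text, separated by a small gap.
    pretext_left = text_left - gap - pretext_width
    return pretext_left, text_left


# "I" is 10 units wide, "2." is 12 units wide, and the note sits at x = 100.
print(place_syllable(100.0, 10.0, 12.0))
```

Encoding the same thing as {"text": "2. I"} would leave a renderer no way to do this, since it could only center the combined string.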

Another example:

Screenshot 2024-09-11 at 9 39 21 PM
[
{"text": "be", "pretext": "(SALLY:)"},
{"text": "see", "pretext": "(BEN:)"}
]

Lyrics above the staff vs. below

Sometimes a single staff (multivoice, but not necessarily) contains lyrics both above and below the staff:

Screenshot 2024-09-11 at 9 23 44 PM

This could be encoded on the lyric line object:

{"text": "You", "placement": "above", "pretext": "SUE:"}

Perhaps we could also support setting the default placement on a sequence — that way it's only set in a single place.

We'll need to support multiline lyrics above the staff as well.
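If a sequence-level default were added, resolving the effective placement could be as simple as the Python sketch below. The `sequence_default` argument and the final fallback to "below" are assumptions for illustration, not something proposed above.

```python
def resolve_placement(line, sequence_default=None):
    """Pick the placement for one lyric line.

    An explicit "placement" on the line wins, then the (hypothetical)
    sequence-level default, then an assumed fallback of "below".
    """
    return line.get("placement") or sequence_default or "below"


print(resolve_placement({"text": "You", "placement": "above", "pretext": "SUE:"}))
print(resolve_placement({"text": "you"}, sequence_default="above"))
print(resolve_placement({"text": "you"}))
```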

Non-lyric text that gets rendered with lyrics

It's also reasonably common in pop music to engrave non-lyric text as one of the lyric lines. It's more of an instruction, but it's important to engrave it in the lyrics area. For example:

Screenshot 2024-09-11 at 9 31 49 PM

This could be encoded as a lyric line with an empty "text" (?) and some other object, perhaps called "aside". There's also a need to specify "though this text is rendered in the lyrics space, the default lyrics spacing algorithm does not apply to it" — for which I've used "detached": true here.

[
  {
    "text": "Did",
    "pretext": "1."
  },
  {
    "text": "",
    "pretext": "2. 3.",
    "aside": "(see additional lyrics)",
    "detached": true
  }
]
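A rough Python sketch of how a consumer might use these two properties, assuming the semantics described above: an empty "text" means there is nothing to sing, and "detached": true means the line is ignored by the spacing algorithm. The helper names are hypothetical.

```python
def sung_texts(lines):
    """Texts that should actually be sung: skip lines whose "text" is empty."""
    return [line["text"] for line in lines if line.get("text")]


def lines_for_spacing(lines):
    """Lines the default lyric spacing algorithm should take into account."""
    return [line for line in lines if not line.get("detached", False)]


lines = [
    {"text": "Did", "pretext": "1."},
    {
        "text": "",
        "pretext": "2. 3.",
        "aside": "(see additional lyrics)",
        "detached": True,
    },
]
print(sung_texts(lines))
print(len(lines_for_spacing(lines)))
```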

Other data in the lyrics object

Apart from "lines", the lyrics object would contain data about the event's lyrics in general (that is, not referring to any of the specific lines). One thing that comes to mind is encoding a bracket before a particular event's lyrics, like this:

Screenshot 2024-09-11 at 9 41 18 PM

This could be encoded as "startBracket", with an optional size (measured in number of lyric lines?):

{
  "lyrics": {
    "startBracket": {
      "size": 2
    },
    "lines": [
      {"text": "your"},
      {"text": "my"}
    ]
  }
}

Formatting

Finally, a quick thought on formatting. I think font choices shouldn't live at this low level, as that would lead to a lot of duplication. There should be a way to set the lyric font for the document in a single place, perhaps in a layout.

@mscuthbert
Contributor

Wow! This is beyond "Basic" lyrics support. Super well thought through!

Quick-thought comments:

hyphen: can we keep something like the musicxml syllabic or rename to wordPart? I think that hyphen should be reserved specifically to preserve information about the hyphen itself (should it be shown? repeated over melismas? etc.) -- something to discuss later (semantic vs. presentation, etc.).

pretext: cool idea! I do think we need to distinguish between the 2. I graduated from pretext case and the (SALLY:) pretext case. Why is there a 2. w/ the I of I graduated from? Easy: it indicates the start of a new verse. But why is there a (SALLY:) before be_____? Because it's the start of a new system or maybe the start of a new page. If we moved the previous measure forward or this measure backwards, presumably the position of (SALLY:) would move with it. Therefore I think that we need something like a pretextRegion concept where the first syllable of the region would get SALLY: and any future syllable that (depending on how configured) began a system or page would get (SALLY:) [scrolling applications too, etc.] until the end of the pretextRegion.

aside and detached are super helpful and something missing in MusicXML. I think that this also would go well with anything relating to the successors to <humming> and <laughing> (add cough, tongue-click,...) so that the text can display as an aside: "mmmmm" but not have any vocal-processing VST try to find sounds for m.

I don't think that we'll get away with omitting a number or similar tag (NMTOKEN in MusicXML), and having it be a list would be ideal. Consider encoding Sal-ly rest-ing in {your/my}: how does the processor know that Sally resting in belongs to both the your and the my lines? (And from a semantic perspective, it would be good to know that, so that the layout is shifted to the middle for semantic reasons and not merely as layout.)

The number or similar tag should probably not be called number but something more generic like lyricId or lyricGroup. Looking at the examples above, there is a difference between the 1. First/2. Second lyrics and the (SALLY:) be/(BEN:) see examples. The 1./2. lyrics should not be sung at the same time, while the SALLY/BEN lyrics should probably both be sung at the same time even if there are no repeats. (A representation system that doesn't allow multiple performers on a single staff would probably insist that these be encoded as separate staves, but I hate the idea that we have to encode the same notes, and possibly the same lyrics, twice when they should only be displayed once.)

I am so glad that JSON will allow the time-only/timeOnly specification to just be an array of ints rather than a regex.
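For comparison, here is a Python sketch of both sides. The pattern is my reading of MusicXML's time-only type (a comma-separated list of positive integers) and should be checked against the MusicXML reference; the shape of a future MNX "timeOnly" value as a plain list of ints is an assumption based on the remark above.

```python
import re

# Assumed pattern for MusicXML-style time-only strings such as "1, 3".
TIME_ONLY_PATTERN = re.compile(r"[1-9][0-9]*(, ?[1-9][0-9]*)*")


def parse_time_only(value):
    """Turn a MusicXML-style string like "1, 3" into [1, 3]."""
    if not TIME_ONLY_PATTERN.fullmatch(value):
        raise ValueError(f"not a valid time-only value: {value!r}")
    return [int(part) for part in value.split(",")]


def applies_on_pass(time_only, pass_number):
    """With a list of ints, the check is a membership test.

    An empty or missing list is taken to mean "applies every time".
    """
    return not time_only or pass_number in time_only


print(parse_time_only("1, 3"))
print(applies_on_pass([1, 3], 2))
```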

@samuelbradshaw

samuelbradshaw commented Sep 12, 2024

pretext: I like this concept, but I don't like the name (because "pretext" is a word with an unrelated meaning). How about something like "label" (I think this is what MEI uses)?

pretext feels similar semantically to name and shortName on a part. Should parts and lyrics follow similar patterns for a label/prefix/name/abbreviation?

Do both start and end need to be specified? Would it be simpler to just have a boolean flag on every syllable that indicates whether there's more of the word ahead (false if not specified)? Something like this (though I'm not confident on the property name):
{"text": "Are"} … {"text": "you"} … {"text": "sleep", "continues": true} … {"text": "ing?"}

I agree with @mscuthbert that some way to tie a set of syllables together as being part of given verse(s) or group(s) is needed. For me, a use case would be showing and hiding verses in a hymn with several verses, possibly with a chorus that would always be visible.

Should asides always be detached (if so, is the detached property needed)?

Do we need to consider a single line of lyrics tied to more than one part?

Do you think this will be flexible enough to cleanly handle multiple syllables on a single note, or multiple notes for a single syllable? Or the first verse with two syllables on a note and the second verse with only one syllable on a note?

@mscuthbert
Contributor

Should asides always be detached (if so, is the detached property needed)?

The detached property may be needed for when an engraver does not mind that a syllable overlaps with the next note. There are times when a syllable is long (e.g. "sixths") but the next note or two doesn't have a lyric, and overlap is acceptable. (Or where the next note only has a syllable in another verse.) Maybe it's a three-way: inherit, yes, no.
Screenshot 2024-09-11 at 14 55 47

My feeling with MNX is that, since we are not optimizing for ease of human writing of JSON, we should keep them separate. There are two separate concepts (aside: is it something in the "lyrics" space that is not to be sung; detached: should this lyric not affect layout) that happen to be highly correlated.

I had some of the same misgivings about pretext but label could be too generic and doesn't specify the placement. What about before to align with the CSS naming?

@samuelbradshaw

samuelbradshaw commented Sep 12, 2024

Do you think this will be flexible enough to cleanly handle multiple syllables on a single note, or multiple notes for a single syllable? Or the first verse with two syllables on a note and the second verse with only one syllable on a note?

For multiple syllables on a note, I guess you could just put the whole thing in the text property:
{"text": "Are you"} … {"text": "sleep", "continues": true} … {"text": "ing?"}
{"text": "Are"} … {"text": "you"} … {"text": "sleeping?"}
{"text": "¿Qué ̮es"} … {"text": "lo"} … {"text": "que"}

For multiple notes on a syllable:
{"text": "Are"} … {"text": "you"} … {"text": "slee", "continues": true} … {"continues": true} … {"text": "ping?"}
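A minimal Python sketch of how words could be rebuilt from this encoding, including the melisma case where a syllable object has "continues" but no "text". The function name is hypothetical, and a missing "continues" is treated as false, as suggested above.

```python
def words_from_syllables(syllables):
    """Join syllables into words using the proposed boolean "continues" flag."""
    words = []
    partial = ""
    for syllable in syllables:
        # A melisma note may carry no "text" at all; it adds nothing to the word.
        partial += syllable.get("text", "")
        if not syllable.get("continues", False):
            words.append(partial)
            partial = ""
    if partial:
        words.append(partial)  # last word was left open: keep what was collected
    return words


melisma = [
    {"text": "Are"},
    {"text": "you"},
    {"text": "slee", "continues": True},
    {"continues": True},
    {"text": "ping?"},
]
print(words_from_syllables(melisma))
```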

@lemzwerg

lemzwerg commented Sep 12, 2024

I like the 'continues' idea since there are languages like Chinese that have multi-syllable words but no hyphens in between (if written with CJK characters).

@lemzwerg

lemzwerg commented Sep 12, 2024

There might also be dense typesetting situations where hyphens are not shown at all. However, sometimes a hyphen should be enforced even in such dense typesetting, and for such situations there should be an explicit 'hyphen' object.

In other words, I consider using 'continues' as a helpful means to separate content from layout.

@cyrilcoutelier

cyrilcoutelier commented Sep 12, 2024 via email

@williamclocksin

williamclocksin commented Sep 12, 2024 via email

@adrianholovaty
Contributor Author

Responding to @samuelbradshaw:

Do both start and end need to be specified? Would it be simpler to just have a boolean flag on every syllable that indicates whether there's more of the word ahead (false if not specified)? Something like this (though I'm not confident on the property name): {"text": "Are"} … {"text": "you"} … {"text": "sleep", "continues": true} … {"text": "ing?"}

I agree that would be simpler and perhaps better. In fact that's how our own internal format at Soundslice works — a lyrics syllable simply has a boolean that answers the question "Is this syllable meant to continue into the next one?" That's worked well enough for us for 7+ years of lyrics support. And we support non-trivial stuff like lyrics-only view.

I don't immediately see the need for "end" — i.e., knowing whether a syllable is meant to finish the previous syllable. It introduces an opportunity for error and ambiguity (if the first syllable has "start" but the second syllable lacks "end"), and I don't know what the benefit is. Anybody have a clear use case for encoding both the "start" and "end" as opposed to just the "start"?
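To illustrate the error cases, here is a Python sketch of the validation a consumer would need if both values were encoded. The function name and the error messages are made up; with a single "continues"-style flag, none of these states can occur.

```python
def find_hyphen_errors(syllables):
    """Return (index, message) pairs for inconsistent "start"/"end" values."""
    errors = []
    open_word = False  # True right after a syllable marked "start"
    for index, syllable in enumerate(syllables):
        hyphen = syllable.get("hyphen", "single")
        if hyphen == "end" and not open_word:
            errors.append((index, '"end" without a preceding "start"'))
        elif hyphen != "end" and open_word:
            errors.append((index, 'expected "end" after "start"'))
        open_word = hyphen == "start"
    if open_word:
        errors.append((len(syllables) - 1, '"start" on the last syllable'))
    return errors


# The ambiguous case: the first syllable has "start", the second lacks "end".
broken = [{"text": "sleep", "hyphen": "start"}, {"text": "ing?"}]
print(find_hyphen_errors(broken))
```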

@mscuthbert
Contributor

Chairs meeting noted that we need a way of encoding line-break (or start-of-line) for being able to extract lyrics to poetic blocks. (e.g., https://www.soundslice.com/slices/bwZcc/ -- then click microphone at bottom right)

@samuelbradshaw

samuelbradshaw commented Sep 12, 2024

I would very much like improvements that make it easier to extract lyrics into words, phrases, and poetic blocks. Some of the challenges I'm aware of currently are:

  • Knowing where a lyric phrase starts or ends (new line)
  • Knowing where a lyric block (verse / chorus / etc.) starts or ends (double new line, or new paragraph)
  • Knowing where to put spaces (in English you can assume spaces between words, but that's not true in all languages)
  • Knowing where to put hyphens (the music has hyphens between every syllable, so it's tempting to strip them all out for a lyrics view, but some words are hyphenated as part of their spelling, in English and other languages)
  • Handling choruses that only appear in the music once, but should be repeated after every verse in a lyrics view
  • Following repeats and jumps generally (you can't just extract all of the syllables in the order they appear in the music)
  • Handling places where different voices sing different lyrics simultaneously

If the new line is just part of the text, I think the standard way to encode that in JSON is the escape sequence \n, which shows up as \\n when the JSON is itself written inside a string literal (see https://stackoverflow.com/a/42073). But maybe there's a better way to semantically indicate phrase breaks and block breaks that doesn't involve new-line characters.
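To make the escaping concrete, here is a short Python check using the standard json module. Putting the line break inside "text" is just the option being discussed here, not a proposal.

```python
import json

# A syllable whose text ends with a real newline character.
syllable = {"text": "sleeping?\n"}

encoded = json.dumps(syllable)
print(encoded)  # the newline appears as a backslash followed by "n"

# The serialized JSON contains no raw newline, only the two-character escape.
has_raw_newline = "\n" in encoded
round_trip = json.loads(encoded)
print(has_raw_newline, round_trip == syllable)
```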

@adrianholovaty
Contributor Author

I've put pull request #355 together, which adds basic lyrics support to the MNX spec. I just wanted to get the "bones" of it in — lyric lines and hyphens — as opposed to dealing with the various special cases discussed in this thread. The most important thing is that the basic structure of the encoding will be forwards-compatible with the various special cases. Feedback welcome!

@williamclocksin

williamclocksin commented Sep 27, 2024 via email

@adrianholovaty
Contributor Author

All right, as of 230dab7 we now have basic lyrics support! I'm closing this issue to keep things focused. We'll open separate issues for the various other lyric cases, as @samuelbradshaw has done with #357.
