
Realizations and Layouts #138

Closed
joeberkovitz opened this issue Jun 29, 2018 · 101 comments

@joeberkovitz
Contributor

joeberkovitz commented Jun 29, 2018

This issue is a proposed direction that's intended to address (partly or wholly) the concerns of multiple existing issues including #4, #34, #57, #121.

At the moment I have not proposed any particular XML schema for these ideas, because I wanted to get the ideas themselves out on the table first. I don't think the problem of making up elements, attributes and relationships is going to be too difficult here, though.

Transposition And Its Discontents

For scores that can be consumed as both full scores and instrumental parts,
the question of how to best encode pitch is confounding. Perhaps the two
biggest forces driving MNX architecture are the desire to accurately encode
the author's intentions, and the desire to accurately encode the material to
be read by readers. When it comes to transposition, however, these forces are
not necessarily in alignment. Perhaps the question of "which encoding is
best?" is itself part of the problem, rather than part of the solution.

On the authorial side, a manuscript (whether autograph or digital) may have
been originally notated in either concert or transposed pitch. Thus a
decision to encode only concert pitch, or only transposed pitch, can impose an
unacceptable distance between the encoding and the original material. Recall
that MNX should serve to encode materials that may not have ever passed
through a notation editor, with a reasonable degree of directness. Such
original materials could differ as greatly as an orchestral piece notated in
full at concert pitch, and a clarinet solo notated as a single transposed
part. Should it not be possible to encode either one, such that there is a
direct correspondence between the original manuscript and its encoded pitches?

If so, it seems hard to escape the conclusion that no single pitch encoding
convention serves the goals of MNX well. Some of the many pitch scenarios that
may occur include the following:

  • Original full concert score, derived transposed parts
  • Original full transposed score with identically transposed parts
  • Original full transposed score with derived full concert-pitch score
  • Single-instrument transposed part, with no need for a derived full score

It's also true that any algorithmic rule for conversion between pitch levels
will sometimes need to be overridden by skilled editorial judgment. This
doesn't mean that algorithms play no role, but it does mean that an override
mechanism is necessary.

Finally, there is no single choice for pitch encoding that eliminates the need to
convert between different pitch schemes. Implementors will have to deal with
this complexity in at least one direction, and the conversion is basically symmetric in nature: it is not more complicated to go from A to B than from B to A.

While it has been argued that concert pitch is a "canonical truth" that transcends transposed music, the only canonical truth we really have is whatever the composer wrote down -- which could be in either pitch scheme.

Score And Part Realizations

Looking beyond transposition, we find that parts and scores can differ in
other semantic ways. Some directions or elements (e.g. cue sequences) are only
visible in one or the other. Multi-measure rests may be present in a part and
absent in the score, or vice versa. A textual direction might be shown in a
teacher's guide and omitted from the student edition.

So it seems useful to situate the problem of score/part transposition within a
larger landscape of allowing a CWMN document to vary for different roles. We
therefore propose the new MNX-Common concept of a realization, which says
how the document's contents are to be transformed for consumption by a
particular role (e.g. conductor, performer, student, teacher, etc.). There are
at least two major types of realization: a full-score realization, and part-
specific realizations (one for each part).

Let's start by trying to roughly define a realization, and then look at how
this definition works:

  1. A realization has a list of included parts.
  2. In a given realization, each part transposes its pitches a specified interval from concert pitch.
  3. In a given realization, any measure may override the default key signature with a transposed enharmonic.
  4. In a given realization, any note may override the default spelling with a transposed enharmonic.
  5. Directions and sequences may be restricted to only occur in designated realizations.

There are two built-in kinds of realization, reflecting the main needs of
producers and consumers: score (including all parts), and part (one for
each part in the score).

Note that realizations don't concern style properties or system breaks or
system spacing or credit placement or... all that other visual stuff... That
realm is where layouts come in (see below). For example, a single part
realization might have multiple layouts for different page or display sizes,
each of which has different system breaks and credits positioning.
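To make the concept a bit more concrete, here is a rough data-model sketch in Python. It is purely illustrative: the names and fields are my own shorthand for the five properties listed above, not proposed MNX vocabulary.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Interval:
    """A transposition from concert pitch, as diatonic steps plus chromatic semitones."""
    diatonic: int = 0
    chromatic: int = 0

@dataclass
class Realization:
    """One way of consuming the document (e.g. the score, or a single part)."""
    name: str
    parts: List[str]                                                                # 1. included parts
    transposition: Dict[str, Interval] = field(default_factory=dict)                # 2. per-part interval
    key_overrides: Dict[Tuple[str, int], int] = field(default_factory=dict)         # 3. (part, measure) -> fifths
    spelling_overrides: Dict[Tuple[str, str], str] = field(default_factory=dict)    # 4. (part, note id) -> spelling
    # 5. restricted directions/sequences would point the other way: each direction
    #    lists the realizations in which it appears.

# The two built-in kinds: one score realization plus one realization per part.
score = Realization("score", parts=["flute", "clarinet-1"])
clarinet = Realization("clarinet-1 part", parts=["clarinet-1"],
                       transposition={"clarinet-1": Interval(diatonic=1, chromatic=2)})  # Bb clarinet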

How Do Realizations Affect Encoding?

Each part specifies a source realization. The source realization of a part
determines how that part's pitches are encoded. Because different realizations
can transpose the same part differently, this means that pitches can be
encoded either in concert or transposed pitch.

Let's look at several widely differing scenarios to illustrate:

In a document derived from a concert-full-score original (or exported from an application which uses
concert pitch as a reference point), we'll have this scenario:

  • The score realization will specify concert pitch for each part (possibly with octave transposition for bass, piccolo, etc.)
  • Each part realization will specify the transposition for its specific part, along with enharmonic overrides.
  • Each part's source realization will be score, thus all notes will be encoded in concert pitch.

In a solo instrument score with a single transposed part as the original (or exported from an application which uses
transposed pitch as a reference point), we'll have this scenario:

  • The sole part realization specifies transposed pitch for that single part.
  • The score realization (if it even exists) is identical to the part realization.
  • The single part's source realization is its part realization, thus all notes will be encoded in transposed pitch.
  • Consequently there do not need to be any enharmonic overrides.

In a document derived from a set of transposed parts we'll have this scenario:

  • The score realization will specify concert pitch for each part. (A full-transposed-score realization could exist also!)
  • Each part realization will specify the transposition for its specific part
  • Each part's source realization will be part and will be encoded in transposed pitch.
  • Each part will include enharmonic overrides for the score realization, as needed to support a presentation at concert pitch.

Transposing With Intervals And Semitone Distances

Like MusicXML, MNX must specify transpositions as diatonic intervals, i.e. as a
combination of steps and semitones. However, as mentioned above, realizations may
also supply explicit key signatures and note spellings to override any prevailing
transposition interval.

How Do Realizations Affect Presentation?

When rendering a part for consumption by a reader, a target realization is
used. The problem of figuring out how to spell notes in a given realization is
therefore as follows: how do we transform a note from its source to its target
realization? The rough answer, to be refined in the actual spec, runs something like this:

  • If the two realizations are the same, do nothing
  • If the transpositions for source and target are the same, do nothing
  • If a key signature override exists in the target, use it. Otherwise transpose the source key signature by the intervallic difference between the target transposition and the source transposition.
  • If a note spelling override exists in the target, use it. Otherwise transpose the source note according to the distance in 5ths between the source and target key signatures.
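A rough, self-contained sketch of that procedure in Python, using line-of-fifths arithmetic. This only illustrates the bullet points above (which are still to be refined), and is not proposed normative behaviour:

# Line-of-fifths helpers: a spelling is ("letter", alteration), e.g. ("B", -1) is Bb.
LOF = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}

def to_lof(letter, alter):
    return LOF[letter] + 7 * alter           # sharps sit 7 fifths up, flats 7 down

def from_lof(pos):
    alter = (pos + 1) // 7
    letter = [k for k, v in LOF.items() if v == pos - 7 * alter][0]
    return letter, alter

def key_after_transposition(fifths, diatonic, chromatic):
    # Transposing up by (diatonic, chromatic) shifts a key signature 7*c - 12*d fifths.
    return fifths + 7 * chromatic - 12 * diatonic

def resolve_spelling(note, src_key, src_interval, tgt_interval,
                     key_override=None, spelling_override=None):
    """Apply the rules above. Intervals are (diatonic, chromatic) tuples from concert pitch."""
    if src_interval == tgt_interval:          # same realization or same transposition
        return note
    if spelling_override is not None:         # explicit enharmonic override wins
        return spelling_override
    tgt_key = key_override                    # explicit key override wins...
    if tgt_key is None:                       # ...else transpose by the intervallic difference
        d = tgt_interval[0] - src_interval[0]
        c = tgt_interval[1] - src_interval[1]
        tgt_key = key_after_transposition(src_key, d, c)
    return from_lof(to_lof(*note) + (tgt_key - src_key))   # shift by the distance in 5ths

# Concert-pitch source (C major), target is a Bb clarinet part (written a major 2nd up):
print(resolve_spelling(("B", -1), src_key=0, src_interval=(0, 0), tgt_interval=(1, 2)))
# -> ('C', 0): a sounding Bb is written as C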

Layouts

In comparison to realizations, layouts are fairly simple. They are ways in which a score may be presented. This is not just about full scores as opposed to parts. For example, a full score realization could itself be customized to be shown on an iPad, printed on A3 paper, or shown in an infinite scrolling strip. Each of these would constitute a distinct layout, based on the score realization of a document.

A layout is characterized by the following:

  1. An underlying realization of the document (typically full-score or part).
  2. Credit/header/footer text with spatial placement relative to display margins.
  3. A stylesheet (i.e. score-wide class and selector definitions). This is useful to control the global appearance of the score (e.g. staff line spacing).
  4. For this specific layout, layout-specific style property overrides can be applied to any element of the score. This capability allows measure style properties for system/page breaks to be scoped to a particular layout, among other things.
  5. An optional display size range, used to automatically select the correct layout based on device characteristics. This would act similarly to CSS Media Queries.
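As a rough illustration of those five ingredients, here is a Python shorthand; the field names and example values are mine, not proposed MNX vocabulary:

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Layout:
    realization: str                                                  # 1. underlying realization
    credits: List[str] = field(default_factory=list)                  # 2. credit/header/footer text
    stylesheet: str = ""                                              # 3. score-wide classes/selectors
    style_overrides: Dict[str, dict] = field(default_factory=dict)    # 4. per-element overrides (e.g. breaks)
    display_range: Optional[Tuple[int, int]] = None                   # 5. display size range, media-query style

# One part realization, two layouts with different breaks and credit placement:
ipad = Layout("clarinet-1 part", credits=["Clarinet in Bb"], display_range=(768, 1366))
a3   = Layout("clarinet-1 part", credits=["Clarinet in Bb"],
              style_overrides={"measure-12": {"system-break": "yes"}})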
@mdgood

mdgood commented Jun 29, 2018

Thanks for this proposal Joe! I have one clarification question and one comment:

  1. Am I understanding correctly that whatever the source realization, what is encoded is the written pitch? That could be transposed for part and transposed score realizations, and concert (with or without octave transpositions) for a concert score realization.

  2. The section on Transposing With Intervals And Semitone Distances badly misrepresents MusicXML's transposition. MusicXML of course uses a combination of chromatic and diatonic steps, not just semitones which wouldn't work. Since MNX has no difference here it seems this section could be cut, but at least it needs to represent MusicXML capabilities accurately.

@joeberkovitz
Contributor Author

joeberkovitz commented Jun 29, 2018

@mdgood for [1], that's correct: what is encoded is the written pitch for the source realization, which could use any pitch level (typically either written for the instrument or for concert pitch viewing).

For [2], my apologies I was repeating what I mistakenly thought you had said in #4 (comment). Looking in the schema, I see of course you're right that MusicXML transposition has multiple components. I've corrected the writeup above to fix this mistake.

@cecilios

@joeberkovitz Great proposal! Thank you. Let's hope the devil isn't in the details.

@shoogle

shoogle commented Jun 30, 2018

@joeberkovitz, I came to the same conclusion regarding transpositions: while concert pitch makes sense for most new compositions, the "definitive" representation of historical compositions is whatever transposition is used in the available source material.

If MNX forced all scores to be stored in concert pitch it would be problematic for accurately storing historic compositions, and for OMR. However, if MNX forced scores to be in transposed pitch then it would consign itself to only ever being used as an archival and interchange format; I can't see any major score editor choosing to use a transposed representation for their native format. This means both concert pitch and transposed pitch are required.


Might I suggest the following internal representation:

  • Each staff contains:
    • note pitches as written in the source score, or the composer's chosen "definitive" representation
    • the transposition of the staff
      • given as a number of diatonic and chromatic steps away from concert pitch
  • Each instrument contains:
    • the transposition of that instrument
      • given as a number of diatonic and chromatic steps away from concert pitch
    • any hints necessary for deciding between key signatures used for display

To calculate the sounding pitch, take the stored pitches and "undo" the staff transposition to get concert pitch.

To calculate the pitch to be displayed to the user in a sheet music application, take the stored pitches, "undo" the staff transposition, then apply the instrument transposition (or just apply the difference between the staff and instrument transpositions).
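A small Python sketch of that arithmetic, assuming transpositions are (diatonic, chromatic) offsets above concert pitch and pitches are kept as (letter, alteration, octave) tuples so that spelling survives. All names here are illustrative:

STEP = {"C": 0, "D": 1, "E": 2, "F": 3, "G": 4, "A": 5, "B": 6}
SEMI = [0, 2, 4, 5, 7, 9, 11]

def transpose(pitch, diatonic, chromatic):
    """Move a spelled pitch by diatonic steps plus chromatic semitones."""
    letter, alter, octave = pitch
    dia = octave * 7 + STEP[letter] + diatonic
    semi = octave * 12 + SEMI[STEP[letter]] + alter + chromatic
    new_octave = dia // 7
    return "CDEFGAB"[dia % 7], semi - (new_octave * 12 + SEMI[dia % 7]), new_octave

def sounding(stored, staff_transposition):
    # "Undo" the staff transposition to reach concert pitch.
    d, c = staff_transposition
    return transpose(stored, -d, -c)

def displayed(stored, staff_transposition, instrument_transposition):
    # Apply only the difference between the instrument and staff transpositions.
    ds, cs = staff_transposition
    di, ci = instrument_transposition
    return transpose(stored, di - ds, ci - cs)

# Stored as written for a Bb clarinet (written D4, staff transposition up a major 2nd):
print(sounding(("D", 0, 4), (1, 2)))            # -> ('C', 0, 4): concert pitch
# Swap the part to an A clarinet (written a minor 3rd above sounding):
print(displayed(("D", 0, 4), (1, 2), (2, 3)))   # -> ('E', -1, 4): Eb4, stored pitches unchanged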

Benefits:

  • If the notes were written at (for example) a B-flat transposition, and the chosen instrument is also B-flat instrument, then no transposition occurs and notes are simply displayed as written.

  • If notes were originally written (e.g.) for a B-flat instrument, but the user decides to swap this for an instrument with a different transposition (or no transposition), then all that happens is the new instrument's transposition is recorded; the stored pitches are not changed. Since the stored pitches are not changed, they are still in the "definitive" representation, and are guaranteed to be displayed as originally intended if the user subsequently switches back to a B-flat instrument.

@mogenslundholm

I did a bigger transposition example showing 17 normal pitches with a one-to-one relation. This can be done knowing that the transposition is between C and Bb. However, the MusicXML approach with both chromatic and diatonic steps seems to be better: it can handle more than just the standard keys.

Adding an offset when playing is no big problem. I have identified four cases in MusicXML concerning pitch: Clef, Transpose, Ottava and Harmonic. But I would not be sure there couldn't be another definition affecting the pitch. (Transpose is written pitch; Harmonic may be both written and sounding pitch.)

What worries me is that this is not the end. There are exceptions. How many will appear? We are considering a few commonly used instruments, but there are many instruments in the world. For example the alto flute (recorder), which I played from an F clef as a child; now it appears with a transposed C clef in MuseScore. The flute even comes in baroque or German fingering. An oud has several tunings - sometimes shown transposed.

The only thing we know for sure is sounding pitch. It is the same into eternity, defined in terms of the number of cycles of radiation of a cesium-133 atom and the base A4 tone. But notation changes. Some tunes may not be played right in 100 years.

The alter command is absolutely sounding pitch. Does the alter value make sense when transposed?

With a one-to-one transposition, there should be no problem (both the chromatic and diatonic values specified). Note the last note in the example. You could say: a Bbb - isn't it just an A? But no, since the diatonic transposition value is one, it becomes Bbb.

Using sounding pitch in the file does not change the presentation of a transposed tune; it is like a computer using binary numbers internally but presenting them as decimal numbers.

With one way to solve a problem, that way will be tested all the time. With two ways it will cost more than double the effort and double the testing, and the two possible ways will never be tested equally.

A scale may not be equal tempered. I have two books and several papers about makam and folk music. Opening Signell's book Makam, I read: "transposition will cause a slight alteration in the size of an interval". But I think that the MusicXML way will also work here - even for the quarter tones.

PS: Note that transposition is two things: A melody may be transposed and actually change the pitch. And a melody may be shown transposed in order to make it easier for the player to read the notes.

[attached image: transposing]

@joeberkovitz
Contributor Author

@mogenslundholm I believe you're confusing two different things: transposition (which only affects the way parts are presented for reading by a performer) and sounding pitch (which determines the actual pitch heard in performance).

This proposal only deals with transposition. As such, it will suffice to describe how to transform from one realization to a different one (e.g. show a Bb instead of a C, or a D## instead of a G#). This is not about sounding pitch. Furthermore it is a single procedure based on a diatonic/chromatic interval, not two (or N) different procedures.

Actual sounding pitch, temperament, and microtones such as those in maqam music are not part of this proposal. At the moment I believe that <interpret> is sufficient to describe the actual performed pitch of any note in decimal terms. But please let's not take that question up here -- it is not about transposition, really.

@mogenslundholm

Thanks for the clarification. I still believe that the makam-music can work "just" by doing it like MusicXML.

@joeberkovitz
Contributor Author

Notes from chair call today:

  • Realizations can also tweak style properties (e.g. stem direction).
  • For credits, permit text to reference or embed metadata as data source.
  • For layouts, we'll eventually need a separate detailed proposal for credits, headers, footers, page-attached text, interpolated text

PR will come next.

@jsawruk

jsawruk commented Jan 27, 2019

Regarding pitch representation:

I am just getting caught up on this issue, so apologies if this duplicates someone else's suggestion.

It is my opinion that every note should encode both the written and sounding pitch, with the written pitch being mandatory and the sounding pitch being optional.

For example:

<!-- Note that sounds as written -->
<note pitch="D5" />

<!-- Note that sounds at a different pitch than written -->
<note pitch="C5" sounding-pitch="Bb4" />

If sounding-pitch is omitted, then sounding-pitch is equal to (written) pitch.

Instruments could continue to have transposition properties so that a fallback is available when sounding-pitch is omitted. For example, an MNX document only encodes a C score and has no sounding-pitch properties, but the user wishes to view this document as a transposing score. In that case, the instrument's transposition information would inform a transposition algorithm. This display could be inaccurate in terms of enharmonics, but would work similar to how transposing instruments currently work in MusicXML.
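A sketch of that fallback order (Python; the attribute names simply mirror the hypothetical examples above and are not proposed spec vocabulary):

def effective_sounding_pitch(note, instrument_transposition=None, transpose=None):
    """sounding-pitch wins; otherwise fall back to the instrument's transposition
    applied to the written pitch; otherwise sounding equals written."""
    if note.get("sounding-pitch"):
        return note["sounding-pitch"]
    if instrument_transposition is not None and transpose is not None:
        return transpose(note["pitch"], instrument_transposition)
    return note["pitch"]

print(effective_sounding_pitch({"pitch": "D5"}))                           # 'D5'
print(effective_sounding_pitch({"pitch": "C5", "sounding-pitch": "Bb4"}))  # 'Bb4'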

Here is my rationale for including both written and sounding pitch:

  • It avoids ambiguity regarding transposing instruments, and allows the author to make editorial judgements regarding appropriate enharmonics. In other words, instead of considering sounding pitch as a transformation of written pitch, I would consider sounding pitch to be independent of written pitch.
  • It allows for accurate realizations without intervening transposition algorithms in some scenarios. For example, to properly display a graphic rendering, use the written pitch (but see above regarding C scores). Creating a MIDI realization of the same content would use the sounding pitch. No transposition algorithm is involved in this case.
  • Each part could also have its own independent pitch and sounding pitch, so that the part's pitch representation is independent of the score's pitch representation of the same note?

I should note that I have not yet fully considered the implications of ottava lines or transposing clefs. I would defer to however MNX decides to encode these objects, while retaining the semantics that sounding pitch is the literal sounding pitch (i.e. MIDI pitch number or other pitch representation that encodes to hertz).

I realize that using an approach like this could be overly verbose, but I would prefer verbosity to ambiguity.

@shoogle

shoogle commented Feb 10, 2019

It is generally a bad idea for a format to allow multiple ways to say the same thing. Written pitch and sounding pitch are not the same, but they are related by the transposition interval so it shouldn't be necessary to store both where one and an offset would suffice. What happens if we store both and they become out of sync?

I think we really need to see some real-world examples of where a naive transposition would get it wrong, and an explanation of why this can't be solved simply by improving the algorithm, perhaps by taking the current or previous key signature/transposition into account.

@clnoel

clnoel commented Feb 18, 2019

Considering that Joe, the initial proposer of this framework, is not really around to comment any more, I want to know if someone else (a chair or someone who agrees with it) is now championing the original proposal?

I'm of two minds on this. On the one hand, I strongly consider the "ground truth" of a piece of sheet music to be the sounds it is intended to produce. Music is a language and notation is its written form. That's one of the reasons I consider things like multi-measure rests and even whether you use a quarter or two tied eighths to be decisions of writing style that change how easy it is for a performer to create sound from the notation, and therefore many such things belong as part of layout decisions, since they do not alter the ground truth of the resulting sound. From that point of view, it makes sense for sounding notes to be encoded in the scores.

On the other hand, especially for historic documents, the ground truth of the intended sounds can be hard to figure out, since the "writing style" of the music has changed over time, and is sometimes peculiar to individual composers, much like differences in handwriting. Also, it is part of the mission here to be able to faithfully reproduce the original source document without a lot of hassles. From this point of view, it makes sense for written notes to be encoded in the scores.

What we've got to do here is to decide: Which viewpoint is more important?

Or, if they are of equal importance: Do we support both viewpoints evenly, even though it gives us two ways to represent things in MNX?

--Christina
(Edited to remove a typo)

@jsawruk

jsawruk commented Feb 18, 2019

I think we should support a way that separates the concept of written pitch and sounding pitch because this is the least ambiguous. After considering the problem more thoroughly, I propose the following:

  • The written pitch is the primary pitch. This is what is physically displayed in the sheet music.
  • Each written pitch may then have an associated collection of 0 or more alternate representations.

These alternate representations may or may not be the sounding pitch. For example, one alternate representation could be the written pitch for a specific part. The use case for this is condensed scores.

Consider a condensed score written in C with a single staff for "High Woodwinds". This staff might show the conductor the sounding pitch for both the Flute and Bb Clarinet parts, but does not show the written pitch as the performers see them.

In the above scenario, one should be able to create such a condensed score and unambiguously extract the written part.

Example for non-transposing instrument or a transposing instrument in a transposed score:

<pitch name="C4" />

Example for transposing instrument in a C score:

<pitch name="C4">
  <alternate-pitch type="sounding" name="Bb4" transposition="-2" />
</pitch>

Example for condensed score:

<pitch name="C4">
  <alternate-pitch type="part" name="Bb4" transposition="-2" part-ref="#clarinet1" />
</pitch>

@shoogle

shoogle commented Feb 18, 2019

Great points here from @clnoel. I agree that sounding pitch is the "ground truth".

@jsawruk, your examples expose the fundamental weakness of encoding written pitch, which is that it can differ between the score and the parts. Your method requires alternate-pitch to sometimes be sounding and sometimes written, whereas if you made the sounding pitch the primary pitch then the alternate pitches would only ever be written pitches. Why treat the score differently to the parts?

@mogenslundholm

mogenslundholm commented Feb 18, 2019 via email

@shoogle

shoogle commented Feb 18, 2019

Another argument in favour of sounding pitch is that this allows you to encode all editions in a single file. For example, you can encode:

  • Note sounding C4
    • Smith 1874 Edition for instrument A used transposition X
    • Jones 1874 Edition for instrument B used transposition Y

You can do the same for clefs and ottava lines:

  • Note sounding C5
    • Smith 1874 Edition started an 8va line here
    • Jones 1874 Edition inserted a treble clef here

Now applications can offer a dropdown list of editions and users can choose to have the score rendered according to that particular edition. This is not possible if you encode written pitch, at least not without encoding everything multiple times and creating a huge file.

@adrianholovaty
Contributor

Considering that Joe, the initial proposer of this framework, is not really around to comment any more, I want to know if someone else (a chair or someone who agrees with it) is now championing the original proposal?

Thanks for picking up this discussion! Speaking as the co-chair who's been asked to continue the MNX work: my gut preference is also to treat the sound as the ground truth.

With that said, I think we'd all benefit from a comprehensive list of pros and cons for the two approaches — so we can make a decision based on complete information. I will be synthesizing the comments here into such a list, so if anybody has further thoughts on the matter — especially perspectives that haven't already been expressed in this thread — please contribute your thoughts and reasoning.

I don't expect every detail of MNX will require such a heavy handed pros-and-cons approach, but this is such a fundamental decision that it's important to get right.

(I'm still getting situated in the co-chair role but hope to have some more concrete things to contribute in the coming days...)

@jsawruk

jsawruk commented Feb 18, 2019

@shoogle: One of the issues with using only sounding pitch is that the visual representation can change when displaying the written pitch.

For example, with a Bb instrument, a sounding note of Bb has a written pitch of C. Depending on key signature and/or engraver preferences, this could change the layout of the part since there could no longer be an accidental, potentially leaving an unnatural whitespace in front of the C. Engravers like to alter the parts independent of the score to optimize the display as a part. This is the part independence feature in score editors that allows you to alter properties of the part without altering properties of the score.

@adrianholovaty: I agree that creating a list of pros and cons for each approach is a good idea, and probably the only way to truly come to a consensus. I personally think there are three approaches:

  • Only written pitch is encoded, and a standardized algorithm is adopted to convert to sounding pitch
  • Only sounding pitch is encoded, and a standardized algorithm is adopted to convert to written pitch
  • Both written and sounding pitch are encoded and are completely orthogonal. No algorithm is needed to convert.

Every approach has its advantages and disadvantages, but I still think written pitch and sounding pitch are separate concepts (which is why I recommend an algorithm to convert between the two; for example, see how music21 handles accidentals during transposition). It is my opinion that written pitch is a very engraver/orchestrator way of thinking, while sounding pitch is a very composer/MIDI way of thinking. Since our group encompasses representatives from all of the above groups, I doubt we will be able to just pick one pitch representation that everyone can agree on.

Where should the pros and cons be listed?

@adrianholovaty
Contributor

@jsawruk Still working on finding a proper home for the pros and cons (perhaps the project wiki). At the moment, let's keep stuff in this thread. 👍

@shoogle

shoogle commented Feb 18, 2019

@shoogle: One of the issues with using only sounding pitch is that the visual representation can change when displaying the written pitch.

For example, with a Bb instrument, a sounding note of Bb has a written pitch of C. Depending on key signature and/or engraver preferences, this could change the layout of the part since there could no longer be an accidental, potentially leaving an unnatural whitespace in front of the C.

One of the lessons from MusicXML is that you shouldn't try to store the exact positions of all elements in the first place as applications will just ignore it and do their own thing regardless. If you want exact positions then you need to use MNX-Generic (i.e. SVG).

It might be possible to specify positions as offsets from a default position, but the default position would surely take accidentals into account so the particular example you give would not be a problem. (If the default position didn't take accidentals into account then the layout would be ruined as soon as somebody tries to transpose the score to a different key.)

I suppose for OMR purposes it might be useful to have an option to store exact positions of symbols that were transcribed from a scanned score. This information could be used to overlay symbols on top of the scanned image within an OMR application, and for the purposes of training an AI to recognise musical symbols on a page, but it would not be used by sheet music applications for the purposes of storing layout information as they would much rather use their existing (and competing) algorithms.

Engravers like to alter the parts independent of the score to optimize the display as a part. This is the part independence feature in score editors that allows you to alter properties of the part without altering properties of the score.

This is really a separate argument relating to the wider question of how to encode differences between the score and the parts. i.e. Is it necessary to write everything out twice or can we get away with writing it once and storing the differences between them? Sounding pitch vs. written pitch is just a small part of that topic.

@shoogle

shoogle commented Feb 19, 2019

As I said above, I think we really need to see some real-world notation examples where a naive transposition algorithm would get the conversion between sounding pitch and written pitch wrong. Sounding pitch seems vastly superior to written pitch in most other respects, so the case for written pitch really depends on these examples, and on an inability to solve them by any other method, such as:

  • Storing some kind of hint for the transposition algorithm.
  • Improving the algorithm (e.g. taking the previous key into account at key changes).

There is of course always the option to store both pitches, but the risk of doing this is that people may start using it as a way to encode differences that are not related to transposition. For example, people might try to encode things like "the score says E flat here but the parts say E natural". We may want to provide a dedicated feature for that very purpose, but we would need to think very carefully before allowing the transposition feature to be used for anything other than transposition.

@jsawruk

jsawruk commented Feb 20, 2019

@shoogle: "Sounding pitch seems vastly superior to written pitch". I respectfully disagree with this sentiment. Written pitch and transposing instruments exist for a variety of reasons, and are how music is notated both historically and currently. I do not think it is the responsibility of MNX to eliminate written pitch. If we do not support a direct encoding of written pitch, then MNX will not be used by publishers.

I would prefer an unambiguous representation of both sounding and written pitch, but I understand your concern that this could be abused. I view written pitch and sounding pitch as representing the same item but presented in different terms (similar to how a point can be represented by two different vectors using a change of basis). It is this line of reasoning that made me suggest a way to have multiple "representations" of a given pitch.

If we were to support only one representation, I would prefer written pitch, because it is easier to transform written pitch to sounding pitch. A sounding pitch could represent multiple written pitches (MIDI 60 = C4, but could also be B#3 or Dbb4), whereas a written pitch only represents a single sounding pitch (C4 always maps to MIDI 60). This is my concern with an algorithmic approach. Any transposition algorithm that we propose should have the property that it is an involution: T(T(x)) = x. However, since written to sounding is a 1:1 function, but sounding to written is a 1:n function, I do not think in general these two functions can be composed to create an involution. Now, in 90%+ of cases, I doubt there will be any issue, but I do worry that there will be ambiguity in some edge cases.

As far as an example, I have created an example (please see attached) of a transposition algorithm failure using Sibelius, though I think this failure would occur in other software as well. I have written a melodic line of C, Cb, Cbb for both Flute (non-transposing instrument) and Clarinet in A (sounds down a minor 3rd). To convert the sounding pitch (the Flute line) to the written pitch for the Clarinet in A, the transposition algorithm applies the following transposition: transpose up a minor 3rd.

C +m3 -> Eb
Cb +m3 -> Ebb
Cbb +m3 -> ???

Since there is no notation for "E triple flat", the algorithm breaks in an unusual way. A copyist would change this note to a Db, such that:

Cbb +m3 -> Db

[attached image: transposition-example]

Note that this changes the pitch class. Writing this in code would require an exception to handle such a case. While this is a simple melodic example, I am also concerned with more complex use cases involving harmony, as well as music that is atonal/open key.
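For concreteness, here is a rough Python sketch of diatonic/chromatic transposition plus the "copyist" exception, i.e. respelling to the nearest usable letter when a result would need more than two flats or sharps. It only illustrates the case above and is not anyone's proposed algorithm:

STEP = {"C": 0, "D": 1, "E": 2, "F": 3, "G": 4, "A": 5, "B": 6}
SEMI = [0, 2, 4, 5, 7, 9, 11]
NAMES = "CDEFGAB"

def transpose(letter, alter, diatonic, chromatic):
    """Move a spelled pitch class by (diatonic steps, chromatic semitones)."""
    dia = (STEP[letter] + diatonic) % 7
    semi = (SEMI[STEP[letter]] + alter + chromatic) % 12
    new_alter = (semi - SEMI[dia] + 6) % 12 - 6      # wrap into the range -6..+5
    return NAMES[dia], new_alter

def respell_if_needed(letter, alter, max_accidentals=2):
    """Fall back to the nearest-letter enharmonic when the spelling would need
    more than max_accidentals sharps/flats (e.g. 'E triple flat' -> Db)."""
    while alter < -max_accidentals:                  # too many flats: move down a letter
        dia = (STEP[letter] - 1) % 7
        alter += (SEMI[STEP[letter]] - SEMI[dia]) % 12
        letter = NAMES[dia]
    while alter > max_accidentals:                   # too many sharps: move up a letter
        dia = (STEP[letter] + 1) % 7
        alter -= (SEMI[dia] - SEMI[STEP[letter]]) % 12
        letter = NAMES[dia]
    return letter, alter

for note in [("C", 0), ("C", -1), ("C", -2)]:        # C, Cb, Cbb, each up a minor third
    print(respell_if_needed(*transpose(*note, diatonic=2, chromatic=3)))
# -> ('E', -1), ('E', -2), ('D', -1)   i.e. Eb, Ebb, Db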

@clnoel

clnoel commented Feb 20, 2019

@jsawruk
First off, written-to-sounding is not one-to-one. Transposing staffs, like the clarinet in A that you show there, mean that a note that looks like an Eb should "sound" as a C when played by a Clarinet, instead of as the Eb a Flute would play. This is definitely one-to-many! And, in fact, is many-to-many, because C## and D both produce the same MIDI note. (Which is, I am now realizing, one of the cons of using sound as primary because you then have to specify which enharmonic you want!)

I consider your other point to not be an issue. My naive transposition algorithm doesn't produce the same problem as yours. I'll use your example, which is C, C#, C##; C, Cb, Cbb. This gets changed to MIDI 60, 61, 62; 60, 59, 58. On transpose (of +3 semitones) my engraving program needs to make these look like MIDI 63, 64, 65; 63, 62, 61. So, naively, I end with Eb, E, F; Eb, D, Db as the default transposed notes. This still does not hit your manual transposition (which uses E# instead of F and Ebb instead of D), but it is very clean and does not have an issue with representing the desired pitches correctly.
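A sketch of that naive approach (Python, illustrative only; it assumes a fixed, flat-preferring default spelling per pitch class, which is why it lands on Eb/Db rather than D#/C#):

# Semitone-only transposition with a fixed default spelling per pitch class.
DEFAULT_SPELLING = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def naive_transpose(midi_notes, semitones):
    return [DEFAULT_SPELLING[(n + semitones) % 12] for n in midi_notes]

print(naive_transpose([60, 61, 62], 3))   # C, C#, C##  -> ['Eb', 'E', 'F']
print(naive_transpose([60, 59, 58], 3))   # C, Cb, Cbb  -> ['Eb', 'D', 'Db']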

On the other hand if there was a Clarinet in A line in the original that I was trying to encode, and it looked like your manual transposition, I would need to specify the fact that an "unusual" enharmonic was being used for the E and the D. I'm not sure what that would look like off hand.

--Christina

@jsawruk

jsawruk commented Feb 20, 2019

@clnoel: I respectfully disagree with your interpretation of what I meant by written to sounding being one-to-one, and I should have provided a clearer argument. My point is that, for a non-transposing instrument, any written pitch maps to one and only one MIDI pitch. However, the opposite is not true: MIDI 60 could be interpreted as B#3, C4, or Dbb4. Because there is already ambiguity here for a non-transposing instrument, I feel that extending to transposing instruments only introduces more complexity and ambiguity. I think we are on the same page now, so sorry for the confusion!

I personally do not feel that a transposition algorithm should convert pitches into MIDI numbers and then back into a pitch. Doing so would require a pitch spelling algorithm (e.g. ps13). This could result in an incorrect pitch spelling. I am only worried about the possibility of an incorrect pitch spelling.

Also to your point, what are the "desired pitches"? Are they what the composer wants, what the performer wants, what the publisher wants, what the musicologist wants, or what the synthesizer/sampler wants? This is what I believe is the root of the pitch representation issue, because these could all (potentially) be different answers. We may choose one representation that does not give the optimal solution for all involved parties, and I think discussions like this thread are very helpful to find the best solution.

As for my position in this discussion, here is my full disclosure:

  • It is the opinion of my employer that written pitch must be encoded in any sheet music standard such as MNX, as this represents the publisher's intent. Since they interact with a very large number of publishers, they feel that this is an important issue.
  • As a software developer who has written several MusicXML parsers, I have been dealing with the pitch representation issues in MusicXML for well over a decade. It is my personal opinion (not my employer's) that there should be no ambiguity in pitch, but my experience as a developer and composer has shown that there can be ambiguity when it comes to transposition.

I am not simply arguing my point to be a contrarian, but to encourage discussion about this issue. It is an issue that myself and others feel is extremely important, so I hope we can find an acceptable solution.

@shoogle

shoogle commented Feb 20, 2019

@clnoel, I think @jsawruk meant written to sounding is many-to-one, because both C# and Db produce the same sound. However, that is only true in equal temperament, so it is important that we get the correct enharmonic in case an application wants to enable other tuning systems. Let's avoid talking in MIDI pitches for this reason.

@jsawruk, I'm afraid the particular example you showed does not help your case.

Since there is no notation for "E triple flat", the algorithm breaks in an unusual way.

No, this is entirely predictable. There is indeed no such thing as "E triple flat", so now the algorithm needs to look for something equivalent, and the available choices are C# and Db. C# changes the scale by 2 degrees (E to C), whereas Db is only a change of one degree (E to D), so Db is the only correct choice. MuseScore gets this right, and I'm sure the vast majority of notation programs would also get this right.

So in this case there is no disadvantage to using sounding pitch, but what about written pitch?

Let's explore what would happen if we stored your example as written pitches. For the Flute we store C, Cb and Cbb as before, but instead of calculating pitches for the Clarinet we just directly store Eb, Ebb and Db. Now imagine that we want to "undo" the Clarinet transposition and display the score with all instruments in C:

Eb - m3 -> C
Ebb - m3 -> Cb
Db - m3 -> Bb

So now the Clarinet has a Bb where the Flute had a Cbb.

This is the problem with storing written pitch: it obscures the harmonic relationship between the different parts. If we use sounding pitch then the harmonic relationship is preserved.

@mogenslundholm

mogenslundholm commented Apr 8, 2019 via email

@jsawruk

jsawruk commented Apr 8, 2019

@shoogle:

but you have not proven that the axiom holds true for all possible algorithms

First of all, that's not what axiom means. Axiom means assumption. I don't need to prove anything about an axiom, rather I can prove things given a set of axioms. If you disagree with the axioms that I am proposing, then that's fine. I only am using them to show my thought process and how I reach my conclusions.

As far as proving something for the set of all transposition algorithms, I don't think that's possible. I am assuming that there is a set of transposition algorithms that are monoids (associative, identity, non-invertible), and a set of transposition algorithms that are groups (associative, identity, invertible). For example, transposition using MIDI pitch numbers and a numeric transposition amount forms a group: T(60, 3) = 63, and T(63, -3) = 60: -3 is the inverse of the 3 transposition in this case. However, when dealing with pitch strings and interval strings, this might not always work: T(Cbb, +m3) = Db, but T(Db, -m3) = Bb. In this case, -m3 is not the inverse of +m3. Since these are different algebraic structures, I don't know how to prove a result for all cases. I could maybe produce two separate proofs, one for each type of algorithm, or perhaps I could prove a result using category theory. However, I don't think doing so would be helpful to anyone, so I won't pursue such proofs at this time.

It depends whether by "primary representation" you mean the exclusive representation (only one kind of pitch is ever used, and it is always the same) or you mean that both are specified and the primary one (whichever it may be) takes precedence. If both are specified then there is no need to specify an algorithm.

By primary representation, I meant written pitch OR sounding pitch OR written and sounding pitch together. Specifying both would mean we don't need to specify an algorithm, and that is the primary thesis of my argument, so I am glad you are understanding my position.

Also, I am not suggesting replacing pitches with intervals.

That's a pity. I think it is worth consideration, though probably not in this thread.

Intervals and pitches are different concepts. We should probably discuss intervals in a different thread, but I cannot support a position of eliminating pitch.

so while your new approach may be easier to read, it doesn't actually add any additional information

It does convey more information because "+6" is ambiguous. C up 6 semitones = ? It could be F#, or it could be Gb. C up an augmented fourth, however, is always F#, and C up a diminished fifth is always Gb.

@notator: I think you make some good points, and I agree with virtually all of them, but I think microtonal transpositions should be included. I can't think of any use cases off hand, but specifying transpositions using a decimal is something MusicXML already supports, so I think it makes sense to also support it in MNX.

@notator
Contributor

notator commented Apr 8, 2019

@jsawruk: Yes, you're right. Microtonal transposition values should be allowed. A use case would be the global adjustment of playback to a base frequency that is not A=440Hz.

@clnoel

clnoel commented Apr 8, 2019

The one thing I want to make sure of is that there aren't two mutually-exclusive ways to do this. I don't want a thing where we can say both concert-pitch and written-pitch are optional, but you have to have one. (Or pitch and midiPitch, or whatever we end up calling the two attributes!) One has to be required, and the other can be optional to provide additional info. I'm pretty sure that this means that at least one use case is going to be made harder to deal with, but I find that to be an acceptable downside. The fact that there are so many optional ways to do things in MusicXML is one of the reasons I really hate dealing with it.

@notator I support the idea of a transposition as a direction. That makes a LOT of sense. This direction can cover a lot of cases, such as a clef change, or an 8va notation or several other visible marks on the page, and I think that the relationship between a transposition direction and those other visible marks on the page should be discussed in a separate thread.

@shoogle

shoogle commented Apr 8, 2019

@notator, please do not discuss MusicSVG in this issue. I can see it may be useful in some situations, but not in others and the group has made it clear that they don't want to go down this route. I suggest you propose it to an open source project such as MuseScore, Audiveris, or Lilypond and see if they are willing to take it forward, or to accept code contributions in that area. MuseScore already has a somewhat customized SVG exporter, so I think the idea of adding semantic attributes as an optional feature will not prove overly controversial. Please refrain from bringing it up here again except in a dedicated issue.

Back to the issue at hand, if you like MIDI then you could store pitch as three separate quantities:

  • a MIDI pitch number (which is spelling agnostic)
  • a concert spelling (tonal pitch class)
  • a transposed spelling (tonal pitch class)

This is how MuseScore stores pitch (the transposition interval is stored separately as a property of the staff). This method gives equal prominence to written and sounding pitch, thereby avoiding any controversy. However, MuseScore currently does not support microtonal scales and I'm not sure how easily this method could be extended to support them. (Presumably it could be done by adding new pitch classes, though it may assume equivalence of certain classes, like C# and Db, that may not be true outside of 12-TET.)
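A minimal sketch of that three-quantity storage (Python; the field names and tpc convention here are illustrative shorthand, not MuseScore's actual API):

from dataclasses import dataclass

# Line-of-fifths "tonal pitch class" (tpc): F=-1, C=0, G=1, ...; sharps +7, flats -7.
LOF = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}

def tpc(letter, alter=0):
    return LOF[letter] + 7 * alter

@dataclass
class StoredNote:
    midi: int          # spelling-agnostic pitch number
    tpc_concert: int   # spelling at concert pitch
    tpc_written: int   # spelling at the instrument's transposition

# Sounding Bb4, written C5 for a Bb clarinet:
note = StoredNote(midi=70, tpc_concert=tpc("B", -1), tpc_written=tpc("C"))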

@jsawruk

It does convey more information because "+6" is ambiguous. C up 6 semitones = ? It could be F#, or it could be Gb.

That is not how transposition works in MusicXML. Transposition in MusicXML is not just given as a number of chromatic steps (semitones), it is also given as a number of diatonic steps. These two numbers together allow you to calculate the interval and recover both pitch and spelling unambiguously. If you want to specify the interval explicitly then it might make things clearer (perhaps somebody would like to say why that was not done originally in MusicXML) but it would not add any new information.

I think microtonal transpositions should be included. I can't think of any use cases off hand, but specifying transpositions using a decimal is something MusicXML already supports, so I think it makes sense to also support it in MNX.

I support microtonal tuning. I have never heard of microtonal transposition (except in the MusicXML spec). I think we should keep tuning and transposition separate, unless somebody can provide a real life example of when microtonal adjustments should definitely apply to transposition rather than tuning.

@notator

Microtonal transposition values should be allowed. A use case would be the global adjustment of playback to a base frequency that is not A=440Hz.

Your example refers to tuning, not transposition.

@notator
Contributor

notator commented Apr 9, 2019

@shoogle:

Your example refers to tuning, not transposition.

Yes, I was a bit hasty in replying to @jsawruk.
I have no objection to calling "the global adjustment of playback to a base frequency that is not A=440Hz" tuning, but I can't find this setting in the current draft spec, and am not sure where it should go in the MNX-Common file.
It could simply be an attribute of <mnx-common> that redefines the frequency (Hz) of a written A4 (default 440).

<mnx-common A4="431">
...
</mnx-common>

I can't think of any other use cases for microtonal transposition.
Even if such use cases exist, they must be rare, and there is an alternative way to achieve the same result: simply define both pitch and (microtonal) midiPitch (or whatever we call these things) on every note in the part. I agree with @clnoel that it's a good idea to avoid having two ways to do the same thing (unless there's an exceptionally good reason), so the bottom line is that I now think my original instinct was right: the transposition direction should be limited to whole numbers of semitones.

@clnoel I think any music notation standard has to be able to describe both a symbol's appearance and what it means, so an apparent redundancy is inevitable. But that's not to say that we necessarily have to allow midiPitch to be defined without defining pitch. My current feeling is that all the alternatives I described above should be allowed, but I fully agree that this should be discussed thoroughly in a separate issue. The attribute names pitch and midiPitch also need discussing...

And here's another issue (Maybe it's time this issue was split up?):
We need to discuss the representation of arbitrary microtone symbols in MNX-Common.
§5.2.2.4 of the draft spec defines four ways to name symbols for quarter-tones. I think these definitions

  • violate @clnoel's principle of non-duplication.
  • confuse space with time (they imply precise tunings where none exist)
  • are unnecessarily restricting

Applications should be allowed to use symbols that are as differentiated as they like. Maybe some applications will want, in some situations, to use enharmonic spellings for the same non-ET tuning.
In other words, I think applications should be enabled to be as "precise" as they like when creating symbols for microtonal tunings. Some (many?) applications will just support the standard CWMN accidentals (bb, b, n, #, ##). Others will implement basic quarter-tone symbols. Yet others may want to use specialised symbols for other tunings.

A possible solution would be to have a "wildcard" addition to the symbols defined in §5.2.2.4. This would be of the form diatonic notehead + wildcard + octave height: "a" = any (or some other character) for the wildcard, and a number for the octave. Examples: Aa4, Da2, etc.
The code

<note pitch="Ca4" midiPitch="60.4"/>

would tell the client application to draw the notehead at C4 together with an accidental (or no accidental) that is the best match for the given frequency.
If the app does not support microtone accidentals, this would result in an ordinary C4 symbol, because 60.4 is closer to 60 than to 61.
If the app supports quarter-tone notation, it can use the nearest quarter-tone accidental (whose valid range might be from 60.33 to 60.66).
If the app supports other accidental types, it can use those.
Another example might help:

<note pitch="Da4" midiPitch="60.4"/>

An app that does not support microtone accidentals would interpret this as a double-flat (because 60.4 is less than 61, the default Db frequency).
An app that does support microtone accidentals might still interpret this as a double-flat, if it had no microtone accidental corresponding to a D4 at that frequency.
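A tiny sketch of that "best match" idea (Python, illustrative only): given the wildcard letter, the octave, and the fractional midiPitch, the app picks the closest alteration it supports.

SEMI = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def best_accidental(letter, octave, midi_pitch, supported_alterations):
    natural = 12 * (octave + 1) + SEMI[letter]        # MIDI number of the plain letter
    return min(supported_alterations, key=lambda a: abs(natural + a - midi_pitch))

# <note pitch="Ca4" midiPitch="60.4"/>
print(best_accidental("C", 4, 60.4, [-2, -1, 0, 1, 2]))        # -> 0    (plain C4)
print(best_accidental("C", 4, 60.4, [-1, -0.5, 0, 0.5, 1]))    # -> 0.5  (quarter-sharp C4)
# <note pitch="Da4" midiPitch="60.4"/>
print(best_accidental("D", 4, 60.4, [-2, -1, 0, 1, 2]))        # -> -2   (Dbb4)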

As I said, this needs discussing fully in a separate issue.

@jsawruk

jsawruk commented Apr 9, 2019

@notator: I agree that there appear to be multiple issues in this thread, though I'm not sure how best to split them. As far as global tuning (like A4=431), I think that should be separate from this issue if it isn't already.

@mogenslundholm

@notator: About "An app ... would interpret this as a double-flat". Consider also totally independent sounding and written pitch: The app does not interpret at all.

@mogenslundholm

Not fond of the MIDI pitch numbers. Charles Delusse (1720-1774) wrote "Air a la Grecque" with some quarter tones. The MIDI writers should know that classical European music has 24 notes per octave, though 12 of them are seldom used.
Also I think that MNX should be "demiditized". I wonder if the new MIDI standard will remove the limitations on Pitch, Channel and Instrument. (There are more instruments; e.g. I would prefer an oud to a gunshot.)

@notator
Contributor

notator commented Apr 12, 2019

@mogenslundholm

About "An app ... would interpret this as a double-flat". Consider also totally independant sounding and written pitch: The app does not interpret at all.

I'm not sure what you mean there. My proposal does indeed treat written pitch as being completely independent of sounding pitch. But the written and sounding pitches can always be inferred from defaults where the information is otherwise missing. The app always has enough information to do some kind of interpretation.

I think that MNX should be "demiditized".

Agreed. Using "MIDI.cent" syntax does not mean that the interpreter has to use MIDI to implement the sounding output. The syntax just provides a convenient way to describe a frequency. Maybe the name needs changing. "MIDI.cent" notation would work regardless of whether or not MIDI 2.0 provides a simpler way to describe more accurate tunings (maybe by supporting Hertz or cent-accurate tunings directly).
§5.2.1.4 of the current draft spec provides a link to Scientific Pitch Notation. At the bottom of that article, there is a table which provides a direct correspondence between "MIDI note numbers" and equal temperament frequencies. So it's possible to describe cent-accurate frequencies using MIDI.cent (or SPN.cent) notation. That provides sufficient accuracy, and is much more expressive/convenient, in a music notation context, than using Hertz.

The score of Air a la Grecque seems only to be available on-line through various libraries, but there's a performance on YouTube.
It would be up to the interpreting application to decide how to notate it, but the piece's notes (graphics and sound) could very well be described by the syntax I'm proposing. The original notation, in particular, could only be reconstructed by an app that knew how the original notation looked. Did Delusse use special accidentals, provide fingerings, or just write some explanatory text above the notes?

I wonder if the new MIDI standard will remove the limitations on Pitch, Channel and Instrument. (The are more instruments, e.g. I prefer an oud rather than a gunshot).

In spite of "demiditizing" MNX, I think there should be a way to interface with MIDI if that is specifically required.
MIDI banks solve the channels problem, but if you want a specialised instrument, such as an oud, I think you are always going to have to provide a soundfont (or something similar) containing the patch defining it. (There are various initiatives around, trying to simplify that...)
MNX (not just MNX-Common) needs a way to link to a particular MIDI patch. In MNX-Common at least, that ought to be done in a <direction>. Maybe the patch would be in the MNX container, or in the cloud somewhere...

<midi-patches>
  <midi-patch name="oud" url="some url" />
</midi-patches>
...
<measure index="1">
  <sequence>
    <directions>
      <midi-patch name="oud" location="1/2" />
    </directions>
    <event value="/2">
      <note midiPitch="60"/>
    </event>
    <event value="/2">
      <note midiPitch="61"/>
    </event>
  </sequence>
</measure>

The default patch would, as usual, be a grand piano.

@clnoel

clnoel commented Apr 12, 2019

I'm going to tweak my previous proposal on the sounding/written pitch debate. I'd appreciate some comments. The underlying idea here is to make it easy to replicate the written document. As much as it violates my natural inclinations that the sound is the thing we need to treat as the ground truth, it is much harder to reproduce the original written document from the sound than it is to produce the sound from the written document. It's the same problem that any transcriber ends up with when trying to turn an audio track into a score, and a good portion of the point of this format is to remove ambiguities.

So, if you have a concert-pitch part, it's easy to represent a middle C note. I've written it here with two different spellings, to make it clear.

<global>
   <measure>
       <key fifths='0'/>
   </measure>
</global>
<part>
   <measure>
      <directions>
         <clef sign='G'/>
      </directions>
      <sequence>
         <event value='/2'>
            <note pitch='C4'/>
         </event>
         <event value='/2'>
            <note pitch='Dbb4'/>
         </event>
      </sequence>
   </measure>
</part>

If the 'Dbb4' spelling is meant to represent a microtone, instead of the same pitch as a 'C4', which some comments above indicate might be the case, the following can be used:

<global>
   <measure>
       <key fifths='0'/>
   </measure>
</global>
<part>
   <measure>
      <directions>
         <clef sign='G'/>
      </directions>
      <sequence>
         <event value='/2'>
            <note pitch='C4'/>
         </event>
         <event value='/2'>
            <note pitch='Dbb4' sounding='C4+.25'/>
         </event>
      </sequence>
   </measure>
</part>

In this case, the pitch attribute can have arbitrary numbers of sharps and flats plus a microtone adjustment, but the sounding attribute should limit itself to 1 sharp or 1 flat plus a microtone adjustment. The sounding attribute should also ignore the active key signature, and always directly specify the sounding value (so would specify "F#" even if the key was "G" and the pitch-spelling "F"). Also, I'm open to making the sounding pitch be a 'Midi.cent' or similar value instead.
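
As a rough illustration only (the exact syntax is still open, and the unit of the microtone adjustment isn't pinned down here, so a fraction of a semitone is assumed), such a sounding value could be parsed into a fractional MIDI-style number like this:

import re

NATURAL_SEMITONES = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

def parse_sounding(value):
    # e.g. 'C4+.25' -> 60.25: a letter, at most one # or b, an octave, an optional adjustment
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)([+-](?:\d*\.\d+|\d+))?", value)
    if m is None:
        raise ValueError("not a sounding value: " + value)
    step, accidental, octave, adjustment = m.groups()
    alter = {'#': 1, 'b': -1, '': 0}[accidental]
    return 12 * (int(octave) + 1) + NATURAL_SEMITONES[step] + alter + float(adjustment or 0)

parse_sounding('C4+.25')  # 60.25
parse_sounding('F#3')     # 54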

If you have a transposed-pitch part (in this case, Bb), you can have the following:

<global>
   <measure>
      <directions>
         <key fifths='0'/>
      </directions>
   </measure>
</global>
<part>
   <measure>
      <directions>
         <clef sign='G'/>
         <transposition semitone='-2'/>
         <key fifths='+2'/>
      </directions>
      <sequence>
         <event value='/2'>
            <note pitch='D4'/>
         </event>
         <event value='/2'>
            <note pitch='Ebb4' sounding='C4+.25'/>
         </event>
      </sequence>
   </measure>
</part>

In this case, the open-ended transposition direction occurs between the clef and the visible key. Just as for the concert-pitch case, the creator of the file does not need to add a sounding attribute unless there is something specific he wishes to specify. If the consumer wishes to calculate the sounding pitch, he applies the alterations specified by the pitch, and then applies the alterations specified by the key, and then applies an additional alteration specified by the currently active transposition directive. As before, if you wish to specify a sounding attribute on a note, you ignore any active transposition and key directives and directly specify the sounding pitch.

If you have both a concert-pitch version and a transposed-pitch version that are part of the same document, you have to specify both spellings. You can decide when creating the MNX-Common document to make these entirely separate, or you can allow for two different realizations of the same part in the same document by specifying both.

<global>
   <realization name="Concert score">
      ... 
      specify parts are in this realization/layout, including using the first pitch spelling
      ...  
   </realization>
   <realization name="Clarinet part">
      ...
     specify that this is only the clarinet part, and that it uses the second pitch spelling
      ...  
   </realization>
   <measure>
      <directions>
         <key fifths='0'/>
      </directions>
   </measure>
</global>
<part>
   <measure>
      <directions>
         <clef sign='G;G'/>
         <transposition semitone='0;-2'/>
         <key fifths='0;+2'/>
      </directions>
      <sequence>
         <event value='/2'>
            <note pitch='C4;D4'/>
         </event>
         <event value='/2'>
            <note pitch='Dbb4;Ebb4' sounding='C4+.25'/>
         </event>
      </sequence>
   </measure>
</part>

Note that there is still only one sounding pitch for both spellings. Also, the "concert pitch" spelling does not have to be the first one in the list. It all depends on how your realizations want to use them.
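
As a rough illustration of how a consumer might read those semicolon-separated values (the rule that each realization picks one position in the list is an assumption about how the mapping would work), consider:

def value_for_realization(attribute_value, realization_index):
    # 'C4;D4' holds one spelling per realization; a value without ';' applies to all of them.
    spellings = attribute_value.split(';')
    return spellings[realization_index] if len(spellings) > 1 else spellings[0]

value_for_realization('C4;D4', 0)   # 'C4'      (e.g. the "Concert score" realization)
value_for_realization('C4;D4', 1)   # 'D4'      (e.g. the "Clarinet part" realization)
value_for_realization('C4+.25', 1)  # 'C4+.25'  (a single sounding value shared by all realizations)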

As an aside, I have no objection to, and in fact would like, this "alternative spelling" system to be used for other purposes, like specifying the TAB notation right along with the conventional notation...

Note: I presented something like this system way up above, but I think I have addressed several issues and added several refinements to it since then. Does anyone feel like this doesn't address an issue that they have?

Edit: A Bb transposition is -2 semitones, not -1. I've fixed it.

@jsawruk

jsawruk commented Apr 12, 2019

@clnoel:

As much as it violates my natural inclinations that the sound is the thing we need to treat as the ground truth, it is much harder to reproduce the original written document from the sound than it is to produce the sound from the written document.

Thank you for saying that clearly. It's the cornerstone of my position, but I have not been able to say it in such simple terms!

and in fact would like, this "alternative spelling" system to be used for other purposes, like specifying the TAB notation right along with the conventional notation

Couldn't agree more!

@mogenslundholm

mogenslundholm commented Apr 13, 2019

I converted files from mu2-format to MusicXML.
Sound
I added small arrows to show the pitch, but was recommended to remove them: The players won't have them. I was told that the player knows the style and knows what pitches are used in this style.
Easy to produce the sounding pitches? Couldn't disagree more!

@mogenslundholm

@notator: I just mean that with both sounding and written pitch a program does not need to "correct" one from the other.

@notator
Contributor

notator commented Apr 13, 2019

@clnoel and @jsawruk

it is much harder to reproduce the original written document from the sound than it is to produce the sound from the written document.

and

a good portion of the point of this format is to remove ambiguities

The word "ambiguities" is a bit weak there! I'd say it was actually impossible to reproduce an original, written document from the sound alone. Transcribing sounds requires knowledge of a whole notation tradition, instrumental conventions etc. Things like clefs, accidental spellings and fingerings don't appear at all in the sound. All that information is in the transcriber's mind.
On the other hand, it should be possible for a CWMN app to provide a first transcription attempt, that could be tweaked by its user and then saved, possibly together with the original, transcribed sounds. So yes, I agree with you both: The only way to avoid "ambiguities" in a graphic is to save it! :-)

@clnoel

the pitch attribute can have arbitrary numbers of sharps and flats plus a microtone adjustment,

  1. Do you really mean we should be allowed to have pitch="D####4"? You don't actually give an example, so probably not! :-) I'm not sure if I've ever seen a triple-flat or triple-sharp, but I'm sure MNX-Common doesn't need more than three flats or sharps on the same notehead. What is MusicXML's opinion on that? Maybe unlimited numbers of flats or sharps should be allowed in some advanced MNX-Common profile... :-)
  2. You don't give an example of a pitch with a microtone adjustment. Have you got any suggestions for doing that? Maybe we do need to define a syntax for quarter-tone symbols (in addition to the wildcard I described above for the "best fit"). One could, for example, prescribe a quarter-tone flat using qb and a quarter-tone sharp using q# (as in pitch="Dqb3", pitch="Cq#6" etc.). Apps that don't support quarter-tone symbols would simply choose some other symbol, for example the one for the semitone above or below, leaving it to their users to tweak the result in any way the app allows.

Aside: Stockhausen used both accidentals that meant precise (ET) quarter-tones, and accidentals that meant "slightly sharp" or "slightly flat". These accidentals are all in the SMuFL standard.

§5.2.2.4 of the current draft spec says that only U+0023 HASH (=#) and U+0061 b can be used as alterations in the pitch value.
I think forced naturals also need to be defined, and that the 'n' character (U+006E n) should be used for that (e.g. pitch="Dn5", pitch="Gn2" etc.).
BTW: according to Wikipedia, 'b' is U+0062. That seems to be an error in our draft spec.

Sounding pitch:
(@clnoel again)

the sounding attribute should limit itself to 1 sharp or 1 flat plus a microtone adjustment

and

<note pitch='Dbb4' sounding='C4+.25'/>

The name of the sounding attribute is up for discussion. I called it midiPitch above, but that may be a bit confusing since it doesn't necessarily have anything to do with MIDI (see my answer to @mogenslundholm above).
Other possible names for the sounding attribute might be sound, frequency, freq etc.
It's extremely important to distinguish between written and sounding pitch here, so I'd prefer not to use pitch names (C4 etc.) in the frequency description.

  • If we used a symbol name in the sounding attribute, there would be two ways to describe some frequencies. There would be no difference between using a Db or C#. That violates the principle of non-duplication.
  • If we keep symbol names out of the frequency definitions, then it's clearer that the pitch attribute (which does use symbol names and accidentals) refers to the written object. So it's obvious that the pitch attribute is referring to a graphic.
  • All notations can use abstract frequency values, but not all notations use CWMN note names. Some notations only write a fingering, the name of a string to be plucked, or a direction in which a melody moves. I'd prefer to have a unified way of describing frequency in all notations, so I'd prefer to avoid using CWMN note names in frequency definitions.
  • An MNX-Common file will mostly be written and read by software. The sounding attribute is only used by software that is generating (code for) an audible output. In many cases that will involve MIDI, so it would be convenient to have a simple way to convert the sounding attribute's value to one or more MIDI instructions. Using a "MIDI note number" as part of that information is just simpler than going via a CWMN note name. (A rough sketch of such a conversion follows this list.)
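
Here is a rough, purely illustrative sketch of that conversion, assuming a fractional MIDI-style sounding value and a synth pitch-bend range of ±2 semitones (both assumptions, not proposals):

def sounding_to_midi(midicent, bend_range=2.0):
    # Split a fractional note number into the nearest MIDI note plus a 14-bit
    # pitch-bend value (centre 8192), given a bend range of +/- bend_range semitones.
    note = int(round(midicent))
    bend = int(round(8192 + (midicent - note) * (8192.0 / bend_range)))
    return note, max(0, min(16383, bend))

sounding_to_midi(60.0)    # (60, 8192)  C4, no bend
sounding_to_midi(60.25)   # (60, 9216)  C4 bent a quarter of a semitone sharp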

I think we agree that, however it is defined, the sounding value (if it exists) should always override any transposition, key signature etc.

...

the open-ended transposition direction occurs between the clef and the visible key.

To be really picky, I think the <clef>, <transposition> and <key> directions could be written in any order inside the <directions> element.

If the consumer wishes to calculate the sounding pitch, he applies the alterations specified by the pitch, and then applies the alterations specified by the key, and then applies an additional alteration specified by the currently active transposition directive.

Nearly. An accidental in the pitch attribute should actually override the default for that diatonic pitch stipulated by the <key> directive. Here's some pseudo code (comments and corrections welcome!):

Let there be a table (Table A) in which the default (ET) frequencies of the unaltered seven diatonic
pitches are going to be stored.
Table A takes no account of <key> or <transposition> directions.

Let there be a table (Table B) that will contain the running state of the default (ET) frequencies
of the seven diatonic pitches (notated on a staff that may have a <key>).
Table B will take both <key> and <transposition> directions into account.

if global tuning info exists
{
  Use the global tuning info (e.g. A4="431") to populate Table A.
}
else
{
  Use A4="440" to populate Table A.
}

For each <note> 
{
  Use Table A and the <key> and <transposition> states to update Table B.
  If the <note> has a "sounding" attribute
  {
    the <note>'s frequency is given by the "sounding" attribute. The "pitch" attribute is ignored.
  }
  else // the <note> must have a "pitch" attribute if it has no "sounding" attribute
  {
    Find the diatonic pitch name, accidental and octave in the <note>'s "pitch" attribute.
    If the "pitch" attribute contains an accidental
    {
      the <note>'s (ET) frequency is found using Table A, the diatonic pitch name, the
      accidental and the octave value. 
    }
    else
    {
      the <note>'s (ET) frequency is found using Table B, the diatonic pitch name and the
      octave value.
    }
  }
}

If the notation in @mogenslundholm's posting above is to be classed as MNX-Common, then the above algorithm has to allow for arbitrary (frequency) modes that use the seven diatonic pitch levels on a CWMN staff.
Maybe there should be a <mode> direction that allowed the base frequencies of the seven diatonic pitch names to be defined? For example, E4 and B4 could be "detuned" to be slightly above ET Eb and Bb as follows (I don't know what the precise values should be here):

<mode C4="60" D4="62" E4="63.3" F4="65" G4="67" A4="69" B4="70.3" />

Something similar could also be done as an extension of the <key> direction.
Any other ideas on how to create special modes?
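
A rough sketch of how such a <mode> direction could feed the lookup described in the pseudo code above (the attribute names are taken from the example; everything else is assumed):

MODE = {'C': 60.0, 'D': 62.0, 'E': 63.3, 'F': 65.0, 'G': 67.0, 'A': 69.0, 'B': 70.3}

def mode_frequency(step, octave, a4=440.0):
    # Look the step up in the mode table (given for octave 4) and shift by whole octaves.
    midicent = MODE[step] + 12 * (octave - 4)
    return a4 * 2.0 ** ((midicent - 69.0) / 12.0)

mode_frequency('E', 4)   # ~316.6 Hz, slightly above an ET Eb4
mode_frequency('E', 5)   # the same pitch class an octave higher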

...

I have no fundamental objection to superimposing score and part definitions as in the final example in @clnoel's last posting. It's probably a good idea that helps file maintenance when editing. Better not to have to edit similar things in different places...

@clnoel: Could you provide a simple example of how you imagine TAB being defined in MNX-Common? I'm new to MusicXML, and there's nothing about TAB in the current draft spec. I'm especially interested to see if there are implications for other non-CWMN-staff notations. Thanks.

@clnoel

clnoel commented Apr 15, 2019

@notator
In no particular order:

  1. About TAB, Guitar Tab notation #63 is the right issue to discuss TAB representation, and I'll make a proposal there when I can get into it a little more, although I do want it to be a string-attribute, not a set of elements. I'm not a TAB expert, so I don't know all the edge cases. But I do know that notation+TAB pieces are an important percentage of Musicnotes' imports and exports, so I need to be able to deal with it in MNX. I'd like it to be easy!

  2. About the sounding attribute. I'm open to changing the name, and using a number-value instead of a pitch-string. I think we should stay away from the term "MIDI" though, since that seems to be a hot-button for others. We'll have to think about that, but if we can come to a general agreement about written-and-sounding, we can work out those details as another issue.

  3. About the values in the pitch attribute, I was following the logical conclusion from the parsing instructions in §5.2.2.4 of the current draft spec. It says, in effect, that while the next character is # (or b), keep increasing (or decreasing) the alteration. Which does mean that D#####4 is allowed! If we want to change that, it should be a separate issue.

  4. About adding an 'n' to represent a natural to the pitch-syntax. I just double-checked the current spec for pitch, and it seems to say that you specify (e.g.) pitch='F#' regardless of what key you are in (C-major or G-major, for example), the difference being that in C-major you would add the accidental attribute to represent the visual display of the accidental, and in G-major you wouldn't, unless there was a preceding F-natural. I am changing that in my proposal for pitch and sounding, as it is not, I think, how most people in this thread seem to be treating pitch-spellings. I do still think we need the accidental attribute for some cases, to specify suggested SMuFL characters for the pitch-spellings, or to specify an accidental with parentheses.

  5. You state in your pseudo-code:

// the <note> must have a "pitch" attribute if it has no "sounding" attribute

No! This is not an either-or situation (where you can have sounding or pitch or both). The pitch attribute is the required one. It specifies the pitch-spelling from the original document. The sounding attribute is optional!

  6. You are correct that the accidentals on a note override the key signature accidentals. I messed that up in my proposal. The worded description on how to get the sounding pitch becomes:

If the consumer wishes to calculate the sounding pitch, he first checks to see if there is a sounding attribute, and uses it if there is. If not, he gets the diatonic step and octave from the pitch attribute, applies the alterations specified by pitch (or, if there are none, the alterations specified by the key signature), and then applies an additional alteration specified by the currently active transposition directive, and then applies any microtonal adjustment specified by pitch.
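
A rough sketch of that calculation (hypothetical helper names; returning a fractional MIDI-style number is just one possible way to express the result):

NATURAL_SEMITONES = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}
SHARP_ORDER = "FCGDAEB"  # order in which sharps enter a key signature; flats use the reverse

def key_alteration(step, fifths):
    # Alteration that the current key signature applies to a diatonic step.
    if fifths > 0:
        return 1 if SHARP_ORDER.index(step) < fifths else 0
    if fifths < 0:
        return -1 if SHARP_ORDER[::-1].index(step) < -fifths else 0
    return 0

def sounding_pitch(step, octave, pitch_alter, microtone, fifths, transposition, sounding=None):
    if sounding is not None:
        return sounding  # an explicit sounding attribute always wins
    alter = pitch_alter if pitch_alter is not None else key_alteration(step, fifths)
    return 12 * (octave + 1) + NATURAL_SEMITONES[step] + alter + transposition + microtone

# Written D4 in the Bb part above (key fifths=+2, transposition=-2) sounds as C4 (60):
sounding_pitch('D', 4, None, 0, fifths=2, transposition=-2)
# Written Ebb4 with sounding='C4+.25' resolves to 60.25 regardless of key and transposition:
sounding_pitch('E', 4, -2, 0, fifths=2, transposition=-2, sounding=60.25)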

Importantly, I just realized this doesn't cover "retained accidentals" (notes that are altered by the accidental on a preceding note). Do we just count those into the sounding pitch algorithm, or do we specify them in some way?

@mogenslundholm
Well, I am unfamiliar with that style, and would probably appreciate having the arrows! But the question isn't "Is it easy to reproduce the sound from the written pitch?" but rather "Is it easier to produce the sound from the written pitch?"

Given that sheet music is designed to be a set of instructions for producing sound from writing, I still have to feel that decoding the sound from the written pitch is easier.

--Christina

@mogenslundholm

@clnoel: You wrote "I'd appreciate some comments". It really looks good. But I still think that non-mandatory sounding-pitch = not there.
Playing when you have the sounding pitch is just playing it. Not having the sounding pitch means asking: "Is the sounding pitch there? No? Then: is the written pitch there? Is it transposed? Is it under an ottava 8 up or down? An ottava 15 up or down? An ottava 22 up or down? Is it a harmonic? Which type, natural or artificial? Is this the base pitch, the touching pitch or the sounding pitch?" (These are all MusicXML possibilities.)
PS: <note pitch="C4,C4">... is short. And so is "C4,C4;D4". Is there anyone who can't figure out what this should mean? (Answer: Sounding,Written;Written ...)

@notator: You wrote: Do you really mean we should be allowed to have pitch="D####4". Well - I do. It is just easier to have no limit. Will this occur? No. (But I can make a silly example, if you want).
Do you have the mentioned work of Stockhausen as MusicXML?

I have started making a transposing algorithm. The current version transposes to flat keys and back. With a little luck it will also work when I add sharp keys. I will prove it by transposing all the notes of a key to any other key, and from that to any other key, and showing that the result is the same as transposing in one step.
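
For illustration (and not necessarily the algorithm being developed here), a standard way to transpose a spelled pitch is to track diatonic and chromatic steps separately, which keeps spellings consistent and makes the round trip exact:

LETTERS = "CDEFGAB"
NATURAL_SEMITONES = [0, 2, 4, 5, 7, 9, 11]

def transpose(letter, alter, octave, diatonic_steps, chromatic_steps):
    # e.g. transpose('D', 0, 4, -1, -2) -> ('C', 0, 4): down a major second.
    old_index = LETTERS.index(letter)
    new_index = old_index + diatonic_steps
    new_octave = octave + new_index // 7
    new_letter = LETTERS[new_index % 7]
    old_midi = 12 * (octave + 1) + NATURAL_SEMITONES[old_index] + alter
    new_natural = 12 * (new_octave + 1) + NATURAL_SEMITONES[new_index % 7]
    new_alter = old_midi + chromatic_steps - new_natural
    return new_letter, new_alter, new_octave

# Round trip: Eb4 up a major second (to F4) and back down again returns the original spelling.
transpose(*transpose('E', -1, 4, 1, 2), -1, -2)   # ('E', -1, 4)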

@adrianholovaty
Contributor

After a lengthy discussion at Musikmesse 2018 in Frankfurt, we've decided on storing written pitch. (Meeting minutes and video here, with discussion starting around 1:05:20 in the video.)

I've added a pull request to issue #4, which is specifically about the sounding-vs-written pitch issue.

As for this particular issue (Realizations and Layouts), the discussion here has become quite wide-ranging — to the point where it's difficult to digest all the ideas and they've veered off track in places. :-) With that in mind, I'm going to copy the relevant ideas into issues #34 (differences between scores/parts) and #57 (page/system flow), for more atomic discussions. After that's done, I'll close this issue (but of course it'll be archived and all comments will still be available).

@shoogle

shoogle commented Apr 20, 2019

@adrianholovaty, thanks for the update! I'm glad a decision has been made. Written pitch is better for transcribing existing works, and will certainly make the transition from MusicXML much easier.

Now, I would have liked to use sounding pitch for new compositions, and I think that would have made more sense for a native format. However, the "workaround" for those of us who prefer sounding pitch is simply to write your scores in concert pitch, because, as @mdgood says in the video:

in concert pitch, written pitch is sounding pitch.

So if you truly don't care about transposition, or don't feel qualified to specify a spelling, you can simply write your scores in concert pitch and leave the transposition to an editor / end users / the application.

@notator
Contributor

notator commented Apr 24, 2019

@adrianholovaty: Great that a decision has been made! (Sorry about the delay, but I've been away...)
This thread continues in #4 but to tie things up here, and for the record, I'd like to reply properly to @clnoel's and @mogenslundholm's last comments.

@clnoel:
Thanks for the link to #63. I'll take an independent look. Interesting that @snakebyte69 is calling for opinions/participation from the main software vendors! As I said somewhere, they also have a special role to play in deciding what is and is not CWMN... :-)

The sounding attribute: Yes, I'm also open to discussing different names, but sounding is fine by me. You're probably right that we should stay away from using "midi" in the name (But I still think that "MIDI.cent" notation would be a convenient way to notate frequencies. Much more convenient, for example, than using Hertz.) Note that the current spec quotes this Wikipedia article on "Scientific Pitch Notation" to justify the use of octave numbers in pitch attributes. The same article uses MIDI note numbers lower down.

D#####4 and adding n to the pitch syntax: Yes. These and the representation of accidentals in general, need discussing properly. I think the best place for that is currently #4. Basically, I think the current spec is confusing and out of date following our adoption of the transposition direction and sounding attributes.

Apropos the pitch and sounding attributes, you said:

This is not an either-or situation (where you can have sounding or pitch or both). The pitch attribute is the required one. It specifies the pitch-spelling from the original document. The sounding attribute is optional!".

Okay. I agree with you. The pitch attribute, which specifies the graphical pitch-spelling, should be compulsory. The point of my pseudo code was to get at issues like that, so as to clear them up. :-) Reading that code, I also thought about "repeating accidentals". I think these can simply be included in the cascading defaults hierarchy that is read while parsing the file. We'll probably solve that issue in #4.

@mogenslundholm
The Stockhausen accidentals can be found in the SMuFL docs here.
As I said, I think we should continue the discussion about how MNX-Common should treat accidentals in #4.

@adrianholovaty
Contributor

We've decided to close this issue, as it's gotten quite large in scope. Please use the following sub-issues:
