Endorsements #3

stevengj · 2016-08-27T02:47:53Z

It would be good to get endorsements of the final proposal by prominent individuals, organizations, free/open-source projects, and corporations (or at least corporate representatives), to help ensure that this is taken seriously by the Unicode Consortium.

They could be listed as co-authors, or we could have a separate section for "endorsers" of the proposal.

(Note that we don't want to turn this into an online petition; I think it will have greater impact if we limit ourselves to widely recognizable entities or prominent representatives thereof.)

cc @StefanKarpinski, @Carreau, @fperez

fperez · 2016-08-27T04:53:15Z

Context: this was originally motivated by Julia, and the discussion started in the IPython repo, but @stevengj has now made a proper repo here for further work.

Pinging @pkra from MathJax: Peter, what do you think of this idea?

Thinking out loud from Jupyter's perspective (though not saying anything "official" yet :), I think we're mostly agnostic: I like the idea, but we're basically a pass-through for the code written in any language. We'd be most directly impacted by the editing needs and platform/browser support for the standard, but that tends to get sorted out relatively quickly if these things are accepted. So I don't think our opinion matters too much, though I love the idea :)

I think more important than Jupyter would be to hear from core Python folks: I know that while Python (3) accepts Unicode identifiers, not all Unicode characters are allowed. For this to impact Python in a positive way, we'd need this new class of characters to be allowed in identifiers. In that regard, I don't know if the choice between your two proposed paths (new chars vs. combining) would matter to Python's choice of what is allowed to be a variable name.

I know @takluyver is fairly up to speed with these things in the Python world; perhaps you can comment?

stevengj · 2016-08-27T12:04:41Z

@fperez, most of the proposed new characters would be new Latin and Greek subscripts/superscripts, which would be in category Lm (Letter, modifier), and are accepted in Python 3 identifiers already (e.g. αₓ is already allowed in Python). So, new characters in the same category would presumably be allowed as Python identifiers.

Something like a subscript ⟂ or * would probably be in category Sm (Symbol, math), and these are not accepted in Python identifiers at the moment, so I doubt the subscript versions would be either.

If we decide to go the combining-character route, that should be fine too, since Python 3 accepts combining marks in identifiers (e.g. for x̂).

(I really think that Python should expand the set of Unicode categories that it accepts as identifiers. It's crazy to me that x0 is a valid identifier but x₀ is not; ₀ is in category No "Number, other", and I would think that Python would treat this like any other number for identifiers.)
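The category claims above can be checked against the Unicode database bundled with a Python 3 interpreter. This is only a quick sketch: the helper name `report` is made up for illustration, and the results reflect whichever Unicode version the interpreter ships.

```python
import unicodedata

# Sample strings taken from this thread, written with escapes for clarity.
samples = [
    "\u03b1\u2093",  # alpha + LATIN SUBSCRIPT SMALL LETTER X
    "x\u0302",       # x + COMBINING CIRCUMFLEX ACCENT
    "x\u2080",       # x + SUBSCRIPT ZERO
    "x\u27c2",       # x + PERPENDICULAR
]

def report(s):
    """Return the general category of each code point and whether
    Python accepts the whole string as an identifier."""
    cats = [unicodedata.category(c) for c in s]
    return cats, s.isidentifier()

for s in samples:
    cats, ok = report(s)
    print(f"{s!r}: categories={cats}, identifier={ok}")
```

One caveat worth keeping in mind: Python applies NFKC normalization to identifiers (PEP 3131), so αₓ is accepted but refers to the same name as αx.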

fperez · 2016-08-27T19:19:32Z

On Sat, Aug 27, 2016 at 5:04 AM, Steven G. Johnson wrote:

> (I really think that Python should expand the set of Unicode categories
> that it accepts as identifiers. It's crazy to me that x0 is a valid
> identifier but x₀ is not; ₀ is in category No "Number, other", and I
> would think that Python would treat this like any other number.)

Agreed. The rule could be "must start with a character from , but afterwards can include ". Python already
forbids identifiers starting with numbers, so this would still restrict a
bare variable named "" (which would make a horror like "** == ** == ***2"
possible :). But it would allow those you suggest...

If this gains traction, we could try to work with the Python team on the
question; they are receptive to discussions driven by concrete use cases.
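To make the suggested rule concrete, here is a rough sketch of what "may not start with a number-like character, but may contain one afterwards" could look like. The function `is_relaxed_identifier` is hypothetical and is not anything Python implements; it also assumes the added set is exactly category No, which is broader than subscript digits (it includes characters such as ½).

```python
import unicodedata

def is_relaxed_identifier(name: str) -> bool:
    """Hypothetical rule: accept what Python accepts today, plus names in
    which category-No characters appear anywhere except the first position."""
    if not name:
        return False
    if unicodedata.category(name[0]) == "No":
        return False  # still forbid a bare or leading subscript digit
    # Treat each No character like an ASCII digit, then defer to Python's rule.
    stand_in = "".join(
        "0" if unicodedata.category(c) == "No" else c for c in name
    )
    return stand_in.isidentifier()

print(is_relaxed_identifier("x\u2080"))  # x followed by SUBSCRIPT ZERO
print(is_relaxed_identifier("\u2080"))   # bare SUBSCRIPT ZERO
```

Deferring to `str.isidentifier` keeps every other part of the existing rule unchanged, so the sketch only widens the continuation set.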

Carreau · 2016-08-28T18:26:27Z

> We'd be most directly impacted by the editing needs and platform/browser support for the standard, but that tends to get sorted out relatively quickly if these things are accepted.

Chrome still regularly renders the combining vector arrow incorrectly, on the following character instead of the preceding one. The issue has been open for at least a year now.

> Valid Python identifier

Object repr is a perfectly valid example where we (IPython) could make use of that without the need to be an identifier, but I agree it's beyond the scope of the proposal.

Otherwise, I would have thought the mathematical super/subscript modifier would come before the character it modifies, or, more like a ZWJ, in between the two glyphs, but I don't know the standard in Unicode.

What do you expect if you have <Character><a superscript><a subscript>: should the two be stacked on top of each other? If so, should <Character><a subscript><a superscript> be normalized to the same sequence?
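For comparison, existing combining marks already behave this way when they have different canonical combining classes: normalization reorders them, so both input orders compare equal afterwards. The sketch below uses dot above and dot below as a stand-in. Whether any proposed sub/superscript marks would be assigned distinct combining classes is not settled in this thread, so this only illustrates the existing mechanism.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220

a = "q" + DOT_ABOVE + DOT_BELOW
b = "q" + DOT_BELOW + DOT_ABOVE

print(unicodedata.combining(DOT_ABOVE), unicodedata.combining(DOT_BELOW))
# The raw code point sequences differ...
print(a == b)
# ...but canonical ordering puts the marks in the same order after normalization.
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))
```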

asmeurer · 2016-08-28T18:50:35Z

Maybe a font could choose to render them on top of each other with <combining subscript>a<ZWJ><combining superscript>b.

asmeurer · 2016-08-28T18:51:58Z

At any rate, that's the second time someone has implicitly assumed that this proposal includes support for superscript and subscript characters on top of each other, so this should be discussed in the proposal, even if we don't want to propose allowing that at all.

pkra · 2016-08-30T07:41:57Z

> Pinging @pkra from MathJax: Peter, what do you think of this idea?

Thanks for cc'ing me @fperez. I'm not an expert on Unicode so I don't have much to say on this. There's an inherent tension in doing layout via Unicode (i.e., font rendering engines) in the context of other layout engines (TeX, HTML+CSS, SVG etc). Combining characters are somewhat of a pain when you're doing layout (e.g., in HTML, split them into two spans -- what should happen?). They also pose an accessibility problem, since assistive technologies often ignore non-ASCII Unicode characters (especially with default settings) and since Unicode names have no official localization.

If the hope is to magically solve a layout problem, then I'd be skeptical. Something has to do the layout after all, and you only push this to the level of the font engines (which vary widely in quality across OSs and OS versions and even applications on the same OS, e.g., Windows ships several font engines).

The only proper opinion I have is: I would drop the "mathematical" part as there seems to be nothing mathematical about it -- it's just scripts.

stevengj · 2016-08-30T14:00:19Z

@Carreau, a combining character in Unicode always comes after the base character it modifies, e.g. to type x̂ you enter x followed by U+0302.
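A small Python sketch of that ordering, using only the standard `unicodedata` module. It also shows that x̂ stays two code points under NFC, since no precomposed form exists for it, whereas e followed by U+0302 composes to a single code point.

```python
import unicodedata

x_hat = "x" + "\u0302"  # base character first, combining mark second

for ch in x_hat:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"combining_class={unicodedata.combining(ch)}")

# Length before and after NFC for x + U+0302 and for e + U+0302.
for s in (x_hat, "e\u0302"):
    print(len(s), len(unicodedata.normalize("NFC", s)))
```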

@pkra, combining characters are not magic. Every modern editor, terminal, and browser already supports them. (Specific combining characters might not appear in certain settings, but that's a font problem.) And essentially all modern programming languages already accept non-ASCII (Unicode) identifiers, so that ship has sailed. The current situation is that you can make identifiers with some subscripts, e.g. αᵦ is already allowed in Python 3, but it only works with an arbitrary subset of Latin and Greek characters that have codepoints assigned. (e.g. you can do every superscript lower-case Latin letter except q.)

At the very least, for using super/subscripts in mathematical code, we should complete the set with the remaining Latin and Greek characters. (This could be done without any combining characters, just by adding new codepoints; the combining-character proposal is a more ambitious alternative.)
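To see which letters are currently covered, one can scan the Unicode database for code points whose compatibility decomposition is a superscript or subscript form of an ASCII lowercase letter. This is a sketch: `script_variants` is a made-up helper, and the output depends on the Unicode version bundled with the interpreter, so gaps noted in 2016 may since have been filled.

```python
import string
import sys
import unicodedata

def script_variants(tag):
    """Map each ASCII lowercase letter to the code points whose compatibility
    decomposition is exactly '<tag> LETTER' (tag is 'super' or 'sub')."""
    found = {letter: [] for letter in string.ascii_lowercase}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == f"<{tag}>":
            base = chr(int(parts[1], 16))
            if base in found:
                found[base].append(chr(cp))
    return found

for tag in ("super", "sub"):
    variants = script_variants(tag)
    missing = [letter for letter, cps in variants.items() if not cps]
    print(tag, "missing:", "".join(missing))
```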

pkra · 2016-08-30T16:17:01Z

> @pkra, [...] Every modern editor, terminal, and browser already supports them.

Thanks. I'm aware of that. What I was trying to point out was that problems linger outside the sphere of Unicode.

To give an example, recently a publisher approached me about rendering issues with MathJax. Their content was trying to get a V-bar (V̅). Their source was MathML and was using combining characters for this. Unfortunately, that's not what MathML expects -- it has an <mover> construct for this -- and the rendering fell apart in MathJax (and also, for a different reason, in their PDF rendering).
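As a rough illustration of the mismatch, the sketch below converts the combining-character encoding into the `<mover>` structure MathML expects. The helper `overbar_to_mover` is hypothetical, and using U+00AF (MACRON) as the over-script character is an assumption for illustration, not something taken from the publisher's content.

```python
COMBINING_OVERLINE = "\u0305"

def overbar_to_mover(text: str) -> str:
    """Hypothetical helper: turn one base character followed by U+0305
    into a MathML <mover> element instead of relying on font-level combining."""
    if len(text) != 2 or text[1] != COMBINING_OVERLINE:
        raise ValueError("expected one base character followed by U+0305")
    base = text[0]
    # Assumption: U+00AF is used as the spacing bar placed over the base.
    return f"<mover><mi>{base}</mi><mo>\u00af</mo></mover>"

print(overbar_to_mover("V" + COMBINING_OVERLINE))
```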

So again, from the perspective of rendering beyond the scope of Unicode constructs, combining characters are already messy and I don't think adding to them helps these use cases.

I realize that this has little bearing on the proposal at hand. I only wanted to respond to @fperez's request for comment from my arguably limited perspective.

stevengj · 2016-08-30T16:28:37Z

@pkra, I agree that combining Unicode-based math formatting with MathML or LaTeX is a recipe for trouble. But, as you say, this has little bearing on the current proposal, which is mainly aimed at programming languages (in which non-Unicode formatting is a non-starter).

asmeurer · 2016-08-30T17:21:22Z

Are there typographical considerations where direct font support could do better than a naive renderer?

mpacer · 2016-08-30T17:40:22Z

Almost all kerning and (more generally) letter-placement considerations
will benefit from particular treatment at the font level. The only
renderers I've seen (occasionally) do better than the default font kerning
are Adobe's Illustrator and InDesign renderers (which are also designed to
be highly customizable, e.g., with ⌥+→ expanding the spacing between two
characters slightly).

For a different project, I was trying to look up how browsers were
extracting and rendering these kinds of placement considerations (since
their solutions are demonstrably different from the font default as
rendered by Illustrator)… I ended up getting nowhere with that project.
However, I'd be curious to see how browsers are handling subscripts and
whether that information is currently being baked into the font; I'm
assuming that it is.

Figuring out how to automatically decide that for a generic combining
character has a high potential to be a mess. It's not impossible, but I
think naïve rendering is not going to go well in the most general case.

stevengj · 2016-08-30T18:26:07Z

Please put discussion of the technical implementation of combining characters in issue #1.

lambdafu · 2016-11-18T14:54:43Z

Maybe ask CERN for endorsement?

NAThompson · 2020-06-14T20:35:32Z

I can endorse and maybe with some effort can get institutional backing. This would be great for communicating the results of the PSLQ algorithm.

stevengj · 2020-06-15T02:06:48Z

Thanks, I should really get back to this proposal. What institution were you thinking of?

NAThompson · 2020-06-15T02:08:43Z

ORNL.

stevengj mentioned this issue Aug 27, 2016: Application to in-terminal math rendering #2 (Open)

stevengj added a commit that referenced this issue Aug 30, 2016: "emphasizing that we are not advocating a replacement for MathML" (b1c2e94)

Emphasize that we don't advocate combining Unicode-base math formatting with MathML, as pointed out by @pkra in #3.

stevengj mentioned this issue Aug 30, 2016: Feasibility of new combining characters? #1 (Open)

pkra mentioned this issue Sep 1, 2016: Adds Julia-style latex->unicode tab completion ipython/ipython#6380 (Merged)
