support and test UTF-8 output #182

Open
avsm opened this issue Apr 19, 2019 · 9 comments
@avsm
Contributor

avsm commented Apr 19, 2019

From @nojb:

UTF-8 support is missing.

See also previous discussion in #27

(this is tracking blockers to a new omd release from #174)

@avsm avsm added the feature label Apr 19, 2019
@XVilka

XVilka commented Aug 19, 2019

@clecat I believe UTF-8 support is also important for mdx integration, since even English-language books often contain Unicode symbols, e.g. mathematical or computer science notation.

@aantron

aantron commented Apr 23, 2020

Unicode symbols can appear in English in ordinary prose text. A simple example is fancy quotes. UTF-8 support is critical for practically anything outside source code, configuration files, and other computer files.

@nojb
Contributor

nojb commented Apr 23, 2020

Currently, omd is byte-based so it will take (and spit back out) arbitrary byte sequences, including UTF-8.

@nojb
Contributor

nojb commented Apr 23, 2020

"Proper" UTF-8 support would enforce the encoding and also take that into account when case or other normalization is needed.
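As a rough illustration of what "enforcing the encoding" could look like, here is a minimal sketch that scans a string and reports the byte offset of the first malformed UTF-8 sequence. It assumes OCaml 4.14 or later (for `String.get_utf_8_uchar`); the function name `first_invalid_utf_8` is hypothetical and is not part of omd.

```ocaml
(* Hypothetical helper, not part of omd. Requires OCaml >= 4.14.
   Returns [Some i] where [i] is the byte offset of the first malformed
   UTF-8 sequence in [s], or [None] if [s] is valid UTF-8. *)
let first_invalid_utf_8 (s : string) : int option =
  let rec go i =
    if i >= String.length s then None
    else
      (* Decode one scalar value starting at byte offset [i]. *)
      let d = String.get_utf_8_uchar s i in
      if Uchar.utf_decode_is_valid d then
        (* Skip over the bytes consumed by this decode. *)
        go (i + Uchar.utf_decode_length d)
      else Some i
  in
  go 0
```

A byte-based parser would pass a string such as `"ab\xFFcd"` through untouched, whereas a check like this one would flag offset 2 and let the caller decide whether to reject or repair the input.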

@aantron

aantron commented Apr 23, 2020

Ok, yes, thanks. I was misled by other tools and the title of this issue, but omd indeed wasn't the problem.

Perhaps this issue should be renamed to only "test...", or otherwise clarified.

@aantron

aantron commented Apr 23, 2020

And similarly for #27.

@nojb
Contributor

nojb commented Jun 20, 2020

Strictly speaking, this is not a blocker. The current byte-based approach is good enough for the vast majority of cases. I'll see if we can easily plug in a UTF-8 decoder for the 2.0 release, but if not, full Unicode support will have to wait until after 2.0.

@nojb nojb closed this as completed Jun 20, 2020
@nojb nojb mentioned this issue Jun 20, 2020
@shonfeder shonfeder reopened this May 29, 2021
@shonfeder
Collaborator

shonfeder commented May 29, 2021

Spec 175, which is currently disabled, fails due to inaccurate handling of upper/lower case conversion on Unicode URLs. So proper support for this issue is a blocker for #235, which is in turn a blocker for the 2.0 release milestone.

Failing verification tests:

diff --git a/tests/spec-175.html b/tests/spec-175.html.new
index 93c6540..97ea23b 100644
--- a/tests/spec-175.html
+++ b/tests/spec-175.html.new
@@ -1 +1,2 @@
-<p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p>
+<p>[ΑΓΩ]: /φου</p>
+<p>[αγω]</p>
         git (internal) (exit 1)
(cd _build/default && /usr/bin/git --no-pager diff --no-index --color=always -u tests/spec-536.html tests/spec-536.html.new)
diff --git a/tests/spec-536.html b/tests/spec-536.html.new
index afe4557..8f8663f 100644
--- a/tests/spec-536.html
+++ b/tests/spec-536.html.new
@@ -1 +1 @@
-<p><a href="/url">Толпой</a> is a Russian word.</p>
+<p>[Толпой][Толпой] is a Russian word.</p>

From example

[ΑΓΩ]: /φου

[αγω]
.
<p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p>

https://github.com/madroach/omd/blob/master/tests/spec.txt#L2985-L2991

and

[Толпой][Толпой] is a Russian word.

[ТОЛПОЙ]: /url
.
<p><a href="/url">Толпой</a> is a Russian word.</p>

https://github.com/madroach/omd/blob/master/tests/spec.txt#L8072-L8078
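To make the failure mode in the two examples above concrete, the sketch below contrasts the standard library's ASCII-only `String.lowercase_ascii` with a toy Unicode-aware fold. The `toy_fold` function is hypothetical and is not part of omd; it assumes OCaml 4.14 or later and covers only ASCII, basic Greek, and basic Cyrillic capitals. Real case folding for label matching would need the full Unicode case-folding tables.

```ocaml
(* Hypothetical helper, not part of omd. Requires OCaml >= 4.14.
   A toy case fold: decodes UTF-8, lowercases a few capital-letter
   ranges, and re-encodes. This is NOT full Unicode case folding. *)
let toy_fold (s : string) : string =
  let b = Buffer.create (String.length s) in
  let rec go i =
    if i < String.length s then begin
      let d = String.get_utf_8_uchar s i in
      (* Malformed bytes decode to U+FFFD and are emitted as such. *)
      let u = Uchar.to_int (Uchar.utf_decode_uchar d) in
      let folded =
        if u >= 0x41 && u <= 0x5A then u + 0x20          (* A-Z *)
        else if u >= 0x0391 && u <= 0x03A9 && u <> 0x03A2
        then u + 0x20                                    (* Greek capitals *)
        else if u >= 0x0410 && u <= 0x042F then u + 0x20 (* Cyrillic capitals *)
        else u
      in
      Buffer.add_utf_8_uchar b (Uchar.of_int folded);
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0;
  Buffer.contents b

(* Compare two link labels after folding. *)
let labels_match (a : string) (b : string) : bool = toy_fold a = toy_fold b
```

With byte-based conversion, `String.lowercase_ascii "ΑΓΩ"` leaves the string unchanged, because every byte of those letters is outside the ASCII range, so `[ΑΓΩ]` and `[αγω]` never compare equal. Under the toy fold, `labels_match "ΑΓΩ" "αγω"` and `labels_match "ТОЛПОЙ" "Толпой"` both hold.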

@shonfeder shonfeder added this to the 2.0 milestone May 29, 2021
@shonfeder shonfeder added bug and removed feature labels May 29, 2021
@shonfeder
Collaborator

Reclassified as a bug, since this is causing us to fail verification against the spec.
