support and test UTF-8 output #182
Comments
@clecat I believe having UTF-8 is also important for mdx integration, since even English books often contain Unicode symbols for different reasons, e.g. mathematical or computer science symbols.
Unicode symbols can appear in English in ordinary prose text. A simple example is fancy quotes. UTF-8 support is critical for practically anything outside source code, configuration files, and other computer files.
Currently,
"Proper" UTF-8 support would enforce the encoding and also take that into account when case or other normalization is needed.
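To make "enforce the encoding" concrete, here is a minimal sketch of strict UTF-8 validation at the input boundary. It is written in Python purely for illustration; omd itself is OCaml, and the function names `decode_utf8_strict` and `is_valid_utf8` are hypothetical, not part of omd.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode input as UTF-8, rejecting malformed byte sequences."""
    # errors="strict" raises UnicodeDecodeError on invalid input instead of
    # letting bad bytes pass through to later processing stages.
    return data.decode("utf-8", errors="strict")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole input is well-formed UTF-8."""
    try:
        decode_utf8_strict(data)
    except UnicodeDecodeError:
        return False
    return True
```

With a check like this, prose containing fancy quotes is accepted, while Latin-1 input such as `b"caf\xe9"` is rejected up front rather than being mangled later by case conversion or normalization.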
Ok, yes, thanks. I was misled by other tools and the title of this issue, but omd indeed wasn't the problem. Perhaps this issue should be renamed to only "test...", or otherwise clarified.
And similarly for #27.
Strictly speaking, this is not a blocker. The current byte-based approach is good enough for the vast majority of cases. I'll see if we can plug in a UTF-8 decoder easily for the 2.0 release, but if not, full Unicode support will have to wait until after 2.0.
Spec 175, which is currently disabled, fails due to inaccurate handling of upper/lower case conversion on Unicode URLs. So proper support for this issue is a blocker for #235, which is a blocker for the 2.0 release milestone.
Failing verification tests:
diff --git a/tests/spec-175.html b/tests/spec-175.html.new
index 93c6540..97ea23b 100644
--- a/tests/spec-175.html
+++ b/tests/spec-175.html.new
@@ -1 +1,2 @@
-<p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p>
+<p>[ΑΓΩ]: /φου</p>
+<p>[αγω]</p>
git (internal) (exit 1)
(cd _build/default && /usr/bin/git --no-pager diff --no-index --color=always -u tests/spec-536.html tests/spec-536.html.new)
diff --git a/tests/spec-536.html b/tests/spec-536.html.new
index afe4557..8f8663f 100644
--- a/tests/spec-536.html
+++ b/tests/spec-536.html.new
@@ -1 +1 @@
-<p><a href="/url">Толпой</a> is a Russian word.</p>
+<p>[Толпой][Толпой] is a Russian word.</p>
From example:
[ΑΓΩ]: /φου
[αγω]
.
<p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p>
https://github.com/madroach/omd/blob/master/tests/spec.txt#L2985-L2991
and
[Толпой][Толпой] is a Russian word.
[ТОЛПОЙ]: /url
.
<p><a href="/url">Толпой</a> is a Russian word.</p>
https://github.com/madroach/omd/blob/master/tests/spec.txt#L8072-L8078
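Both failures come down to matching link labels that differ only in non-ASCII case. The sketch below contrasts a byte-based comparison with the Unicode case fold that CommonMark prescribes for label matching. It is Python for illustration only, and it assumes omd's current behaviour is roughly equivalent to ASCII-only lowercasing of the bytes; the helper names are hypothetical.

```python
import re
from urllib.parse import quote


def normalize_label(label: str) -> str:
    """Normalize a link label: collapse whitespace, trim, Unicode case fold."""
    # Collapse runs of spaces, tabs, and line endings to one space, then trim.
    collapsed = re.sub(r"[ \t\r\n]+", " ", label).strip(" ")
    # str.casefold applies Unicode case folding, not just ASCII lowercasing.
    return collapsed.casefold()


def normalize_label_bytes(label: str) -> bytes:
    """Byte-based variant (assumed behaviour): only ASCII letters change case."""
    return label.encode("utf-8").lower()


# Spec 175: the definition uses [ΑΓΩ], the reference uses [αγω].
unicode_match = normalize_label("ΑΓΩ") == normalize_label("αγω")
byte_match = normalize_label_bytes("ΑΓΩ") == normalize_label_bytes("αγω")

# Spec 536: [Толпой] must resolve against the definition [ТОЛПОЙ].
cyrillic_match = normalize_label("Толпой") == normalize_label("ТОЛПОЙ")

# The destination /φου is percent-encoded from its UTF-8 bytes.
encoded_url = quote("/φου")
```

Under these assumptions the Unicode-aware comparison matches in both spec cases, while the byte-based one leaves the multi-byte Greek and Cyrillic letters untouched and so never finds the definition, which is consistent with the references being rendered as literal text in the diffs above.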
Reclassified as a bug, since this is causing us to fail verification against the spec.
From @nojb:
See also previous discussion in #27
(this is tracking blockers to a new omd release from #174)