Introduce HTML5 mode #2

Closed
bamtor opened this issue Jul 19, 2016 · 4 comments

@bamtor
Contributor

bamtor commented Jul 19, 2016

(originally reported by [email protected])

When CED is used with Chromium, ISO-2022-JP is the only 7-bit encoding that Blink supports. Detection of the other 7-bit encodings, and of other non-HTML5 encodings, should be disabled in CED. We can handle this by introducing an HTML5 mode.

@bamtor
Contributor Author

bamtor commented Jul 26, 2016

9012c0a handles 7-bit encodings other than ISO-2022-JP so that the result is ASCII-7BIT instead.

@bamtor bamtor closed this as completed Jul 26, 2016
@jungshik

jungshik commented Jul 29, 2016

Sorry that I made a misleading/confusing bug report. Apart from UTF-7, the remaining 7-bit encodings (ISO-2022-{KR,CN} and HZ-GB) are treated as the replacement encoding per the WHATWG Encoding spec. So if CED detects them, Blink will convert the whole input to a single U+FFFD character.
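
For illustration, here is a minimal Python sketch of the replacement behavior described above. The label set is the one the WHATWG Encoding Standard assigns to the replacement encoding; the function names `is_replacement_label` and `decode_replacement` are hypothetical and are not part of CED or Blink.

```python
# Labels that the WHATWG Encoding Standard maps to the "replacement" encoding.
REPLACEMENT_LABELS = frozenset({
    "csiso2022kr",
    "hz-gb-2312",
    "iso-2022-cn",
    "iso-2022-cn-ext",
    "iso-2022-kr",
    "replacement",
})

# ASCII whitespace as defined by the WHATWG specs (tab, LF, FF, CR, space).
_ASCII_WHITESPACE = "\t\n\x0c\r "


def is_replacement_label(label: str) -> bool:
    """Hypothetical helper: does this label resolve to the replacement encoding?"""
    return label.strip(_ASCII_WHITESPACE).lower() in REPLACEMENT_LABELS


def decode_replacement(data: bytes) -> str:
    """Hypothetical helper mirroring the replacement decoder.

    Any non-empty input yields exactly one U+FFFD; empty input yields
    an empty string. The original bytes are never interpreted.
    """
    return "\ufffd" if data else ""
```

Under this sketch, a page detected as ISO-2022-KR decodes to one U+FFFD no matter how long the input is, which is why the detector's result matters so much for these encodings.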

Given this, I think we'd better leave the detection of those encodings alone and let Blink deal with them (convert to U+FFFD).

UTF-7 is a bit tricky. I'm filing a bug against the WHATWG Encoding spec so that it's treated the same way as ISO-2022-{KR,CN} and HZ-GB. See whatwg/encoding#68.

@bamtor
Contributor Author

bamtor commented Aug 1, 2016

Let me revert the change. I believe, however, that HTML5_MODE is still valid for sanitizing encoding names, as filed in #1. I'll keep it and use it for that purpose.

@bamtor bamtor reopened this Aug 1, 2016
@JinsukKim
Collaborator

e21eb6a kind of handled this issue on the CED side by returning LATIN if the detected encoding is not supported by WHATWG. This makes the behavior conform to the standard and leaves the document intact in such situations.
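
As a rough sketch of that fallback idea (not the actual code in e21eb6a), the Python below maps any detected name outside a WHATWG-supported set to a Latin default. The function name `sanitize_detected`, the small illustrative subset of encoding names, and the choice of `windows-1252` to stand in for "LATIN" are all assumptions made for this example.

```python
# Illustrative subset only; the real WHATWG list of encodings is much longer.
WHATWG_SUPPORTED = frozenset({
    "utf-8",
    "iso-2022-jp",
    "shift_jis",
    "euc-jp",
    "euc-kr",
    "gbk",
    "big5",
    "windows-1252",
})

# Assumption: "LATIN" in the comment above is represented here by windows-1252.
LATIN_FALLBACK = "windows-1252"


def sanitize_detected(name: str) -> str:
    """Hypothetical helper: return the detected encoding if it is in the
    supported set, otherwise fall back to the Latin default."""
    normalized = name.strip().lower()
    return normalized if normalized in WHATWG_SUPPORTED else LATIN_FALLBACK
```

With this shape, a detection result such as UTF-7 comes back as the Latin default, so the consumer decodes the bytes as-is instead of rejecting or rewriting the document.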
