(Try to) fix Utf8 #54
I was not sure how to handle high/low surrogates, so I added a config flag, but in most cases they should be treated as invalid.
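As an illustration of the flag described above, here is a minimal sketch of a code point check that rejects surrogates unless a setting opts in. The names `Utf8Config`, `allow_surrogates`, and `codepoint_is_valid` are hypothetical; the thread does not show what the PR actually calls them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical config struct; the real flag name in the PR is not shown here. */
typedef struct {
    bool allow_surrogates;
} Utf8Config;

/* Returns true if cp is a code point that a decoder should accept. */
static bool codepoint_is_valid(uint32_t cp, const Utf8Config *cfg)
{
    if (cp > 0x10FFFF)
        return false;                 /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return cfg->allow_surrogates; /* high/low surrogates: invalid unless opted in */
    return true;
}
```

With `allow_surrogates` set to `false`, U+D800 through U+DFFF are rejected, which matches the "invalid in most cases" default suggested in the comment.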
This may also be related to #40.
Why check twice instead of checking only in utf8ToMultibyte and replacing the sequence if it's invalid?
Where?
You can detect it only when displaying.
You have to check when reading, because the following characters shouldn't be consumed if the code point is malformed. The check is only used to ungetc those characters. This way, malformed UTF-8 can be stored without losing its original value, at least in the cases I have tried.
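The read-time check can be sketched roughly as follows. This is not the PR's code: `utf8_read` and the two sentinel values are hypothetical names, and the sketch only verifies continuation bytes (overlong forms and range checks are left out for brevity). It pushes back a single byte, since the C standard guarantees only one character of `ungetc` pushback.

```c
#include <stdint.h>
#include <stdio.h>

#define UTF8_INVALID 0xFFFFFFFFu  /* hypothetical sentinel: malformed sequence */
#define UTF8_EOF     0xFFFFFFFEu  /* hypothetical sentinel: end of input */

/* Reads one code point from fp. If a byte that should be a continuation
 * byte is not one, it is pushed back with ungetc so the next call sees it. */
static uint32_t utf8_read(FILE *fp)
{
    int c = fgetc(fp);
    if (c == EOF)
        return UTF8_EOF;

    uint32_t cp;
    int extra;  /* number of continuation bytes expected */
    if (c < 0x80)                { return (uint32_t)c; }
    else if ((c & 0xE0) == 0xC0) { cp = (uint32_t)(c & 0x1F); extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = (uint32_t)(c & 0x0F); extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = (uint32_t)(c & 0x07); extra = 3; }
    else                         { return UTF8_INVALID; } /* stray continuation or bad lead byte */

    for (int i = 0; i < extra; i++) {
        int next = fgetc(fp);
        if (next == EOF)
            return UTF8_INVALID;   /* truncated sequence */
        if ((next & 0xC0) != 0x80) {
            ungetc(next, fp);      /* not part of this sequence: leave it for the next read */
            return UTF8_INVALID;
        }
        cp = (cp << 6) | (uint32_t)(next & 0x3F);
    }
    return cp;
}
```

For the input bytes `C3 41`, the first call reports an invalid sequence and the second call still returns `A`, so the character following the malformed lead byte is not lost.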
Now I am adding validation for user input.
The aim of this PR is to add a way to validate UTF-8 and to handle invalid sequences "gracefully" by replacing them with a placeholder character.
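A minimal sketch of that idea, assuming a single-byte placeholder such as `?` so the output never grows beyond the input. The functions `utf8_valid_len` and `utf8_sanitize` are hypothetical and not taken from the PR. U+FFFD is the conventional replacement character, but it needs three bytes in UTF-8 and would require a larger output buffer.

```c
#include <stddef.h>

/* Returns the length (1 to 4) of the valid UTF-8 sequence at s,
 * or 0 if the bytes at s do not form one. n is the number of bytes available. */
static size_t utf8_valid_len(const unsigned char *s, size_t n)
{
    if (n == 0) return 0;
    unsigned char c = s[0];
    size_t len;
    unsigned long cp, min;
    if (c < 0x80) return 1;
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
    else return 0;                              /* stray continuation or bad lead byte */
    if (n < len) return 0;                      /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;    /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    if (cp < min) return 0;                     /* overlong encoding */
    if (cp > 0x10FFFF) return 0;                /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates treated as invalid */
    return len;
}

/* Copies src to dst, replacing every byte that does not start a valid
 * sequence with placeholder. dst needs room for n bytes. Returns bytes written. */
static size_t utf8_sanitize(const unsigned char *src, size_t n,
                            unsigned char *dst, unsigned char placeholder)
{
    size_t i = 0, out = 0;
    while (i < n) {
        size_t len = utf8_valid_len(src + i, n - i);
        if (len == 0) {
            dst[out++] = placeholder;  /* one placeholder per offending byte */
            i++;
        } else {
            for (size_t k = 0; k < len; k++)
                dst[out++] = src[i++];
        }
    }
    return out;
}
```

Replacing one byte at a time keeps the sketch simple and resynchronizes on the next valid lead byte. A real implementation might instead emit one placeholder per malformed sequence.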