Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Try to) fix Utf8 #54

Merged
merged 5 commits into from
Apr 2, 2021
Merged

(Try to) fix Utf8 #54

merged 5 commits into from
Apr 2, 2021

Conversation

bynect
Copy link
Contributor

@bynect bynect commented Apr 1, 2021

The aim of this pr is to add a way to validate utf8 and to handle "gracefully" invalid sequences, by replacing them with a placeholder character.

@bynect bynect changed the title (Try) to fix Utf8 Try to fix Utf8 Apr 1, 2021
@bynect bynect changed the title Try to fix Utf8 (Try to) fix Utf8 Apr 1, 2021
@bynect
Copy link
Contributor Author

bynect commented Apr 1, 2021

I was not sure on how to handle high/low surrogates so I added a config flag, but in most cases they should be interpreted as invalid

@bynect
Copy link
Contributor Author

bynect commented Apr 1, 2021

This may also be related with #40

@arthurbacci
Copy link
Owner

Why to check twice instead of checking just on utf8ToMultibyte and replacing it if it's invalid?

@bynect
Copy link
Contributor Author

bynect commented Apr 1, 2021

Why to check twice instead of checking just on utf8ToMultibyte and replacing it if it's invalid?

Where?

@arthurbacci
Copy link
Owner

You can detect it only when displaying.

@bynect
Copy link
Contributor Author

bynect commented Apr 1, 2021

You can detect it only when displaying.

You have to check when reading because the next characters shouldn't be consumed if the codepoint is malformed. The check is just used to ungetc the characters. In this way malformed utf8 can be stored without losing its original value, at least for what I have tried.

@bynect
Copy link
Contributor Author

bynect commented Apr 1, 2021

Now I am adding validation for user input

@arthurbacci arthurbacci merged commit 7953eec into arthurbacci:master Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants