(Try to) fix Utf8 #54

bynect · 2021-04-01T14:04:31Z

This PR aims to add UTF-8 validation and to handle invalid sequences gracefully by replacing them with a placeholder character.
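For illustration, here is a minimal sketch of validating one UTF-8 sequence and substituting a placeholder when it is malformed. The function name `utf8_decode_or_replace` and the choice of U+FFFD as the placeholder are assumptions made for this example, not names or values taken from the PR. Surrogates are deliberately not rejected here.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu /* U+FFFD; assumed placeholder for this sketch */

/* Hypothetical helper: decode one code point from buf[0..len).
 * Stores the result in *out and returns the number of bytes consumed.
 * On a malformed sequence, *out is U+FFFD and exactly 1 byte is consumed,
 * so the caller can resynchronise on the next byte.
 * Returns 0 only when len == 0. Surrogates are not checked here. */
size_t utf8_decode_or_replace(const unsigned char *buf, size_t len, uint32_t *out)
{
    size_t need;
    uint32_t cp, min_cp;

    *out = REPLACEMENT_CHAR;
    if (len == 0)
        return 0;

    if (buf[0] < 0x80) {                 /* ASCII */
        *out = buf[0];
        return 1;
    } else if ((buf[0] & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
        need = 2; cp = buf[0] & 0x1F; min_cp = 0x80;
    } else if ((buf[0] & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
        need = 3; cp = buf[0] & 0x0F; min_cp = 0x800;
    } else if ((buf[0] & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence */
        need = 4; cp = buf[0] & 0x07; min_cp = 0x10000;
    } else {
        return 1;                        /* stray continuation or invalid lead byte */
    }

    if (len < need)
        return 1;                        /* truncated sequence */

    for (size_t i = 1; i < need; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return 1;                    /* missing continuation byte */
        cp = (cp << 6) | (uint32_t)(buf[i] & 0x3F);
    }

    if (cp < min_cp || cp > 0x10FFFF)
        return 1;                        /* overlong encoding or out of range */

    *out = cp;
    return need;
}
```

Consuming a single byte on failure means a bad lead byte never swallows the valid bytes that follow it.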

bynect · 2021-04-01T14:07:29Z

I was not sure how to handle high/low surrogates, so I added a config flag, but in most cases they should be treated as invalid.
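As a rough sketch of what such a flag could control: code points in the surrogate range U+D800 to U+DFFF are rejected only when strict mode is on. The parameter name `strict_utf8` echoes one of the commit titles in this PR, but its actual semantics in the code are an assumption here, and `filter_surrogate` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for UTF-16 surrogate code points, which are not valid in UTF-8. */
static bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Hypothetical policy helper: with strict_utf8 set, surrogates become
 * U+FFFD; otherwise the decoded code point is passed through unchanged. */
uint32_t filter_surrogate(uint32_t cp, bool strict_utf8)
{
    return (strict_utf8 && is_surrogate(cp)) ? 0xFFFDu : cp;
}
```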

bynect · 2021-04-01T14:15:54Z

This may also be related to #40

arthurbacci · 2021-04-01T16:25:36Z

Why check twice instead of checking only in utf8ToMultibyte and replacing the sequence if it's invalid?

bynect · 2021-04-01T16:27:45Z

> Why check twice instead of checking only in utf8ToMultibyte and replacing the sequence if it's invalid?

Where?

arthurbacci · 2021-04-01T16:30:33Z

You can detect it only when displaying.

bynect · 2021-04-01T16:36:52Z

> You can detect it only when displaying.

You have to check when reading, because the following bytes shouldn't be consumed if the code point is malformed. The check is only used to ungetc those bytes. This way, malformed UTF-8 can be stored without losing its original value, at least in the cases I have tried.
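The read-time check described above could look roughly like the sketch below. It assumes a `FILE *` source, and the function names are hypothetical rather than taken from the PR. It only checks sequence structure (lead byte and continuation bytes), not overlong forms or surrogates. Because the C standard guarantees only one byte of pushback, the sketch reads continuation bytes one at a time and calls `ungetc` on the first byte that does not belong to the sequence.

```c
#include <stddef.h>
#include <stdio.h>

/* Expected sequence length for a lead byte, or 0 if the byte
 * cannot start a UTF-8 sequence. */
static size_t utf8_expected_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical reader: stores one sequence from fp in out and returns the
 * number of bytes stored (0 at end of file). *valid is set to 1 only if
 * the sequence is structurally complete. When a continuation byte is
 * missing, the offending byte is pushed back so it starts the next
 * sequence, and the bytes already read are kept unchanged in out. */
size_t read_utf8_seq(FILE *fp, unsigned char out[4], int *valid)
{
    size_t need, n = 1;
    int c = fgetc(fp);

    *valid = 0;
    if (c == EOF)
        return 0;

    out[0] = (unsigned char)c;
    need = utf8_expected_len(out[0]);
    if (need == 0)
        return 1;               /* invalid lead byte: keep it, flag it */

    while (n < need) {
        c = fgetc(fp);
        if (c == EOF)
            return n;           /* truncated at end of file */
        if ((c & 0xC0) != 0x80) {
            ungetc(c, fp);      /* not part of this sequence: leave it */
            return n;
        }
        out[n++] = (unsigned char)c;
    }

    *valid = 1;
    return n;
}
```

The caller keeps the raw bytes either way, so a malformed sequence can be written back to disk unchanged and only replaced with a placeholder when displayed.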

bynect · 2021-04-01T17:01:53Z

Now I am adding validation for user input

bynect added 4 commits April 1, 2021 15:12

Add utf8 validation

32ccf8f

Add strict_utf8

80be725

Merge branch 'master' of github.com:ArthurBacci64/Teditor into utf8-fix

632b70a

Refactor utf8 functions

0c8d23e

bynect changed the title ~~(Try) to fix Utf8~~ Try to fix Utf8 Apr 1, 2021

bynect changed the title ~~Try to fix Utf8~~ (Try to) fix Utf8 Apr 1, 2021

Add validation to user input

53c2e82

arthurbacci merged commit 7953eec into arthurbacci:master Apr 2, 2021
