Unicode Handling `character` Combinators #130

j-mie6 · 2022-11-22T20:30:14Z

The new Parsley 4 lexer introduced various degrees of support for Unicode. It would be nice to extend this support to regular character parsing, rather than relying on string alone. As such, the following combinators should be added:

charUtf16(c: Int): Parsley[Int]
satisfyUtf16(p: Int => Boolean): Parsley[Int]
stringOfManyUtf16(p: Parsley[Int]): Parsley[String]
stringOfSomeUtf16(p: Parsley[Int]): Parsley[String]

They use the specific name qualifier Utf16 because it may be desirable in future to add UTF-8 compliant combinators too, given it is also a popular encoding, albeit one not supported natively in Scala/Java. All of these combinators should take care to update carets and positions correctly in the presence of multi-char codepoints (see #129).
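To make the multi-char concern concrete, here is a minimal sketch in plain Scala of what reading code points out of a UTF-16 string involves. It does not use Parsley: `satisfyAt` and `manyAt` are hypothetical helpers invented for illustration, not the proposed combinators or their implementation. They rely only on the standard `String.codePointAt`, `Character.charCount` and `StringBuilder.appendCodePoint` methods.

```scala
object CodePointSketch {
  // Hypothetical helper: reads the code point starting at offset `off` and,
  // if `p` accepts it, returns it with the number of UTF-16 chars it
  // occupies (1 for BMP characters, 2 for a surrogate pair).
  def satisfyAt(input: String, off: Int)(p: Int => Boolean): Option[(Int, Int)] =
    if (off >= input.length) None
    else {
      val cp = input.codePointAt(off)
      if (p(cp)) Some((cp, Character.charCount(cp))) else None
    }

  // Hypothetical helper: collects the longest run of accepted code points
  // starting at `off`, returning the matched text and the new char offset.
  def manyAt(input: String, off: Int)(p: Int => Boolean): (String, Int) = {
    val sb = new java.lang.StringBuilder
    var i = off
    var going = true
    while (going) {
      satisfyAt(input, i)(p) match {
        case Some((cp, width)) =>
          sb.appendCodePoint(cp)
          i += width // advance by chars consumed, not by code points
        case None =>
          going = false
      }
    }
    (sb.toString, i)
  }
}
```

For an input made of two U+1F600 code points followed by `a`, `manyAt(input, 0)(_ == 0x1F600)` consumes two code points but advances the offset by four chars. That gap between code-point count and char count is why caret widths and column positions need care.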


j-mie6 · 2023-06-24T23:27:51Z

charUtf16 has been released as codePoint. The other three still need better names.

j-mie6 · 2023-07-08T23:17:17Z

It's probably best to move these all to a parsley.unicode package, which can also support Int versions of all the other parsers. This will mean, hilariously, deprecating codePoint in the 4.5.0 pre-5 deprecation spree.

j-mie6 added the enhancement (New feature or request) and minor (This change wouldn't affect existing parsers) labels Nov 22, 2022

j-mie6 added this to the Parsley 4.4 milestone Jul 8, 2023

j-mie6 mentioned this issue Jul 8, 2023 in Parsley 4.4.0 #207 (merged, 55 tasks)

j-mie6 closed this as completed in #207 Oct 6, 2023
