Unicode Handling character Combinators #130

Closed
j-mie6 opened this issue Nov 22, 2022 · 2 comments · Fixed by #207
Labels
enhancement New feature or request minor This change wouldn't affect existing parsers

j-mie6 commented Nov 22, 2022

The new Parsley 4 lexer introduced various degrees of support for Unicode. It would be nice to extend this support to regular character parsing, rather than relying on string alone. As such, the following combinators should be added:

  • charUtf16(c: Int): Parsley[Int]
  • satisfyUtf16(p: Int => Boolean): Parsley[Int]
  • stringOfManyUtf16(p: Parsley[Int]): Parsley[String]
  • stringOfSomeUtf16(p: Parsley[Int]): Parsley[String]

They use the specific name qualifier Utf16 because it may be desirable in future to add UTF-8 compliant combinators as well, given it is also a popular encoding, albeit one not supported natively in Scala/Java. All of these combinators should take care to update carets and positions correctly in the presence of code points that span more than one Char (see #129).
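To make the multi-char concern concrete, here is a minimal sketch of what the `Int`-based semantics could look like on a plain `String`, using only the JDK's `codePointAt` and `Character.charCount`. This is not Parsley's implementation: `CodePointSketch`, `satisfyCodePoint` and `stringOfManyCodePoints` are hypothetical names invented for illustration, and real combinators would also have to handle error reporting and line/column tracking.

```scala
// Hypothetical sketch, NOT Parsley's implementation: it only illustrates how
// Int-based (code point) matching differs from Char-based matching.
object CodePointSketch {
  // Reads one code point at `offset` if it satisfies `p`.
  // Returns the code point and the offset just past it (1 or 2 Chars later).
  def satisfyCodePoint(input: String, offset: Int)(p: Int => Boolean): Option[(Int, Int)] =
    if (offset >= input.length) None
    else {
      val cp = input.codePointAt(offset)
      // charCount is 2 for supplementary code points (surrogate pairs), else 1
      if (p(cp)) Some((cp, offset + Character.charCount(cp))) else None
    }

  // Collects zero or more consecutive code points satisfying `p` into a String.
  // Returns the collected text and the offset where matching stopped.
  def stringOfManyCodePoints(input: String, offset: Int)(p: Int => Boolean): (String, Int) = {
    val sb = new java.lang.StringBuilder
    var off = offset
    var going = true
    while (going) {
      satisfyCodePoint(input, off)(p) match {
        case Some((cp, next)) =>
          sb.appendCodePoint(cp)
          off = next
        case None =>
          going = false
      }
    }
    (sb.toString, off)
  }
}
```

For example, with `"a\uD83D\uDE00b"` (an `a`, U+1F600 as a surrogate pair, then `b`), `satisfyCodePoint(input, 1)(_ == 0x1F600)` consumes two Chars and yields offset 3, whereas a Char-based `satisfy` would only ever see the lone high surrogate. The offset advances by two Chars for what a user perceives as one character, which is why caret and position updates need care.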

@j-mie6 j-mie6 added enhancement New feature or request minor This change wouldn't affect existing parsers labels Nov 22, 2022

j-mie6 commented Jun 24, 2023

charUtf16 has been released as codePoint. The other three still need better names.

@j-mie6 j-mie6 added this to the Parsley 4.4 milestone Jul 8, 2023
@j-mie6 j-mie6 mentioned this issue Jul 8, 2023

j-mie6 commented Jul 8, 2023

It's probably best to move all of these to a parsley.unicode package, which can also support Int versions of all the other parsers. This will mean, hilariously, deprecating codePoint in the 4.5.0 pre-5 deprecation spree.
