Document encoding in the manual #52

Open
glebm opened this issue Aug 20, 2017 · 0 comments


glebm commented Aug 20, 2017

Things like:

  1. The .waxeye grammar file is encoded as UTF-8.
  2. The generated runtime files are encoded as UTF-8.
  3. All runtimes use their native string types (C uses char *). Positions reported by the parser are offsets into the native string type.
  4. The parser operates on Unicode codepoints; parser input must be well-formed (decodable to a sequence of Unicode codepoints).
  5. Waxeye does not handle Unicode normalization. If you care about this, normalize (e.g. NFC-normalize) both the grammar file and the parser inputs.
  6. Case-insensitive literals are ASCII-only.
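
A rough illustration of points 3 to 6 that the manual could adapt. This is a sketch in plain Python, not Waxeye code: the helper names (`native_offsets`, `is_well_formed_utf8`, `nfc`, `ascii_lower`) are made up for this example, and the ASCII-only folding shown here is an assumption about what "ASCII-only" means, not a description of how any Waxeye runtime implements it.

```python
import unicodedata


def native_offsets(text: str, index: int) -> dict:
    """Offset of the code point at `index`, measured in three common string units.

    Shows why the same parse position is a different number depending on
    the native string type (bytes, UTF-16 code units, or code points).
    """
    prefix = text[:index]
    return {
        "code_points": len(prefix),
        "utf8_bytes": len(prefix.encode("utf-8")),
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
    }


def is_well_formed_utf8(data: bytes) -> bool:
    """True if `data` decodes to a sequence of Unicode code points."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def nfc(text: str) -> str:
    """NFC-normalize text; apply to both the grammar and the parser input."""
    return unicodedata.normalize("NFC", text)


def ascii_lower(text: str) -> str:
    """Lowercase A-Z only; every non-ASCII character is left untouched.

    Assumption: this is one plausible reading of 'ASCII-only' case folding.
    """
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


if __name__ == "__main__":
    # 'b' follows 'a' and an emoji outside the Basic Multilingual Plane.
    print(native_offsets("a\U0001F600b", 2))
    # 'e' + combining acute accent vs. the precomposed character.
    print(nfc("e\u0301") == "\u00e9")
    # ASCII letters fold, 'É' does not.
    print(ascii_lower("ÉCOLE"))
```

The point of `native_offsets` is that a position reported by a runtime using byte strings will not match the position reported by a runtime using UTF-16 strings for the same input, so the manual should say which unit each runtime uses.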

Parser runtimes that support Unicode: C (TBD), JavaScript (#47), Ruby, Racket.
