Best way to define lexis rules for ASCII-case-insensitive keywords? #17
I'm interested in using Lady Deirdre to build LSP servers for the Doom modification ecosystem, where it is orthodox for DSLs to make identifiers and keywords insensitive to ASCII case. The only two ways I can see for representing this in LD:

#[derive(LexisToken, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Token {
    Eoi = 0,
    Mismatch = 1,
    #[rule(['a', 'A'] ['u', 'U'] ['t', 'T'] ['o', 'O'])]
    KwAuto,
}

#[derive(LexisToken, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
#[define(A = ['a', 'A'])]
#[define(U = ['u', 'U'])]
#[define(T = ['t', 'T'])]
#[define(O = ['o', 'O'])]
pub enum Token {
    Eoi = 0,
    Mismatch = 1,
    #[rule(A U T O)]
    KwAuto,
}

The former seems error-prone and bad as keywords get longer (one of the languages in my sights has the keyword
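For the first form, the per-letter classes could at least be generated rather than typed by hand. A minimal sketch, assuming a hypothetical helper `case_insensitive_rule` (not part of Lady Deirdre) whose output would be pasted into `#[rule(...)]`; it only handles ASCII letters and does not escape quote or backslash characters:

```rust
/// Hypothetical helper, not part of Lady Deirdre: expands a keyword into the
/// per-letter character-class form used in the first workaround.
fn case_insensitive_rule(keyword: &str) -> String {
    keyword
        .chars()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                // One two-element class per ASCII letter: lowercase, then uppercase.
                format!(
                    "['{}', '{}']",
                    c.to_ascii_lowercase(),
                    c.to_ascii_uppercase()
                )
            } else {
                // Characters without ASCII case variants are emitted as a plain
                // char literal (assumption: acceptable in a rule expression).
                format!("'{}'", c)
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Calling `case_insensitive_rule("auto")` yields the same class sequence as the hand-written rule for `KwAuto`, so long keywords would no longer need to be spelled out letter by letter.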
Replies: 3 comments 2 replies
Thank you for your interest in Lady Deirdre. LD does not currently support case-insensitive grammars, and I agree that the current workarounds are inelegant. Case-insensitive grammars are quite widespread, and LD should certainly support them. I will introduce the new operator
@jerome-trc The feature is available for review in the issue-18-case-insensitive-grammars branch. The following code:

use lady_deirdre::lexis::{SourceCode, Token, TokenBuffer};

#[derive(Token, Copy, Clone, PartialEq, Eq, Debug)]
#[repr(u8)]
enum Tok {
    EOI = 0,
    Mismatch = 1,

    #[rule("|")]
    Sep,

    #[rule(i("Foo"))] // Case-insensitive
    Foo,

    #[rule(i("Bar"))] // Case-insensitive
    Bar,

    #[rule("baz")] // Case-sensitive
    Baz,
}

fn main() {
    let buf = TokenBuffer::<Tok>::from("foo|Bar|BAR|baz|BAZ");

    // Print each token together with the source text it covers.
    for chunk in buf.chunks(..) {
        println!("{:?}: {:?}", chunk.token, chunk.string);
    }
}

outputs:

Let me know if it works for you.
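To spell out the matching the three keyword rules are meant to express, here is a small standalone model using only the standard library. It is not Lady Deirdre code and does not reproduce LD's tokenizer; the `Kind` enum and both functions are hypothetical names, and it assumes the `i(...)` operator folds ASCII case only:

```rust
/// Illustrative stand-in for the keyword tokens in the snippet above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Foo,
    Bar,
    Baz,
    Mismatch,
}

/// Classifies one word the way the rules read:
/// `i("Foo")` and `i("Bar")` ignore ASCII case, `"baz"` is matched exactly.
fn classify(word: &str) -> Kind {
    if word.eq_ignore_ascii_case("Foo") {
        Kind::Foo
    } else if word.eq_ignore_ascii_case("Bar") {
        Kind::Bar
    } else if word == "baz" {
        Kind::Baz
    } else {
        Kind::Mismatch
    }
}

/// Splits the input on the `|` separator and classifies each piece.
fn classify_all(input: &str) -> Vec<Kind> {
    input.split('|').map(classify).collect()
}
```

Under this model, `classify_all("foo|Bar|BAR|baz|BAZ")` treats `foo` as `Foo`, both `Bar` and `BAR` as `Bar`, `baz` as `Baz`, and `BAZ` as a mismatch, because the `"baz"` rule stays case-sensitive.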
The quick response is appreciated. I gave 771b108 a try and can confirm the case insensitivity works exactly as intended, but I also encountered a panic:

Here is a minimal reproduction:

#[derive(LexisToken, Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
#[define(A = ['a', 'A'])]
pub enum T {
    Eoi = 0,
    Mismatch = 1,
    #[rule(A)]
    KwA,
    #[rule(i("b"))]
    KwB,
}