Parse quoted token like String #392
Comments
(I realize |
Relatedly, do the content of strings need to be unescaped (e.g. |
This is unfortunately not possible with BNFC. The token types |
Thanks for the clarification @andreasabel ! Do you think it would be difficult to add either a directive to the grammar such as |
A workaround would be that you record a patch that you could apply each time after BNFC has run (via patching the Makefile).
One design for this would be #267, but I welcome spinning more design ideas! |
I tried patching the lexer like so, based on the string handling:
diff --git a/src/MyPackage/Lex.x b/src/MyPackage/Lex.x
index 1234567..1234567 100644
--- a/src/MyPackage/Lex.x
+++ b/src/MyPackage/Lex.x
@@ -25,19 +25,19 @@ $u = [. \n] -- universal: any character
\, | \[ | \] | \{ | \} | \: | \[ \]
:-
$white+ ;
@rsyms
{ tok (\p s -> PT p (eitherResIdent TV s)) }
\' ([$u # [\' \\]] | \\ [\' \\ f n r t]) * \'
- { tok (\p s -> PT p (eitherResIdent T_MyToken s)) }
+ { tok (\p s -> PT p (eitherResIdent T_MyToken $ unescapeInitTail s)) }
$l $i*
{ tok (\p s -> PT p (eitherResIdent TV s)) }
\" ([$u # [\" \\ \n]] | (\\ (\" | \\ | \' | n | t | r | f)))* \"
{ tok (\p s -> PT p (TL $ unescapeInitTail s)) }
$d+
{ tok (\p s -> PT p (TI s)) }
$d+ \. $d+ (e (\-)? $d+)?
@@ -117,18 +117,19 @@ unescapeInitTail :: Data.Text.Text -> Data.Text.Text
unescapeInitTail = Data.Text.pack . unesc . tail . Data.Text.unpack
where
unesc s = case s of
'\\':c:cs | elem c ['\"', '\\', '\''] -> c : unesc cs
'\\':'n':cs -> '\n' : unesc cs
'\\':'t':cs -> '\t' : unesc cs
'\\':'r':cs -> '\r' : unesc cs
'\\':'f':cs -> '\f' : unesc cs
'"':[] -> []
+ '\'':[] -> []
c:cs -> c : unesc cs
_ -> []
-------------------------------------------------------------------
-- Alex wrapper code.
-- A modified "posn" wrapper.
-------------------------------------------------------------------
data Posn = Pn !Int !Int !Int
This mostly works. However, without the patch I am able to use a grammar that can match two forms:
OR
And With the patch, the one case that now fails is that I'm guessing I didn't correctly modify the lexer to do what strings are doing. Any ideas? (I can rig up a minimal reproducible test if that would help.) |
Ah, never mind, I took a second look at the code, saw the definition of Well, I guess I should probably leave these comments up here in case anyone else makes the same mistake. Correct patch:
diff --git a/src/MyPackage/Lex.x b/src/MyPackage/Lex.x
index 1234567..1234567 100644
--- a/src/MyPackage/Lex.x
+++ b/src/MyPackage/Lex.x
@@ -25,19 +25,19 @@ $u = [. \n] -- universal: any character
\, | \[ | \] | \{ | \} | \: | \[ \]
:-
$white+ ;
@rsyms
{ tok (\p s -> PT p (eitherResIdent TV s)) }
\' ([$u # [\' \\]] | \\ [\' \\ f n r t]) * \'
- { tok (\p s -> PT p (eitherResIdent T_MyToken s)) }
+ { tok (\p s -> PT p (T_MyToken $ unescapeInitTail s)) }
$l $i*
{ tok (\p s -> PT p (eitherResIdent TV s)) }
\" ([$u # [\" \\ \n]] | (\\ (\" | \\ | \' | n | t | r | f)))* \"
{ tok (\p s -> PT p (TL $ unescapeInitTail s)) }
$d+
{ tok (\p s -> PT p (TI s)) }
$d+ \. $d+ (e (\-)? $d+)?
@@ -117,18 +117,19 @@ unescapeInitTail :: Data.Text.Text -> Data.Text.Text
unescapeInitTail = Data.Text.pack . unesc . tail . Data.Text.unpack
where
unesc s = case s of
'\\':c:cs | elem c ['\"', '\\', '\''] -> c : unesc cs
'\\':'n':cs -> '\n' : unesc cs
'\\':'t':cs -> '\t' : unesc cs
'\\':'r':cs -> '\r' : unesc cs
'\\':'f':cs -> '\f' : unesc cs
'"':[] -> []
+ '\'':[] -> []
c:cs -> c : unesc cs
_ -> []
-------------------------------------------------------------------
-- Alex wrapper code.
-- A modified "posn" wrapper.
-------------------------------------------------------------------
data Posn = Pn !Int !Int !Int
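For anyone who wants to check the unescaping logic outside the generated lexer, here is a minimal standalone sketch of what the patched `unesc` does. It works on plain `String` rather than `Data.Text.Text`, and the name `unquote` is hypothetical (it is not something BNFC generates); the case analysis mirrors the patch above.

```haskell
-- Hypothetical String-based analogue of the patched unescapeInitTail:
-- drop the opening quote, resolve escape sequences, drop the closing quote.
unquote :: String -> String
unquote = unesc . drop 1          -- drop 1 is a total replacement for tail
  where
    unesc s = case s of
      -- Escapes are matched first, so an escaped quote is never
      -- mistaken for the closing quote.
      '\\':c:cs | c `elem` ['"', '\\', '\''] -> c : unesc cs
      '\\':'n':cs -> '\n' : unesc cs
      '\\':'t':cs -> '\t' : unesc cs
      '\\':'r':cs -> '\r' : unesc cs
      '\\':'f':cs -> '\f' : unesc cs
      "\""        -> []           -- closing double quote (String)
      "'"         -> []           -- closing single quote (MyToken)
      c:cs        -> c : unesc cs
      []          -> []
```

For example, `unquote "'it\\'s'"` evaluates to `"it's"`: the outer quotes are removed and the escaped quote is kept as a plain character.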
Judging from playing with the Test program, the printer might also need adjusting to restore the quotes around MyToken. I'll have to take a look at that at whatever point I need the printer. |
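A rough sketch of what that printer adjustment might need, assuming the printer has the unquoted content as a `String`: wrap it in single quotes again and re-escape the characters that the lexer rule treats specially. The function name `requote` is hypothetical and this is not the code BNFC generates in the printer module; it is only the inverse of the unescaping done in the lexer patch.

```haskell
-- Hypothetical inverse of the lexer-side unescaping: wrap the token's
-- content in single quotes and escape the characters the MyToken rule
-- only accepts in escaped form (quote and backslash), plus the
-- control characters that have escape sequences.
requote :: String -> String
requote s = "'" ++ concatMap esc s ++ "'"
  where
    esc c = case c of
      '\'' -> "\\'"
      '\\' -> "\\\\"
      '\n' -> "\\n"
      '\t' -> "\\t"
      '\r' -> "\\r"
      '\f' -> "\\f"
      _    -> [c]
```

For example, `requote "it's"` evaluates to `"'it\\'s'"`, which is the source text `'it\'s'` that the lexer rule accepts.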
I am having trouble figuring out how I can write a token like the built-in String but with different quotation marks and, in my Abs.hs result, only get the content inside the quotes like String does. If I create a token with some kind of quotes, it seems that whatever interprets the Abs members has to remove them, whereas this is not necessary for String as far as I've noticed. Is there a way to achieve this?
My first thought was to define the token as just the stuff inside the quotes, then use it a la
MyQuotedType . MyQuotedType ::= "'" MyToken "'";
But this causes other keywords to be parsed (or perhaps lexed? I'm not terribly familiar with the distinction) as MyToken even though they are not preceded by the quote; and that breaks (such that it won't even parse) code that was successfully parsing in the version where the quotes are part of the token and get manually unquoted in the interpreter.