Can whitespace be ignored? #10
There is nothing built in, because packrat parsers like Esrap typically operate without the otherwise common separation into a lexer stage and a parser stage. I typically address the problem by defining two variants of most rules: one that assumes that leading and trailing whitespace has been handled, and a second one that invokes the first rule but skips over any trailing whitespace. I started to write a chapter for the manual about this topic, which follows below. I hope that helps.

Parsers often consist of a lexer component in addition to the actual parser. The lexer breaks the input into tokens, dealing with whitespace and comments. The parser then consumes this token stream. This approach has the advantage that the parser component does not have to deal with "details" such as whitespace and comments, which are usually rather uniform across the whole grammar, but there are downsides as well. For example, the Python grammar depends on whitespace, and a lexer stage prevents later stages from accessing comments. There are also context-sensitive tokens.

Esrap does not suffer from the downsides of having a separate lexer component. This comes, however, at the price of having to deal with things like whitespace and comments in the actual grammar. Fortunately, a macro along the following lines, combined with a convention for writing rules, can make the inconvenience almost disappear:

(defmacro deftoken (name expression &body options)
(let ((name/skippable (alexandria:symbolicate name '#:/s))
(name/maybe-skippable (alexandria:symbolicate name '#:/?s)))
`(progn
(esrap:defrule ,name ,expression ,@options)
(esrap:defrule ,name/skippable
(and ,name skippable)
(:function first))
(esrap:defrule ,name/maybe-skippable
(and ,name (esrap:? skippable))
(:function first)))))

This can be used as follows:

(esrap:defrule whitespace
(+ (or #\Space #\Tab #\Newline))
(:constant nil))
(esrap:defrule comment
(and "/*" (* (not "*/")) "*/")
(:function second)
(:text t))
(esrap:defrule skippable
(+ (or whitespace comment)))
(deftoken type
(and (alpha-char-p character) (* (alphanumericp character)))
(:text t))
(deftoken identifier
(and (alpha-char-p character) (* (alphanumericp character)))
(:text t))
(deftoken equals
#\=
(:text t))
(deftoken value
(+ (digit-char-p character))
(:text t)
(:function parse-integer))
(deftoken variable
(and type/s identifier/?s equals/?s value))

Note: the final token of a rule expression should not accept trailing skippables. This is necessary to make the rule usable in different contexts.

Note that comments can still be captured when they occur at "interesting" locations (e.g. documentation comments):

(deftoken keyword-class
"class")
(esrap:defrule class
(and keyword-class/s identifier/?s #\{ "..." #\}))
(esrap:defrule compilation-unit
(* (or whitespace comment variable class))
(:lambda (top-level-nodes)
;; Remove whitespace results. Could for example associate comment
;; nodes to class or function nodes following them in the input
;; text.
(remove nil top-level-nodes)))

In this grammar, top-level comments can be captured and, for example, associated with class or function nodes, but "token rules" lead to whitespace and comments being ignored everywhere else.

(esrap:parse 'compilation-unit "class foo {...}
/*comment and a newline*/
int a=1 /*comment*/
int b = 5")

The parser.common-rules library provides a [...]
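To make the effect of the /s and /?s variants concrete, here is a small self-contained sketch of the same idea. It assumes Quicklisp is available for loading Esrap and Alexandria; the package name `token-demo` and the `assignment` rule are made up for this illustration and are not part of Esrap or the manual chapter.

```lisp
;; Assumption: Quicklisp is installed; ESRAP and ALEXANDRIA are the only dependencies.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload '(:esrap :alexandria) :silent t))

(defpackage #:token-demo            ; hypothetical package, for illustration only
  (:use #:cl))
(in-package #:token-demo)

;; Same macro as above: defines NAME, NAME/S (requires trailing skippables)
;; and NAME/?S (allows optional trailing skippables).
(defmacro deftoken (name expression &body options)
  (let ((name/skippable       (alexandria:symbolicate name '#:/s))
        (name/maybe-skippable (alexandria:symbolicate name '#:/?s)))
    `(progn
       (esrap:defrule ,name ,expression ,@options)
       (esrap:defrule ,name/skippable
           (and ,name skippable)
         (:function first))
       (esrap:defrule ,name/maybe-skippable
           (and ,name (esrap:? skippable))
         (:function first)))))

(esrap:defrule whitespace
    (+ (or #\Space #\Tab #\Newline))
  (:constant nil))

(esrap:defrule comment
    (and "/*" (* (not "*/")) "*/")
  (:function second)
  (:text t))

(esrap:defrule skippable
    (+ (or whitespace comment)))

(deftoken identifier
    (and (alpha-char-p character) (* (alphanumericp character)))
  (:text t))

(deftoken equals
    #\=
  (:text t))

(deftoken value
    (+ (digit-char-p character))
  (:text t)
  (:function parse-integer))

;; Hypothetical rule: every token except the last one may be followed by
;; skippables; the last token (VALUE) is used without a suffix.
(esrap:defrule assignment
    (and identifier/?s equals/?s value))

;; Whitespace and comments between tokens are skipped and do not show up
;; in the result.
(esrap:parse 'assignment "a  = /* answer */ 42")
;; => ("a" "=" 42)

;; The enclosing context decides whether trailing skippables are allowed.
(esrap:parse 'value/?s "7  ")
;; => 7

;; VALUE/S insists on at least one trailing skippable, so this does not match.
(handler-case (esrap:parse 'value/s "7")
  (esrap:esrap-error () :no-match))
;; => :NO-MATCH
```

Because `assignment` itself ends in the plain `value` rule, it can be embedded in a larger rule that decides on its own what may follow it, which is the point of the note about the final token.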
Many thanks for the excellent reply. I completely missed the parser.common-rules library. I was thinking of modifying the library, but with your deftoken I will be able to work out a solution.
I'm migrating a fairly big parser from Python. Is there any way of ignoring a certain terminal?
Being able to have the parser ignore whitespace would be extremely useful.