Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

or clause get ignored when the first clause is complicated #11

Closed
ailisp opened this issue Jun 11, 2020 · 4 comments
Closed

or clause get ignored when the first clause is complicated #11

ailisp opened this issue Jun 11, 2020 · 4 comments
Labels

Comments

@ailisp
Copy link

ailisp commented Jun 11, 2020

A short snippet to reproduce:

(ql:quickload '(esrap parse-float))
(in-package :esrap)

(defrule real-or-symbol (or real symbol))

(defrule real (and (? (or "+" "-")) (* (digit-char-p character))
                   (? (and "." (* (digit-char-p character))))
                   (? (and (or "e" "E") (? (or "+" "-")) (+ (digit-char-p character)))))
  (:lambda (list)
    (parse-float:parse-float (text list))))

(defrule symbol-char (or (and #\\ character) (get-macro-character character)))

(defrule symbol (+ symbol-char)
  (:lambda (list)
    (intern (text list))))

Then try:

ESRAP> (parse 'real-or-symbol "3.1e-2")
0.031
NIL
T
ESRAP> (parse 'symbol "aaa")
|aaa|
NIL
T
ESRAP> (parse 'real-or-symbol "aaa")
Error:

At

  aaa
  ^ (Line 1, Column 0, Position 0)

In context REAL:

While parsing REAL. Problem:

  The production
  
    #\a
  
  does not satisfy the predicate DIGIT-CHAR-P.

Expected:

     the character + (PLUS_SIGN)
  or the character - (HYPHEN-MINUS)
  or the character . (FULL_STOP)
  or the character E (LATIN_CAPITAL_LETTER_E)
  or the character e (LATIN_SMALL_LETTER_E)
  or any character satisfying DIGIT-CHAR-P
   [Condition of type ESRAP-PARSE-ERROR]

Why it was not continue try symbol when it's not a real? Thanks!

@scymtym
Copy link
Owner

scymtym commented Jun 11, 2020

The real rule is not correct. In general, to investigate why esrap behaves in an unexpected way, trace the rule or rules in question. In this case, trace real-or-symbol and the rules invoked by it:

(esrap:trace-rule 'real-or-symbol :recursive t)

Trying the failing input with the traced rules reveals the problem:

(esrap:parse 'real-or-symbol "aaa")
1: REAL-OR-SYMBOL 0[aaa]?
 2: REAL 0[aaa]?
 2: REAL 0-0 -> ("")
1: REAL-OR-SYMBOL 0-0 -> ("")

The real rule is tried first and succeeds at offset 0 without consuming any input, then the parse fails because of the leftover input. This is because the basic structure of the real rule is (and (? …) (? …) (? …)) which is a sequence of three optional parts.

The parser.common-rules library has rules for various numeric literals. Maybe you can find inspiration there or even use the library.

As a final note, please don't use symbols in the common-lisp package as rule names. This causes hard to spot bugs when some other code doing the same is loaded into the same image as your code. Future versions of esrap may disallow these rule names for that reason.

@ailisp
Copy link
Author

ailisp commented Jun 11, 2020

Got it! Thanks for the detailed analysis and both trace-rule and common-rules libs are indeed helpful!

As a final note, please don't use symbols in the common-lisp package as rule names. This causes hard to spot bugs when some other code doing the same is loaded into the same image as your code. Future versions of esrap may disallow these rule names for that reason.

Good to know that

@ailisp ailisp closed this as completed Jun 11, 2020
@scymtym scymtym pinned this issue Jun 11, 2020
@ailisp
Copy link
Author

ailisp commented Jun 12, 2020

Hi @scymtym ! Inspired by parser.common-rules, I get it work by adding real-decimals and real-scientific. rename to my-real to avoid potential confliction with cl:real


(defrule real-decimals (and (* (digit-char-p character)) "." (* (digit-char-p character))))

(defrule real-scientific (and (* (digit-char-p character)) (? ".") (* (digit-char-p character))
                              (or "e" "E") (? (or "+" "-")) (+ (digit-char-p character))))

(defrule my-real (and (? (or "+" "-")) (or real-scientific real-decimals))
  (:lambda (list)
    (parse-float (text list))))

Also parser.common-rules isn't mentioned in readme or homepage, I think it would be really helpful to mention it as solid examples and reference.

And there's a minor bug, i think, exist in float-literal/rational and my-real:

COMMON-RULES> (parse 'float-literal/rational ".")
0
NIL
T
COMMON-RULES> (parse 'float-literal/rational ".e5")
0
NIL
T
COMMON-RULES> (parse 'float-literal ".")
0.0d0
NIL
T

Expected result is reject . or .e5. In my example, it could be hack of listing . and ".e" (+ digit) also as symbols, appear early in real-or-symbol rule. Is there a better way to fix directly in float-literal?

Answer myself: trying a few different cases, this works (need to re-organize it to abstract common parts)

(defrule real (and (? (or "+" "-"))
                   (or (and (+ (digit-char-p character)) "." (* (digit-char-p character))
                            (? (and (or "e" "E") (? (or "+" "-")) (+ (digit-char-p character)))))
                       (and (* (digit-char-p character)) "." (+ (digit-char-p character))
                            (? (and (or "e" "E") (? (or "+" "-")) (+ (digit-char-p character)))))
                       (and (+ (digit-char-p character))
                            (or "e" "E") (? (or "+" "-")) (+ (digit-char-p character)))))
  (:lambda (list)
    (parse-float (text list))))

@scymtym
Copy link
Owner

scymtym commented Jun 12, 2020

And there's a minor bug, i think, exist in float-literal/rational and my-real: [...]

Thank you for the report. The issue should now be fixed in the master branch of the parser.common-rules library. I also extended the unit test significantly, so hopefully i got it right this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants