report line and column of syntax error when parsing #36

mewmew · 2019-12-04T21:28:23Z

To help troubleshoot the cause of a syntax errors when parsing, it would be really helpful if the parser generated by textmapper reported both line and column of syntax errors when parsing.

Currently only the line number is reported, and this creates an additional step for users of the parser to figure out the exact cause of the parse error. One recent such example is llir/llvm#105 (comment), the extract of which is included here for completeness.

By @dannypsnl:

By the way, would you like to use official IR Parser by submodule LLVM-IR-parser and mapping ast only? textmapper didn't provide a reasonable result for parsing error. I didn't know which step it stuck.

All I got was: syntax error at line 1

Here is the test code:
package asm

import (
	"testing"

	"github.com/llir/ll/ast"
)

func TestParse(t *testing.T) {
	testCode := `!7 = !DIExpression(DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_LLVM_convert, 32, DW_ATE_signed)`
	_, err := ast.Parse("", testCode)
	if err != nil {
		t.Error(err)
	}
}

To get a better parse error, we ended up breaking the line into multiple lines, resulting in syntax error at line 5, which was more helpful. The same could be achieved with a line:column pair when reporting syntax errors.

ref: llir/llvm#105 (comment)

!7
=
!DIExpression(DW_OP_LLVM_convert,
16,
DW_ATE_unsigned,
DW_OP_LLVM_convert,
32,
DW_ATE_signed
)

Note, this might be me failing to use functionality already included in Textmapper. I simply use the generated parser and don't try to do anything fancy handling errors. So perhaps this information is already available?

Cheers,
Robin

The text was updated successfully, but these errors were encountered:

inspirer · 2019-12-04T21:54:10Z

I used to have column tracking in parsers written in Java but it turned out we needed it so rarely that I dropped the support for it. Instead, I optionally track the offset of the current line (or rather of the token start line) in the lexer:

tokenLineOffset = true // enables this

Besides, columns are very weird as there are multiple interpretations for this number: bytes, unicode runes, or JavaScript offsets (measured in utf-16 code units). All browsers use the last one.

I can embed the line offset into the SyntaxError type if you need it.

mewmew · 2019-12-04T22:49:32Z

Besides, columns are very weird as there are multiple interpretations for this number: bytes, unicode runes, or JavaScript offsets (measured in utf-16 code units). All browsers use the last one.

This being in Go, I'd probably go with Rune count (where tabs count as a single rune, since the tab width is otherwise subjective).

I can embed the line offset into the SyntaxError type if you need it.

That would be very helpful.

mewmew mentioned this issue Dec 4, 2019

update test cases to LLVM 9.0 llir/llvm#105

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

report line and column of syntax error when parsing #36

report line and column of syntax error when parsing #36

mewmew commented Dec 4, 2019

inspirer commented Dec 4, 2019

mewmew commented Dec 4, 2019

report line and column of syntax error when parsing #36

report line and column of syntax error when parsing #36

Comments

mewmew commented Dec 4, 2019

inspirer commented Dec 4, 2019

mewmew commented Dec 4, 2019