Import lark grammar written in one python project into another #1397

Open
manjrekarom opened this issue Mar 26, 2024 · 6 comments
@manjrekarom

manjrekarom commented Mar 26, 2024

What is your question?

How do I import a Lark grammar written in one Python project into another?
E.g.
project A has A/grammar.lark file
project B can import A
But

// file called B/superset.py with inline grammar
...
%import A.grammar.SOME_TERMINAL

fails with FileNotFoundError: [Errno 2] No such file or directory: 'A/grammar.lark'

Explain what you're trying to do, and what is obstructing your progress.
I am not sure if there is a nice way to do this. We are able to %import common.<TERMINAL> so I would like to think there should be a way to do this.

@manjrekarom changed the title from "Import lark grammar written in one project into another" to "Import lark grammar written in one python project into another" on Mar 26, 2024
@MegaIng
Member

MegaIng commented Mar 26, 2024

lark has facilities for this, namely import_paths.

If "project A" means an importable package, you can use the FromPackageLoader helper class:

from lark.load_grammar import FromPackageLoader

Lark(..., import_paths=[FromPackageLoader('A', [''])])

'' can be replaced by a relative path if your grammar files aren't in the top level of the package's folder. You would then import this with %import grammar.SOME_TERMINAL, not %import A.grammar.SOME_TERMINAL.

If A is just a folder somewhere, you can instead provide a path inside of import_paths: Lark(..., import_paths=['/some/absolute/path'])

@manjrekarom
Author

Thanks, this works. Is there a way to also import all symbols from the grammar?
E.g. replace this

%import grammar.ABC
%import grammar.DEF

with

%import grammar.*

@mbhall88

I would also love to know if there is a way to import everything from another grammar.

@erezsh
Member

erezsh commented Aug 14, 2024

@mbhall88 I created a test PR that implements this functionality. You're welcome to give it a try and tell me if it works!

#1446

Here's an example of using it:

lark = Lark(r"""
start: "a" WS "a"

%import common.*
"""
)

@mbhall88

So I have tried it out. I must admit, I am a complete beginner with Lark and so I'm not sure if my error is due to that, or due to something else.

What I am trying to do is create a grammar and parser for Snakemake, which is a DSL built on top of Python, i.e., any Python syntax is valid Snakemake syntax, and then there is Snakemake-specific syntax on top of that.

I want to keep the Snakemake grammar definition separate from the Python grammar, hence why I stumbled across this issue.

Here is a small example of what I was trying to do, using lark installed from the linked branch (import_star):

from lark import Lark

lark = Lark(
    r"""
%import python.*

start: file_input
            
ruledef: "rule" NAME ":" inputs outputs
inputs: "input:" files
outputs: "output:" files
files: (FILE_NAME)+

FILE_NAME: /[a-zA-Z0-9_\.\/]+/

"""
)

snakefile = """x = 42

rule foo:
    input: 'foo.txt'
"""


def parse_snakemake_file():
    return lark.parse(snakefile)

When I try to import the parse_snakemake_file function and run it I get the following

p = snakemake_parser.parse_snakemake_file()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/michael/Projects/snakemake-parser/src/snakemake_parser/__init__.py", line 27, in parse_snakemake_file
    return lark.parse(snakefile)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/lark.py", line 655, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/parser_frontends.py", line 104, in parse
    return self.parser.parse(stream, chosen_start, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/parsers/earley.py", line 280, in parse
    to_scan = self._parse(lexer, columns, to_scan, start_symbol)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/parsers/xearley.py", line 152, in _parse
    to_scan = scan(i, to_scan)
              ^^^^^^^^^^^^^^^^
  File "/home/michael/Projects/snakemake-parser/.venv/lib/python3.12/site-packages/lark/parsers/xearley.py", line 125, in scan
    raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
lark.exceptions.UnexpectedCharacters: No terminal matches ' ' in the current parser context, at line 1 col 2

x = 42
 ^
Expected one of:
        * __ANON_5
        * __ANON_13
        * LPAR
        * __ANON_18
        * EQUAL
        * VBAR
        * DOT
        * MORETHAN
        * SEMICOLON
        * __ANON_6
        * PERCENT
        * __ANON_17
        * __ANON_2
        * __ANON_16
        * COLON
        * __ANON_21
        * AMPERSAND
        * CIRCUMFLEX
        * COMMA
        * IN
        * __ANON_11
        * __ANON_10
        * SLASH
        * __ANON_12
        * LESSTHAN
        * __ANON_22
        * __ANON_8
        * IF
        * __ANON_7
        * __ANON_20
        * NOT
        * __ANON_15
        * AND
        * _NEWLINE
        * MINUS
        * __ANON_23
        * __ANON_3
        * __ANON_14
        * PLUS
        * __ANON_1
        * __ANON_9
        * LSQB
        * OR
        * STAR
        * IS
        * AT
        * __ANON_19
        * __ANON_4

Again, this could be my misunderstanding. I wasn't certain what to use for start, as that doesn't seem to be defined in the Python grammar.

Also, I am happy to move this to a separate issue so as not to clutter this issue.

@erezsh
Member

erezsh commented Aug 19, 2024

@mbhall88 Feel free to open a new discussion/issue.

Anyway, when importing grammars, you still have to declare all the relevant %ignore statements in the root grammar (in this case, all the ignores inside python.g).

start is just the default root name. You can change it by providing the "start" parameter to Lark.
