
Compiler Architecture

JonathanAldrich edited this page Feb 26, 2013 · 5 revisions

Structure

Phases

  • Lexer - wyvern.tools.lexer
    • Input from a Reader
    • Output: get/peek interface
    • Tokens, Newline, Indent/Dedent
    • Issues
      • Supporting an escape for newlines
      • Should newlines inside parentheses be ignored? This is handled in Parsing Stage 1.
      • Maintaining comments so they can be associated with the next syntactic construct
      • Adding character numbers
      • It would be useful to have a TokenStream interface that the Lexer implements
  • Parsing Stage 1 - Parse into an uninterpreted tree (RawAST)
    • Input: Stream of tokens (right now a Lexer, eventually a TokenStream)
    • Output: RawAST
    • Issues
      • Preserve line numbers
  • Parsing Stage 2
    • Input: RawAST
    • Output: TypedAST
    • Features
      • Keyword extensibility and keyword-based parsing
      • Symbol resolution, including determining the types of symbols, is done as part of parsing
        • Issue: the types of all declarations must be resolved before any definitions are parsed. Type resolution may be recursive, so it must be done lazily (on demand).
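
The lazy type resolution issue noted under Parsing Stage 2 could be approached roughly as follows. This is only a minimal sketch, not the Wyvern implementation: `Lazy`, `DeclaredType`, `Env`, and `declareAll` are hypothetical names invented for illustration, and a "declaration" is reduced to a map from field names to type names. Every declaration is recorded first, with each field type wrapped in a memoizing thunk, so that mutually recursive types resolve only when something asks for them.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the Wyvern code base.
final class LazyTypeResolution {

    /** Memoizing thunk: runs the supplier on the first get() and caches the result. */
    static final class Lazy<T> {
        private Supplier<T> supplier;
        private T value;

        Lazy(Supplier<T> supplier) { this.supplier = supplier; }

        T get() {
            if (supplier != null) {
                value = supplier.get();
                supplier = null; // resolved; drop the thunk
            }
            return value;
        }
    }

    /** Stand-in for a declared type whose field types are resolved on demand. */
    static final class DeclaredType {
        final String name;
        private final Map<String, Lazy<DeclaredType>> fields = new HashMap<>();

        DeclaredType(String name) { this.name = name; }

        void declareField(String field, Lazy<DeclaredType> type) { fields.put(field, type); }

        DeclaredType fieldType(String field) {
            Lazy<DeclaredType> type = fields.get(field);
            if (type == null) {
                throw new IllegalArgumentException(name + " has no field " + field);
            }
            return type.get();
        }
    }

    /** Symbol table mapping type names to declarations; counts lookups for the demo. */
    static final class Env {
        private final Map<String, DeclaredType> types = new HashMap<>();
        int lookups = 0;

        void declare(DeclaredType type) { types.put(type.name, type); }

        DeclaredType lookup(String name) {
            lookups++;
            DeclaredType type = types.get(name);
            if (type == null) throw new IllegalStateException("unresolved type: " + name);
            return type;
        }
    }

    /**
     * Records every declaration without resolving any field type.
     * raw maps a type name to its fields (field name -> field type name).
     */
    static Env declareAll(Map<String, Map<String, String>> raw) {
        Env env = new Env();
        for (Map.Entry<String, Map<String, String>> decl : raw.entrySet()) {
            DeclaredType type = new DeclaredType(decl.getKey());
            for (Map.Entry<String, String> field : decl.getValue().entrySet()) {
                String typeName = field.getValue();
                // Deferred: the named type may not have been declared yet.
                type.declareField(field.getKey(), new Lazy<>(() -> env.lookup(typeName)));
            }
            env.declare(type);
        }
        return env;
    }

    public static void main(String[] args) {
        // Two mutually recursive declarations: A has a field of type B and vice versa.
        Map<String, Map<String, String>> raw = new LinkedHashMap<>();
        raw.put("A", Map.of("b", "B"));
        raw.put("B", Map.of("a", "A"));

        Env env = declareAll(raw);
        DeclaredType a = env.lookup("A");
        System.out.println(a.fieldType("b").name);                 // prints "B"
        System.out.println(a.fieldType("b").fieldType("a") == a);  // prints "true"
    }
}
```

In this sketch the declaration of `A` can mention `B` before `B` has been recorded, because the lookup only runs the first time the field type is requested. A real implementation would also need to detect genuinely ill-founded cycles (for example, a type alias defined directly in terms of itself), which this sketch does not attempt.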