Skip to content

Qi Meeting Nov 8 2024

Siddhartha Kasivajhula edited this page Nov 22, 2024 · 2 revisions

An Expander for Your Expander

Qi Meeting Nov 8 2024

Adjacent meetings: Previous | Up | Next

Summary

We continued developing the proof-of-concept for "deep" macros extending the compiler by defining map and foldl as such DSL extensions, with the ability to indicate DSL subexpressions so that they are properly expanded and compiled.

Background

Last time, we came up with a few options for syntax for indicating DSL subexpressions in the "deep" macro. In the interim, Sid started to modify the proof of concept to more closely resemble our needs and start to incorporate these options. We still needed to implement the translation of the specification so indicated to the actual expansion and compilation of these positions.

Continuous Deformation

The original POC defined a specific operation, the addition of two numbers, and it expected these two subexpressions to be DSL syntax. To make it easier to integrate into Qi, we had now modified the POC to allow us to define arbitrary operations, such as map and foldl, by providing a specification of the syntax of these forms in the manner we discussed last time.

This is the syntax we chose for the moment. The core language grammar is:

my-expr := (#%optimizable-app op:id c:my-expr-clause ...+)
my-expr-clause := ((~datum f) e:my-expr)
                  ((~datum e) e:racket-expr))

That is, each subexpression of the form would correspond to a clause specifying, via simply a leading f or e, whether that expression is to be considered a DSL expression (floe, in the case of Qi) or a Racket expression (expr).

Then using the "deep extension" macro, we would define new operations as something like this:

(define-optimizable-op
  (map [f f])
  (lambda (f)  ; receives all compiled subexpressions, in this case just one
    #`(lambda (vs)  ; single list arg
        (r:map #,f vs))))

At the moment, the pattern binding f in the syntax of the new form is unused and simply indicates that an expression is expected there (and that, in this case, it is a DSL expression, as indicated by the leading f). The f on the following line is the actual compiled syntax that the syntax expression on the following lines will receive from the compiler, at the code generation stage. So we could write it this way to make what's happening more clear:

(define-optimizable-op
  (map [f flow-exp])
  (lambda (compiled-flow-exp)  ; receives all compiled subexpressions, in this case just one
    #`(lambda (vs)  ; single list arg
        (r:map #,compiled-flow-exp vs))))

Since there will be as many expressions passed to the lambda as there are in the syntax specification, we could potentially improve the macro to make that a little more convenient, and support defining the form simply as:

(define-optimizable-op
  (map [f f])
  #`(lambda (vs)  ; single list arg
      (r:map #,f vs))) ; this `f` is the compiled form of the `f` two lines above

Translating the Subexpression Specification

Now, we just need a way to translate (map [f f]) to (#%optimizable-app op:id [f f]) -- that is we need to expand it correctly to the #%optimizable-app core form with DSL and host expressions explicitly indicated, so that Syntax Spec can then use the core language specification to fully expand this form. And secondly, we need to then compile the DSL positions in the expanded core language.

We'll do these in the reverse order.

Compilation

Originally, the compiler was hardcoded to expect the syntax (#%optimizable-app op e1 e2) where e1 and e2 were expected to be DSL expressions. For reference:

(begin-for-syntax
  (define (compile-my-expr e)
    (syntax-parse e
      #:datum-literals (#%optimizable-app)
      [n:number
       #'n]
      [(#%optimizable-app op e1 e2)
       (define e1^ (compile-my-expr #'e1))
       (define e2^ (compile-my-expr #'e2))
       (define info (syntax-local-value #'op))
       (match info
         [(op-info codegen)
          (codegen e1^ e2^)])])))

But now we have the flexible syntax that allows us to indicate which positions are DSL vs host expressions, reproduced here for reference:

my-expr := (#%optimizable-app op:id c:my-expr-clause ...+)
my-expr-clause := ((~datum f) e:my-expr)
                  ((~datum e) g:racket-expr))

To compile this new syntax, we just need to modify the compiler to compile the DSL subexpressions, while leaving the other expressions alone, before handing them all to the code generation function provided by the macro author. Additionally, since we're trying to make this resemble Qi at this point, we replace the number in the core language with an identifier instead.

(begin-for-syntax
  (define (compile-my-expr-clause c)
    (syntax-parse c
      [((~datum f) e) (compile-my-expr #'e)]
      [((~datum e) e) #'e]))

  (define (compile-my-expr e)
    (syntax-parse e
      #:datum-literals (#%optimizable-app)
      [n:id
       #'n]
      [(#%optimizable-app op c ...)
       (define es^ (map compile-my-expr-clause (attribute c)))
       (define info (syntax-local-value #'op))
       (match info
         [(op-info codegen)
          (apply codegen es^)])])))

Now, working backwards, that leaves the expansion part -- that is, we still need to translate the user-provided syntax to an appropriate use of the #%optimizable-app core form (which we are compiling here).

Expansion

In order to translate (map [f f]) to (#%optimizable-app op:id [f f]), we can focus our attention on the op-transformer syntax-phase function, as that is responsible for expanding the form.

Originally, it was hardcoded to expect specifically two subexpressions, both of which would be compiled as DSL expressions:

(begin-for-syntax
  (define (op-transformer info)
    (syntax-parser
     [(_ e1 e2) #`(#%optimizable-app #,info e1 e2)]))

The syntax-parser here is the actual "macro" responsible for transforming a use of the my-map form into a use of the #%optimizable-app core form.

We need to rewrite it to accept the syntax specification (i.e. [f f] in the above example) as an argument so that it can generate an equal number of corresponding subforms in expanding to the #%optimizable-app core form (in accordance with the grammar of the core language we wrote down earlier).

We first tried:

(begin-for-syntax
  (define (op-transformer info spec)
    (syntax-parse spec
      [([tag arg-name] ...)
       (syntax-parser
         [(_ e ...) #`(#%optimizable-app #,info [tag e] ...)])]))

As before, it evaluates to a syntax parser, that is, a (-> syntax? syntax?) lambda that is the actual "macro," but it first uses syntax parse to obtain pattern bindings against the provided argument spec, the one that tells us which arguments are floe vs which ones are expr. The syntax-parser here receives the actual source syntax, the same as before, including the syntactic arguments provided in the use of the my-map form. For instance, it may receive (my-map sqr).

This resulted in:

; scratch.rkt:54:48: syntax: incompatible ellipsis match counts for template
;   at: ...
;   in: ((tag e) ...)

That turned out to be because during testing, we had declared an extra argument in the definition of my-map:

(define-optimizable-op
  (my-map [f f] [e arg])  ; ← HERE
  (lambda (f)  ; single list arg
    #`(lambda (vs)
        (map #,f vs))))

((my-lang (my-map sqr)) (list 1 2 3))

Removing the extra [e arg] fixed the issue, and we got the expected output of:

'(1 4 9)

But we felt that the error message in this case wasn't helpful. It would be better if it could alert us to the presence of an unexpected number of arguments in the use of my-map. Ordinarily, it is Syntax Spec that takes care of detecting invalid syntax and raising appropriate errors, but it only does that for the DSL core language. For DSL macros (which extend DSL expansion), it is the macro's responsibility to do error reporting, just like for ordinary macros that extend Racket expansion. So we made the following modification:

(begin-for-syntax
  (define (op-transformer info spec)
    (syntax-parse spec
      [([tag arg-name] ...)
       (syntax-parser
         [(_ e ...) (if (= (length (attribute e))
                           (length (attribute arg-name)))
                        #`(#%optimizable-app #,info [tag e] ...)
                        (raise-syntax-error #f
                                            "Wrong number of arguments!"
                                             this-syntax))])]))

Now,

(define-optimizable-op
  (my-map [f f] [e arg])
  (lambda (f)  ; single list arg
    #`(lambda (vs)
        (map #,f vs))))

((my-lang (my-map sqr)) (list 1 2 3))

… with the extraneous argument in the definition, produces:

; scratch.rkt:82:10: my-map: Wrong number of arguments!
;   in: (my-map sqr)

and:

(define-optimizable-op
  (my-map [f f])
  (lambda (f)  ; single list arg
    #`(lambda (vs)
        (map #,f vs))))

((my-lang (my-map sqr sqr)) (list 1 2 3))

… with the extraneous argument in the use, produces:

; scratch.rkt:82:10: my-map: Wrong number of arguments!
;   in: (my-map sqr sqr)

map: undefined??

In earlier testing, Sid had attempted to use a prefix r: in the codegen for my-map, anticipating that the DSL and host language forms would have the same name, and using the prefix to allow the codegen to fall back to the Racket version. But it kept complaining:

; scratch.rkt:89:9: r:map: unbound identifier

… with every permutation of (require (prefix-in r: racket/list)) (require (for-syntax (prefix-in r: racket/list))), (require (for-template (prefix-in racket/list))), and for-meta.

This turned out to be because map isn't in racket/list but in racket/base 😭. (require (prefix-in r: racket/base)) fixed the error.

Generalizing to foldl

Next, we tried to implement foldl to test whether our implementation correctly handled a nontrivial argument spec with more than one argument.

We defined foldl this way:

(define-optimizable-op
  (my-foldl [f f] [e init])
  (lambda (f init)
    #`(lambda (vs)  ; single list arg
        (r:foldl #,f #,init vs))))

… and used it:

((my-lang (my-foldl + 0)) (list 1 2 3))

This gave us an error:

; scratch.rkt:56:52: ?: expected one of these literal symbols: `f' or `e'
;   at: (#%host-expression 0)
;   in: ((#%host-expression 0) (#%host-expression 0))
;   Source locations:
;   scratch.rkt:90:22

This was weird. To debug it, we first wanted to understand whether something was going wrong during expansion or during compilation. To determine this, we made the following change in the syntax specification of the DSL:

  (host-interface/expression
    (my-lang e:my-expr)
    #''e
    ;; (compile-my-expr #'e)
    ))

That is, after expansion, instead of compiling the expression using the DSL compiler we had written, we simply "compiled" the expression to Racket by producing the expanded syntax object as a quoted value (note the double quote) – a standard debugging technique used in writing macros.

We then evaluated:

(my-lang (my-foldl + 0))

which gave us:

'(#%optimizable-app info (f +) ((#%host-expression 0) (#%host-expression 0)))

So evidently, the problem was in expansion, since it already exhibits the duplicate (#%host-expression 0) in the expanded form of the source expression.

We needed to narrow it down further – was the problem happening during expansion of the defined DSL macro (i.e. foldl) to the core form (i.e. #%optimizable-app) via the define-optimizable-op "deep extension" macro we wrote, or was it happening after that, during expansion of the core form by Syntax Spec?

To determine this, we added a displayln at the point where op-transformer produces its final output.

(begin-for-syntax
  (define (op-transformer info spec)
    (syntax-parse spec
      [([tag arg-name] ...)
       (syntax-parser
         [(_ e ...) (if (= (length (attribute e))
                           (length (attribute arg-name)))
                        (begin (displayln
                                #`(#%optimizable-app #,info [tag e] ...)) ; ← HERE
                               #`(#%optimizable-app #,info [tag e] ...))
                        (raise-syntax-error #f
                                            "Wrong number of arguments!"
                                            this-syntax))])]))

Along the way to generating the error, this printed:

#<syntax:scratch.rkt:59:34 (#%optimizable-app info (f +) (e 0))>

So it appeared that our define-optimizable-op was working correctly after all.

After doing a few tests, Michael determined that the problem was stemming from the use of the same name for the pattern binding as for the preceding datum used in matching in the DSL's syntax spec:

  (nonterminal my-expr-clause
    ((~datum f) e:my-expr)
    ((~datum e) e:racket-expr))

Changing the e in the second clause to some other name resolved the problem:

  (nonterminal my-expr-clause
    ((~datum f) e:my-expr)
    ((~datum e) g:racket-expr)) ; e → g

It turns out that Syntax Spec was creating a pattern binding for the initial e here due to a bug. Something like, we can only escape ... specifically using a (... form (as in (... ...)), rather than arbitrary pattern variables like (... e), and there doesn't seem to be a standard way to escape pattern variables in this manner.

We registered an issue on Syntax Spec, and for our purposes for the moment, we can simply avoid using the same name here. When we did that, we got the expected output of:

((my-lang (my-foldl + 0)) (list 1 2 3))

;=> 6

Next Steps

(Some of these are carried over from last time)

  • Improve the define-optimizable-op macro to avoid writing the wrapping lambda that receives the argument bindings in the indicated syntax.
  • Incorporate the extension scheme into the Qi codebase, starting with map and foldl.
  • Resolve the issue with bindings being prematurely evaluated.
  • Implement more fusable stream components like append, drop, and member.
  • Provide a runtime and indicate floe positions by enriching the syntax of #%deforestable
  • Implement the generic scheme for promoting position-oriented Qi macro definitions to handle templates
  • Update the benchmarking infrastructure (vlibench and also the basic ones in qi-sdk)
  • Define an extension scheme for deforesting custom list APIs.
  • Fix the bug in using bindings in deforestable forms like range
  • Finalize and document guiding principles for Git commits in the developer wiki
  • Revisit extending deforestation to multi-valued settings.
  • Write a proof-of-concept compiling Qi to another backend (such as threads or futures), and document the recipe for doing this.
  • Ask Ben to review tests from the perspective of Frosthaven Manager.
  • Review Cover's methodology for checking coverage.
  • Document the release model in the user docs and announce the performance regression (and the remedy) to users.
  • Improve unit testing infrastructure for deforestation.
  • Discuss and work out Qi's theory of effects and merge the corresponding PR.
  • Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
  • Deforest other racket/list APIs via qi/list
  • Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list))
  • Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.

Attendees

Dominik, Michael, Sid

Clone this wiki locally