Skip to content

Qi Meeting Nov 15 2024

Siddhartha Kasivajhula edited this page Nov 23, 2024 · 3 revisions

Getting Ahead of Ourselves

Qi Meeting Nov 15 2024

Adjacent meetings: Previous | Up | Next

Summary

We continued iterating on the proof-of-concept for "deep macros" extending deforestation to custom list operations. We managed to get it to a clean and user-friendly form that resembles the interface provided by syntax-parse, and it now feels complete and ready to incorporate into Qi. We tried to discern what the next steps might be once that's done, and it's not so clear yet.

Background

Last week, we improved the deep macro to support a flexible syntax for indicating nonterminals to facilitate proper expansion, in addition to specifying code generation at the final stage of compilation for the form being defined. The syntax was still a bit idiosyncratic, however. For example, we define the map operation this way:

(define-optimizable-op
  (map [f f])
  (lambda (f)
    #`(lambda (vs)
        (r:map #,f vs))))

A Familiar Syntax

Avoiding the Wrapping Lambda

The first thing that we addressed, which we had pointed out last time, is the presence of so many redundant references to the binding f. We avoided the lambda (making it implicit) on the third line by making a simple change to define-optimizable-op:

(define-syntax define-optimizable-op
  (syntax-parser
-    [(_ (name spec ...) codegen-f)
+    [(_ (name spec ...) codegen)
+    #:with ([typ arg] ...) #'(spec ...)
+    #:with codegen-f #'(lambda (arg ...) codegen)
     #'(begin

         ;; capture the codegen in an instance of
         ;; the compile time struct
         (define-syntax info
           (op-info codegen-f))

         (define-dsl-syntax name my-expr-macro
           (op-transformer #'info #'(spec ...))))]))

This simplified the definition syntax to:

(define-optimizable-op
  (map [f f])
  #`(lambda (vs)
      (r:map #,f vs))))

… making the (map [f f]) effectively a binding form.

It initially seemed as though we might lose some flexibility by losing the layer of abstraction that is the lambda. But after discussing further we felt there would be other equivalent ways to retain such flexibility. In any case, it seemed prudent to design the interface so that the most common case (i.e. defining a macro and specifying its codegen inline) is simple, rather than try to design to make edge cases (such as abstracting the definitions over functionality common to them) simple at the expense of the common cases (while keeping them in mind, of course).

Avoiding Unquoting

This was an improvement, but all the quoting and unquoting still seemed cumbersome. We considered what it would be like to avoid syntax quoting altogether, so that the definition could be:

(define-optimizable-op
  (map [f f])
  (lambda (vs)
    (r:map f vs))))

Although this was possible, after discussing it, we felt that this would be less flexible for the same reason as syntax-rules vs syntax-parse, that is, in the latter case, we can employ any expression that evaluates to syntax and needn't indicate the expansion inline.

So we finally decided to eliminate just unquoting, so that these binding references are treated as pattern bindings rather than variable bindings. To do this we had to convert the variable binding to a pattern binding using a simple trick, as in this case we know that the variable binding refers to a syntax object.

(define-syntax define-optimizable-op
  (syntax-parser
    [(_ (name spec ...) codegen)
     #:with ([typ arg] ...) #'(spec ...)
-    #:with codegen-f #'(lambda (arg ...) codegen)
+    #:with codegen-f #'(lambda (arg ...)
+                         (with-syntax ([arg arg] ...)
+                           codegen))
     #'(begin

         ;; capture the codegen in an instance of
         ;; the compile time struct
         (define-syntax info
           (op-info codegen-f))

         (define-dsl-syntax name my-expr-macro
           (op-transformer #'info #'(spec ...))))]))

Now, the definition of map would be:

(define-optimizable-op
  (map [f f])
  #'(lambda (vs)
      (r:map f vs))))

This syntax was familiar as it was the same as that of syntax-parse. We agreed this felt like the right syntax for these "deep macros."

Getting Ahead of Ourselves

We all agreed that the POC was now complete and that incorporating it into the Qi codebase would be the next step, and felt that from that future vantage point, the next steps would become more clear.

But we still briefly discussed what those next steps towards making deforestation extensible might be.

Deforesting Custom List Operations

Once the extension POC is integrated into Qi in its current form, we would be able to extend the language by indicating expansion to a Qi core form (like usual Qi macros) -- but one specially designated for extension as it doesn't itself define a Racket runtime. So in addition, using the same "deep macro," we would be able to specify a Racket runtime for this new form. Additionally, leveraging this designated core form allows us in principle to add optimizations to the compiler (whose scope is restricted to the Qi core language for theoretical and practical reasons discussed in the past).

But so far, we haven't talked in detail about, nor does the POC include, any way to add such optimizations.

We imagined two possible options for how this might be done.

Augment the deep macro to include optimizations

This would be the most straightforward way, where we support an additional clause in the deep macro that would allow us to indicate optimizations as syntax transformations (e.g. syntax-parse clauses) that would be undertaken in a certain compiler pass (how do we indicate which one?).

This would allow the optimizations to be associated with the binding being defined, allowing us to use the same compile-time datatype that we defined as the vehicle to carry the user-defined Racket runtime through to the code generation stage of compilation.

Define a mechanism by which individual compiler passes may be extended

There could be a separate define-optimization macro that adds such syntax rules to advertised compiler passes. This would allow the addition of arbitrary optimizations that aren't coupled to a specific form. As it would not be tied to the user-defined binding, it would require the maintenance of a separate compile-time table to keep track of such optimizations. This approach could risk enabling gratuitous mutation of language semantics, however.

Producers, Transformers, and Consumers

We've talked about producers, transformers, and consumers as three broad categories in sequence-based operations that seem to encapsulate a lot, or perhaps all, of the list-oriented operations we are interested in deforesting. It could be that if we can define an adequate "core language" for deforestation in these terms, we could expose convenience macros to users that would allow them to indicate the essential vagaries of their sequence operations to generic extension macros resembling define-producer, define-transformer, and define-consumer.

It's interesting that we seem to be moving in the direction of implementing deforestation in such a way that it exhibits a wide variety of surface forms (like those in racket/list) that reduce to a smaller core (the deforestation runtime primitives). It seems to beg the question, are we just embedding one "deep" DSL inside another? Does that suggest anything about how to design it?

Deforestation for Bespoke Datatypes

Our current mechanism for deforesting user-defined operations is a means of extension. That is, it extends functionality that is already in the core language. In the past we've considered supporting deforestation of arbitrary sequences, including user-defined types that aren't lists (e.g. say, dataframes that are used in data science). We don't currently have this ability in the Qi core, so it's not obvious how this functionality could be supported in the future.

In order to use our existing mechanism to deforest such datatypes, we would need some form of generic deforestation to be part of the Qi core language, perhaps as another core form #%sequence introduced for this purpose, analogous to #%deforestable, the latter of which may be specific to lists and our current implementation of stream fusion. This new form could perhaps have a corresponding compiler pass that deforests generic sequences (like Clojure's transducers). Alternatively, our existing deforestation pass itself could be made generic to support this use case, via the existing #%deforestable form. Either of these options would be a natural way to generalize the language itself so that we could then leverage our existing extension mechanism to extend deforestation to custom (sequence) datatypes.

Another option we've talked about in the past is to treat such optimizations of bespoke types not as extension of Qi core functionality, but as the addition of new, distinct, functionality. For instance, hypothetically, if there were optimizations for dataframes that don't reduce to optimizations of list-like sequences, then we would not be able to "extend" this core functionality to express the optimization. Instead, it becomes a problem of composing new functionality rather than extending existing functionality. When we last discussed this, we felt that extension was a more natural way to frame this particular problem than composition, and it remains to be seen if this assumption holds up when we get to the point of trying to optimize operations on arbitrary datatypes. But we are certainly getting ahead of ourselves here.

Before We Get There

For now, while we develop and reveal the deforestation infrastructure, we felt that we could define and rely on private APIs until the right interface in light of the above considerations becomes clear.

S-Expressions Are Bad!

Dominik is working on a cryptography project in Python where he has had the good fortune of encountering a data format that uses S-expressions (or as Sid likes to call them, symexes) to encode keys by mixing plain text and binary data 😄. Although he is usually unapologetically a traditionalist, sneering even at Racket's convention of using square brackets to improve readability, he had to begrudingly admit based on this experience that it's possible for his beloved S-expressions to be pretty terrible after all. Naturally, his reaction was to parse these weird S-expressions into normal S-expressions first before operating on them further. Likely next step: write a Racket interpreter in Python and then do the project in Racket.

Next Steps

(Some of these are carried over from last time)

  • Incorporate the extension scheme into the Qi codebase, starting with map and foldl.
  • Implement more fusable stream components like drop, append, and member.
  • Define a way to indicate optimizations for custom list APIs.
  • Update the benchmarking infrastructure (vlibench and also the basic ones in qi-sdk)
  • Resolve the issue with bindings being prematurely evaluated.
  • Fix the bug in using bindings in deforestable forms like range
  • Finalize and document guiding principles for Git commits in the developer wiki
  • Revisit extending deforestation to multi-valued settings.
  • Write a proof-of-concept compiling Qi to another backend (such as threads or futures), and document the recipe for doing this.
  • Ask Ben to review tests from the perspective of Frosthaven Manager.
  • Review Cover's methodology for checking coverage.
  • Document the release model in the user docs and announce the performance regression (and the remedy) to users.
  • Improve unit testing infrastructure for deforestation.
  • Discuss and work out Qi's theory of effects and merge the corresponding PR.
  • Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
  • Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list))
  • Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.

Attendees

Dominik, Michael, Sid

Clone this wiki locally