Skip to content

Qi Compiler Sync Sept 22 2023

Siddhartha Kasivajhula edited this page Sep 29, 2023 · 3 revisions

Threading a Needle in a Haystack

Qi Compiler Sync Sept 22 2023

Adjacent meetings: Previous | Up | Next [None]

Summary

It started innocently enough. We were hoping to fix the deforestation optimization so that it only applies to threading with right chirality (i.e. the ~>> form) instead of always, but found ourselves deep in a rabbit hole pondering the consequences of returning false or some other return value from compiler optimization rules when they aren't applicable to the input syntax. Along the way, a mysterious stale module loading bug had taken up residence and we convinced it to go away, and we managed to fix the chirality issue when we eventually emerged, wiser, from the rabbit hole.

Background

Last time, we completed implementing foldr as a stream terminator for the deforestation optimization, with good performance. We also discovered that deforestation was being applied to all threading expressions, but it is only correct to do it for right-threading. We decided to remedy this, this time.

A Ghostly Stale Module

But before we could get to the fun stuff, we discovered that running the benchmarks in the Qi SDK wasn't working anymore, with the error:

$ make profile-competitive
cd qi-sdk/profile/nonlocal; racket report-competitive.rkt
require: namespace mismatch;
 reference to a module that is not instantiated
  module: "/Users/siddhartha/work/qi/qi-lib/flow/impl.rkt"
  phase: 0
make: *** [profile-competitive] Error 1

We ran make clean, and even manually cleaned out all the /compiled folders, ensuring that there were no compiled binaries remaining:

$ find . |grep zo
$ find . |grep dep

But that still didn't get rid of the error.

We remembered something that Michael had mentioned in a previous meeting, something to the effect of, if packages are installed from a directory path via --link and then developed locally, that it's generally fine but could run into problems if source files are moved around or if there are other changes to the package directory structure. That was the case here since the directory structure in the compiler branch is quite different from the main branch, and the referenced module in the error message is one that's only present on the main branch.

So we removed qi-lib and qi-sdk and reinstalled them, and that worked! The ghostly stale module had evaporated from the fabric of the filesystem.

Chirality calamity

Now, back to the good stuff. We looked at the over-eager deforestation rewrite rule in the compiler:

(define (deforest-rewrite stx)
  (syntax-parse stx
    [((~datum thread) _0:non-fusable ...
                      f:fusable-list-operation ...+
                      g:fusable-fold-operation
                      _1 ...)
     #:with fused (generate-fused-operation (syntax->list #'(f ... g)))
     #'(thread _0 ... fused _1 ...)]
    [((~datum thread) _0:non-fusable ... f:fusable-list-operation ...+ _1 ...)
     #:with fused (generate-fused-operation (syntax->list #'(f ...)))
     #'(thread _0 ... fused _1 ...)]
    [_ #f]))

It matches all uses of thread but it's only appropriate to match forms derived from the use of the "right-threading" form ~>>.

In Qi, ~>> is not a core form but a macro, whose only purpose is to rewrite to the core threading form except with a chirality property attached to each component flow, indicating right-chirality. Left-chiral means values are passed as arguments in the leading positions, and right-chiral means they are passed in trailing positions.

So in the above rule, we just needed to narrow the matching to specifically those uses where the chirality syntax property was set and had the value right.

We decided that the right place to detect the syntax property was not here (it was "too late") but in the relevant syntax classes responsible for matching, like this one:

(define-syntax-class fusable-list-operation
  #:attributes (f next)
  #:datum-literals (#%host-expression #%partial-application)
  (pattern (#%partial-application
            ((#%host-expression (~literal map))
             (#%host-expression f)))
    #:attr next #'map-cstream-next)
  (pattern (#%partial-application
            ((#%host-expression (~literal filter))
             (#%host-expression f)))
    #:attr next #'filter-cstream-next))

Well, but before fixing it, we thought we should be smart about it and write some failing unit tests so that we could validate the fix.

Spooky Action at a Distance

We wrote these unit tests to ensure that using left-threading with map, filter and foldr should fail:

(check-exn exn:fail?
           (thunk
            ((☯ (~> (map sqr)))
             (list 1 2 3 4 5)))
           "(map) doforestation should only be done for right threading")
(check-exn exn:fail?
           (thunk
            ((☯ (~> (filter odd?)))
             (list 1 2 3 4 5)))
           "(filter) doforestation should only be done for right threading")
(check-exn exn:fail?
           (thunk
            ((☯ (~> (foldr + 0)))
             (list 1 2 3 4 5)))
           "(foldr) doforestation should only be done for right threading")

... and ran them, expecting them to fail, since, of course, we hadn't yet implemented the fix.

But we instead saw:

$ make test-flow
racket -y qi-test/tests/flow.rkt
465 success(es) 0 failure(s) 0 error(s) 465 test(s) run

We verified that in fact, an expression like this was failing as it should (but not as we were expecting):

> ((☯ (~> (map sqr)))
   (list 1 2 3 4 5))
; map: contract violation
;   expected: procedure?
;   given: '(1 2 3 4 5)

We investigated the expansions using the Macro Stepper:

(#%expression (☯ (~> (filter odd?))))

→  [Macro transformation]

(#%expression
 (let ()
   (qi0->racket
    (#%partial-application
     ((#%host-expression filter)
      (#%host-expression odd?))))))

... so it looked like deforestation wasn't being applied.

We looked at the matching rule in deforest-rewrite above, and convinced ourselves that it really should be getting applied:

((~datum thread) _0:non-fusable ... f:fusable-list-operation ...+ _1 ...)

Zero or more non-fusable expressions followed by one or more fusable operations followed by zero or more expressions -- which seemed like it should match (☯ (~> (filter odd?))), but apparently wasn't.

To validate our assumption that this wasn't working, we wrote these compiler-level unit tests that expected deforested output:

(check-equal? (syntax->datum
               (deforest-rewrite
                 #'(thread (#%partial-application
                            ((#%host-expression filter)
                             (#%host-expression odd?))))))
              '(thread
                (esc
                 (λ (lst)
                   ((cstream->list (inline-compose1 (filter-cstream-next odd?) list->cstream-next)) lst)))))
(check-equal? (syntax->datum
               (deforest-rewrite
                 #'(thread (#%partial-application
                            ((#%host-expression map)
                             (#%host-expression sqr))))))
              '(thread
                (esc
                 (λ (lst)
                   ((cstream->list (inline-compose1 (map-cstream-next sqr) list->cstream-next)) lst)))))

... but running these tests showed that they were passing!

So evidently, the rule would match correctly but apparently wasn't getting hit.

Curiouser and curiouser

This took us further down the rabbit hole as we looked for where deforest-rewrite was being called. It was this:

(define (deforest-pass stx)
  (find-and-map/qi (fix deforest-rewrite)
                   stx))

A ha! It must have something to do with our find-and-map/qi syntax traversal function, whose purpose is to traverse the input syntax and apply a transformation (in this case, the deforestation optimization rule) at every position where it applies, leaving other positions alone.

That looks like this:

(define (find-and-map f stx)
  ;; f : syntax? -> (or/c syntax? #f)
  (match stx
    [(? syntax?) (let ([stx^ (f stx)])
                   (or stx^ (datum->syntax stx
                              (find-and-map f (syntax-e stx))
                              stx
                              stx)))]
    [(cons a d) (cons (find-and-map f a)
                      (find-and-map f d))]
    [_ stx]))

(define (find-and-map/qi f stx)
  ;; #%host-expression is a Racket macro defined by syntax-spec
  ;; that resumes expansion of its sub-expression with an
  ;; expander environment containing the original surface bindings
  (find-and-map (syntax-parser [((~datum #%host-expression) e:expr) this-syntax]
                               [_ (f this-syntax)])
                stx))

Now this was something we'd written long ago and hoped never to have to deal with again, yet, the ability to traverse a syntax tree while performing an arbitrary mapping had turned out to be pretty useful, so here we were.

Light at the End of the Rabbit Hole

After we spent a lot of time going down rabbit holes, a main point we seemed to take away was that it was necessary for the mapping function to return false when the input syntax did not match a pattern to be mapped (as hinted at by one of the comments in the code there), which, currently, our fixed-point-finding optimization-rule-applier was not doing. We experimented with returning #f in the default case of deforest-rewrite instead of the input syntax (i.e. via the special identifier this-syntax) without using the fixed-point logic, but nothing especially panned out until we noticed that our deforestation rewrite rule specifically applies to threading, but in the expansion we saw above with the macro stepper, reproduced here:

(#%expression (☯ (~> (filter odd?))))

→  [Macro transformation]

(#%expression
 (let ()
   (qi0->racket
    (#%partial-application
     ((#%host-expression filter)
      (#%host-expression odd?))))))

... there was no mention of threading!

It turned out that what was happening was that an earlier compilation pass -- the normalization pass -- was rewriting (~> f) to simply f, since threading a single flow is equivalent to simply applying the flow itself. Then, in the deforestation pass, nothing further would be done because there was no longer any threading happening. This is of course desirable behavior here.

This was our first experience with what will probably be an important consideration from here on out -- interaction between different compiler passes.

At least, one good thing we retrieved from the rabbit hole was the observation that we might need to revisit the fixed-point-finding method, since it currently does not return false in non-matching cases, and that might cause premature termination of syntax tree traversal so that nested positions where deforestation would have applied would not be reached.

We modified the tests to thread more than one flow to avoid being normalized:

(check-exn exn:fail?
           (thunk
            ((☯ (~> (map sqr) (map sqr)))
             (list 1 2 3 4 5)))
           "(map) doforestation should only be done for right threading")
(check-exn exn:fail?
           (thunk
            ((☯ (~> (filter odd?) (filter odd?)))
             (list 1 2 3 4 5)))
           "(filter) doforestation should only be done for right threading")
(check-exn exn:fail?
           (thunk
            ((☯ (~>> (filter odd?) (~> (foldr + 0))))
             (list 1 2 3 4 5)))
           "(foldr) doforestation should only be done for right threading")

... and saw these tests now fail as expected, and then, finally, we added the fix to check for right chirality in the deforestation syntax classes. For instance:

(define-syntax-class fusable-list-operation
    #:attributes (f next)
    #:datum-literals (#%host-expression #%partial-application)
    (pattern (~and (#%partial-application
                    ((#%host-expression (~literal map))
                     (#%host-expression f)))
                   stx)
      #:do [(define chirality (syntax-property #'stx 'chirality))]
      #:when (and chirality (eq? chirality 'right))
      #:attr next #'map-cstream-next)
    (pattern (~and (#%partial-application
                    ((#%host-expression (~literal filter))
                     (#%host-expression f)))
                   stx)
      #:do [(define chirality (syntax-property #'stx 'chirality))]
      #:when (and chirality (eq? chirality 'right))
      #:attr next #'filter-cstream-next))

... and that brought us back to:

$ make test-flow
racket -y qi-test/tests/flow.rkt
465 success(es) 0 failure(s) 0 error(s) 465 test(s) run

... except that this time, we actually were expecting the tests to pass!

Best Practices for Compiler Testing?

Ben brought up recently that the unit tests for compiler rules seems overly specialized at first glance. It could be okay, but it's worth asking other projects that do a lot of compiler work how they do this kind of thing, and see if there are any best practices we could emulate. We considered Racket/Chez and LLVM, among other projects that we should reach out to.

Qi is an Accessible Project

There was a broad sentiment expressed that Qi makes it easy for interested parties to get involved and rapidly make deep and valuable contributions. If you're reading this, come hang out with us!

Next Steps

(Some of these are carried over from last time)

  • Improve the normalization pass to only apply threading identities when the chiralities of the components match.
  • Validate that optimization passes can prematurely terminate if they don't return false when they don't transform the input (and then fix it).
  • Contact other people working on compilers (e.g. LLVM, Racket/Chez) to get an idea of best practices on testing optimizations.
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr map filter foldl zip average upto). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion.
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Jair, Sid

Clone this wiki locally