Skip to content

The Compiler

Sam Phillips edited this page Jan 8, 2024 · 6 revisions

This is a placeholder for a guide to understanding how the compiler is implemented, as well as possible avenues for possible optimization, workflows and tools for validating performance and checking compiled output at various stages, etc.

Layout

TODO.

Theory of Optimization

Functional, immutable. No accidental side-effects.

Optimizations

Currently, only stream fusion (deforestation) is implemented.

Stream Fusion

Stream fusion - sometimes called deforestation - is implemented for a very limited set of host expressions.

The implementation of the whole deforestation pass is in the deforest.rkt module.

Deforestation Pass Architecture

For certain sequences of flow steps that normally operate on lists, a stream approach for processing is implemented which generates optimized code without generating intermediate lists after each operation.

There are three types of operations that are eligible for stream fusion:

  1. Stream Producers mark the beginning of fused sequence and without deforestation they would produce a list. When fused, these generate the stream of values that would normally be contained in the list produced.
  2. Stream Transformers normally take a list of values and produce a new list of values. They can add, remove and modify the elements of the list. When fused, these changes are executed on a per-step basis and a standard semantics of how this is related to the input stream elements is defined.
  3. Stream Consumers take a list of values and produce a single value. When fused, these take care of terminating the fused sequence and produce a final value of the optimized part of the flow.

The following producers are implemented:

  • implicit producer from list value,
  • range generator producer.

The following transformers are implemented:

  • filter - can be used with implicit producer,
  • map - cannot be used with implicit producer currently.

The following consumers are implemented:

  • implicit consumer that collects the result in list,
  • foldr,
  • foldl,
  • car.

Low-level CPS Implementation

The core of this stream fusion optimizer is continuation-passing style stream implementation. Stream is represented as lambda term - usually called "next" - with three arguments:

  • done - lambda accepting no arguments which gets evaluated when no more elements are generated by this stream,
  • skip - lambda accepting one argument - the new state - which is evaluated when the current input element is not to be used,
  • yield - lambda accepting two arguments - current value and the new state - which is evaluated when the current element is to be processed.

Producers must accepts these three mandatory arguments and produce a lambda accepting the initial state.

Transformers must at least accept the "next" lambda and produce a lambda term with the stream signature, implementing given transformation. The transformers usually accept more arguments specifying the transformation parameters.

Consumers must at least accept the "next" lambda and run the whole fused pipeline producing a result value. The consumer usually accepts more arguments specifying the operation parameters - like transformers do.

Internal Deforestation Syntax Classes

Fusable operations within the flow are matched using syntax classes which also provide attributes required by the fused operation generator.

The fusable-stream-producer syntax class matches both implicit and explicit producers and it provides the following attributes for generating the fused operation:

  • next - low-level stream procedure (see above)
  • prepare - wrapper procedure that converts input and applies it appropriately to the next procedure
  • contract - procedure contract which is used for wrapping the whole fused operation and ensure runtime checks of its input values
  • name - used for error reporting
  • curry - currying procedure for the prepare part - it must handle direct usage and fine and blanket templates

The following operations are supported:

  • list->cstream
  • range

The fusable-stream-transformer0 matches any transformer that can be be used with implicit producer. It provides the following attributes for the fused operation generator:

  • f - the operation of the transformer (right now this class is tailored towards filter- and map-like transformers, it might require extensions in the future)
  • next - low-level stream procedure

Currently only filter is implemented.

The fusable-stream-transformer matches any transformer and provides the same attributes as fusable-stream-transformer0. The following operations are supported:

  • filter
  • map

The fusable-stream-consumer matches any consumer. It provides the following attribute for the fused operation generator:

  • end - the beginning of the S-expression surrounding the generated fused operation

The following operations are supported:

  • cstream->list
  • foldr
  • foldl
  • car

Deforestation Pass Syntax Rewrite

The actual deforestation pass is implemented as syntax transformation of the source flow based on aforementioned syntax classes. The rewrite looks for the following patterns spliced within the flows:

  • producer transformer ... consumer
  • transformer0 transformer ... consumer
  • producer transformer ...+
  • transformer0 transformer ...+

For each of these patterns, there is a template that converts the sequence into a directly fusable (normalized) form as follows:

  • producer transformer ... consumer

This syntax rewrite ensures the first element is (maybe an implicit) producer and the last element is (maybe an implicit) consumer.

Generating Fused Operation

The deforestation pass syntax rewrite uses the generate-fused-operation to convert a normalized fusable sequence into a Racket expression the implements this sequence as a sequence of fused stream operations.

It also attaches the runtime contracts to both producer and consumer if applicable.

Implementing a Fusable Operation

To implement deforestation of a new operation, the low-level CPS implementation must be created first and then appropriate syntax class must be extended with a pattern mattching the operation in the source flow. This pattern must also specify all the required attributes.

Workflows and Tools

  • Using environment variables like PLT_LINKLET_SHOW_CP0=1.
  • Inspecting compiler output at intermediate stages.
  • Inspecting expansions using raco expand.
  • Gotchas: stale compilation. Use racket -y.

Verification

Different strategies for testing the compiler are appropriate depending on what you're trying to do.

  • Are you trying to verify the meaning of a compiled expression? Write a regular unit test in qi-test/tests/flow.rkt.
  • Are you trying to verify that an optimization was applied? Do you already know what the optimized Qi version should look like? Write an "equivalence" test in qi-test/tests/compiler.rkt (see the normalization tests for example), verifying that independently compiling the original expression and the optimized expression produce the same result.
  • Otherwise, if you don't already know the optimized expression, then write a granular test like the ones for the deforestation pass.

In addition, also consider writing a benchmark to validate the performance improvement, in qi-sdk/profile/nonlocal.

Ideally, it would be good to develop a body of proofs of the correctness of every compiler transformation (see Validly Verifying that We're Compiling Correctly).

Also see Best Practices for Testing Compiler Optimizations?.

Research Directions

  • Alternative layouts - E-graphs, etc.
  • Applying stream fusion to core routing forms (may involve new forms of fusion that aren't exactly the traditional kind -- especially to handle multiple values, but also potentially for combinators like -<).
  • Avoiding values->list->values at the level of interface macros, and in host expressions that are subexpressions of flows.
Clone this wiki locally