Overlord for Lispers

Paul M. Rodriguez edited this page Jul 12, 2022 · 1 revision

Prelude: the scandal of defvar

Take a moment to think about how strange defvar must seem to someone coming from another programming language.

To review: Common Lisp has two ways to define a global variable, defparameter and defvar. They differ in what happens when the form is evaluated again in the same session: defparameter always re-evaluates its initial value form and assigns the result, while defvar evaluates its form only if the variable is still unbound.

(All examples are in the overlord-user package, which uses Overlord as well as Alexandria and Serapeum.)

(defparameter *table-1* (dict :x 1))
(gethash :x *table-1*) => 1
(defparameter *table-1* (dict :x 2))
;; The new value overrides the old value.
(gethash :x *table-1*) => 2

(defvar *table* (dict :x 1))
(gethash :x *table*) => 1
(defvar *table* (dict :x 2))
;; The old value is preserved.
(gethash :x *table*) => 1

Why do we need defvar? To hold state that we expect to change during the program’s run. And why do we need defvar to behave the way it does – not to re-evaluate its form? To avoid destroying the state we have built up in the course of development when we re-load our source files.

(If you think about it, defvar summarizes a lot of what is distinctive about development in Lisp.)

But there’s something familiar about the behavior of defvar. It’s a bit like a target in a build system. Specifically, it’s a bit like an order-only dependency in a Makefile.

In Make, order-only dependencies are the canonical way to depend on the existence of a directory. If the directory does not exist, it is created; if the directory already exists, nothing happens.

You need a directory to exist before you can start writing files into it. A variable defined with defvar that holds a hash table, so that you can write to the keys of that table later in your program, is morally the same thing.

State in the Lisp image is analogous to state in the file system, and we can tame it the same way we tame the file system: with a sufficiently expressive build system.

(Note that Overlord also builds files: the point is that state in the Lisp system and state in the file system have enough in common that the difference can be hidden behind a single protocol. The build system doesn't have to know or care if a target is a variable or a file.)
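
To make the single-protocol idea concrete, here is a minimal sketch in plain Common Lisp. It is not Overlord's implementation: the generic function sketch-target-exists-p is a hypothetical name invented for this illustration. It asks the same question of a variable (named by a symbol) and of a file (named by a pathname), so a caller never has to know which kind of target it holds.

```lisp
;; Hypothetical illustration only; Overlord's real protocol is richer.
(defgeneric sketch-target-exists-p (target)
  (:documentation "Return true if TARGET already exists."))

;; A variable target exists when its symbol is bound.
(defmethod sketch-target-exists-p ((target symbol))
  (boundp target))

;; A file target exists when the file system can find it.
(defmethod sketch-target-exists-p ((target pathname))
  (and (probe-file target) t))
```

Code that walks a list of targets can then call sketch-target-exists-p on each one without checking its type first.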

define-var-once

The most annoying thing about defvar is that if you change the definition, nothing happens – the variable is not updated. Now consider overlord:define-var-once.

(define-var-once *table* (dict :x 1))
(gethash :x *table*) => 1
(incf (gethash :x *table*))
(define-var-once *table* (dict :x 1))
(gethash :x *table*) => 2
(define-var-once *table* (dict :x 3))
(gethash :x *table*) => 3

A define-var-once form behaves the same as defvar – unless the definition is changed. If the definition changes, the variable is considered out of date.

For defvar, the variable is considered out of date under only one condition:

  1. the variable is unbound

But for define-var-once, the variable is considered out of date under either of two conditions:

  1. the variable is unbound
  2. the definition in the define-var-once form has changed
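
The two conditions can be sketched in plain Common Lisp. This is only an illustration of the idea, not how Overlord implements define-var-once; the names sketch-define-var-once and *sketch-definitions* are hypothetical. The sketch remembers the source form that last initialized each variable and re-evaluates the form when the variable is unbound or the form differs from the remembered one.

```lisp
;; Hypothetical sketch; NOT Overlord's implementation.
(defvar *sketch-definitions* (make-hash-table :test 'eq)
  "Maps a variable name to the source form that last initialized it.")

(defmacro sketch-define-var-once (name form)
  "Like DEFVAR, but also re-evaluate FORM when its source text has changed."
  `(progn
     (declaim (special ,name))
     (when (or (not (boundp ',name))                   ; condition 1: unbound
               (not (equal ',form                      ; condition 2: new definition
                           (gethash ',name *sketch-definitions*))))
       (setf (symbol-value ',name) ,form
             (gethash ',name *sketch-definitions*) ',form))
     ',name))

(sketch-define-var-once *sketch-table* (list :x 1))
(incf (getf *sketch-table* :x))                      ; state built up at the REPL
(sketch-define-var-once *sketch-table* (list :x 1))  ; same form: state is kept
(sketch-define-var-once *sketch-table* (list :x 3))  ; new form: re-evaluated
```

Comparing source forms with equal is the simplest possible change detector; it is an assumption of this sketch, chosen for brevity.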

define-target-var

Would it be useful to be able to add other conditions? Consider a variable that reads in data from a file.

(defvar *file-lines*
  (let ((file (asdf:system-relative-pathname :my-project "my-file.txt")))
    (lines (read-file-into-string file))))

Using defvar saves effort: the file will only be read in once. But what if the file changes? Then we have to explicitly ask Lisp to re-evaluate the definition.

This is a job for overlord:define-target-var.

(defparameter *my-file*
  (asdf:system-relative-pathname :my-project "my-file.txt"))

(define-target-var *file-lines*
    (lines (read-file-into-string *my-file*))
  (depends-on *my-file*))

Now we have a variable that will be re-evaluated under three conditions:

  1. if the variable is unbound
  2. if the definition changes
  3. if the file found at *my-file* changes

A target can be rebuilt by re-evaluating its definition, but it can also be rebuilt by name with build:

(overlord:build '*file-lines*)

defconfig

The new definition of *file-lines* is now much more precise. But there is an obvious flaw: what if you change the name of the file? Then *file-lines* will be out of date, because it will contain the data from the wrong file.

Here’s a better version:

(defconfig +my-file+
    (asdf:system-relative-pathname :my-project "my-file.txt"))

(define-target-var *file-lines*
    (lines (read-file-into-string +my-file+))
  (depends-on '+my-file+)
  (depends-on +my-file+))

Notice the new dependency. We now have a variable that will be re-evaluated under four conditions:

  1. if the variable is unbound
  2. if the definition changes
  3. if the file (initially) found at +my-file+ changes
  4. if the value of the variable +my-file+ changes

The quotation mark in the depends-on form creates a dependency on the variable itself, rather than on the value of the variable. Variables defined with Overlord can depend on other variables.

(You should use a +cage+ instead of *earmuffs* when defining variables with defconfig, for reasons that are outside the scope of this tutorial.)

Dynamic dependencies

Up to this point we have pretended that depends-on is declarative.

Consider a situation similar to the one above: your project has a file that lists other files, one per line, and you want to read all of those files into a single string.

(defconfig +my-file+
    (asdf:system-relative-pathname :my-project "my-file.txt"))

(define-target-var *file-lines*
    (lines (read-file-into-string +my-file+))
  (depends-on '+my-file+)
  (depends-on +my-file+))

(define-target-var *big-string*
    (with-output-to-string (s)
      (dolist (file *file-lines*)
        ;; Write to the string stream S, not to standard output.
        (write-string
         (read-file-into-string
          (file file))
         s)))
  (depends-on '*file-lines*)
  ;;; !!!
  (dolist (file *file-lines*)
    (depends-on (uiop:parse-unix-namestring file))))

You can see that depends-on is actually just a function: not a keyword, not a macro; just a function.

(You can even call depends-on outside of the definition of a target. This records a dependency for the current package; yes, packages are targets too.)

Warning: the above example is not idiomatic. If you have a list of targets, instead of using dolist, you should just call depends-on; it automatically flattens its arguments.

(depends-on (mapcar #'uiop:parse-unix-namestring *file-lines*))
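
As a rough picture of what "flattens its arguments" means, here is a sketch in plain Common Lisp. The names sketch-flatten, sketch-depends-on, and *sketch-recorded* are hypothetical and exist only for this illustration; Overlord's own depends-on does real dependency bookkeeping that is not modeled here.

```lisp
;; Hypothetical sketch; NOT Overlord's depends-on.
(defun sketch-flatten (tree)
  "Return the non-NIL atoms of TREE as one flat list, in order."
  (cond ((null tree) '())
        ((atom tree) (list tree))
        (t (mapcan #'sketch-flatten tree))))

(defvar *sketch-recorded* '()
  "Targets recorded so far by SKETCH-DEPENDS-ON, oldest first.")

(defun sketch-depends-on (&rest targets)
  "Record every target in TARGETS, however deeply the lists are nested."
  (setf *sketch-recorded*
        (append *sketch-recorded* (sketch-flatten targets))))

;; A single target and a nested list of targets can be mixed freely.
(sketch-depends-on 'some-variable (list #p"a.txt" (list #p"b.txt")))
```

Because the rest argument is flattened before anything is recorded, passing one list gives the same result as passing its elements one by one.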

Where files come from

At this point we need to stop and address a question: how does Overlord find your files? That is the subject of a separate article. The short answer: if your package and system have different names, you may need to use overlord:set-package-system to associate them.

Building files

Now that we know how file names are resolved, we are ready to introduce file-target and defpattern.

Up to this point, we’ve been comparing Overlord to Make, but now that we are actually building files, it must be admitted that Overlord does not use the Make model; it is actually based on Redo. This is why it doesn't matter where you define your dependencies: unlike Make (or ASDF, for that matter) Overlord, being based on Redo, is naturally recursive.

Here is an example from the documentation of the Apenwarr implementation of Redo. It does a good job of showing how a Redo-style system differs from the Make model. In this case we are compiling a C file. As a side effect of the compilation, a list of header files is written to disk. We then read the file and depend on the header files it contains.

(defpattern c-object-file (:in in :out out) ()
  (depends-on in)
  (let ((in.d (path-join in (extension "d")))
        (in.c (path-join in (extension "c"))))
    (cmd "gcc -MD -MF" in.d "-c -o" out in.c)
    (let ((deps (read-file-into-string in.d)))
      (depends-on (mapcar #'uiop:parse-unix-namestring (lines deps))))))

(file-target my-prog (:path "myprog" :out out)
  (let ((deps (list #p"a.o" #p"b.o")))
    (depends-on
     (loop for dep in deps
           collect (pattern-from 'c-object-file dep)))
    (cmd "gcc -o" out deps)))

The :path argument to file-target is optional; if you omit it, the path is derived by downcasing (actually, case-inverting) the name of the target.

Note that both patterns and files must have names. In the case of a pattern the name is bound as a class; in the case of a file it is bound as a global lexical (or a special variable, if the name has *earmuffs*).

Giving things names is the cost of doing everything inside Lisp. In one Lisp image there could be multiple projects using the same relative pathnames, or defining different ways to build files based on their extension. Names are necessary to differentiate them.

cmd is a DSL for shell commands. You should use it when possible, as it takes care of ensuring that the command is run in the right directory, even in a multi-threaded build.

defpattern defines a class. Equivalently, you could define the class using defclass and specialize some generic functions. But you should use defpattern if you can, since only patterns defined using defpattern will be considered out of date if the definition changes.

The syntax of file-target is complex, since it needs to address different scenarios:

  1. Is the input file name hard-coded?
  2. Should creating the file be atomic?

Creating files atomically is preferable – which is why there is syntactic support for it – but it is, unfortunately, sometimes impractical. In particular, surprisingly many command-line tools insist on generating an output file under a name derived from the input file and do not accept options to redirect to a different file or to stdout.

write-file-if-changed

The build script for a file target is not required to update the file it builds (unless the file does not exist). The utility write-file-if-changed takes a string, or an array of bytes, and writes them out to a file only if they are different from the file's existing contents.
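
The behavior described above can be sketched in plain Common Lisp. This is an approximation for strings only, not Overlord's write-file-if-changed; the names sketch-read-file and sketch-write-file-if-changed are hypothetical.

```lisp
;; Hypothetical sketch; NOT Overlord's write-file-if-changed.
(defun sketch-read-file (path)
  "Return the contents of the text file at PATH as a string."
  (with-open-file (in path)
    (with-output-to-string (out)
      (loop for char = (read-char in nil nil)
            while char
            do (write-char char out)))))

(defun sketch-write-file-if-changed (string path)
  "Write STRING to PATH unless PATH already contains exactly STRING.
Return true if the file was written, NIL if it was left untouched."
  (unless (and (probe-file path)
               (string= string (sketch-read-file path)))
    (with-open-file (out path :direction :output
                              :if-exists :supersede
                              :if-does-not-exist :create)
      (write-string string out))
    t))
```

When the contents are identical, the file is never opened for writing, so its modification time stays the same and anything that depends on the file has no reason to be rebuilt.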

Misc targets

Directories

You can depend on whether a directory exists using a directory-exists target. As long as the directory exists, the target is always considered up to date. If the directory does not exist, the target is considered out of date. Building the target simply creates the directory.

Digests

The stamp used for a file is a tuple of its last modification time and its size (according to stat). Using the size alongside the timestamp gets us most of the practical benefit of using file hashes, but is far cheaper.

You can construct file digest prerequisites using file-digest.
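
The default stamp described above, modification time plus size, can be sketched in plain Common Lisp. The names sketch-file-stamp and sketch-file-changed-p are hypothetical; this sketch uses file-write-date and file-length from the standard rather than stat, so it only approximates what Overlord records.

```lisp
;; Hypothetical sketch; NOT Overlord's stamp code.
(defun sketch-file-stamp (path)
  "Return (WRITE-DATE . SIZE-IN-BYTES) for the file at PATH."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (cons (file-write-date path) (file-length in))))

(defun sketch-file-changed-p (path saved-stamp)
  "True if the file at PATH no longer matches SAVED-STAMP."
  (not (equal saved-stamp (sketch-file-stamp path))))
```

In this sketch the write date has a resolution of one second, so an edit that keeps the size the same and lands within the same second would go unnoticed. That is the kind of case where a content digest is worth its extra cost.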

Phony targets

Oracles

Oracles let you depend on specific pieces of the Lisp environment or the OS environment. They are short pieces of data that are essentially self-describing, like the value of *print-base* or PATH. The trick is to store the value (or a hash of the value) as its own stamp.

Oracles are prerequisites, but not targets: you can depend on them but they cannot be "built".
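
The trick of using a value as its own stamp can be sketched in plain Common Lisp. None of the names below (sketch-variable-stamp, sketch-feature-stamp, sketch-oracle-changed-p) belong to Overlord; they are hypothetical and only illustrate the comparison a build system would make between a recorded stamp and the current environment.

```lisp
;; Hypothetical sketch; NOT Overlord's oracle API.
(defun sketch-variable-stamp (symbol)
  "The current value of the variable named SYMBOL serves as its own stamp."
  (symbol-value symbol))

(defun sketch-feature-stamp (feature)
  "T if FEATURE is present in *FEATURES*, else NIL."
  (and (member feature *features*) t))

(defun sketch-oracle-changed-p (saved-stamp current-stamp)
  "True if the environment no longer matches what an earlier build recorded."
  (not (equal saved-stamp current-stamp)))

;; Record the stamp now, then compare after the environment has changed.
(let ((saved (sketch-variable-stamp '*print-base*)))
  (let ((*print-base* (if (= *print-base* 16) 10 16)))
    (sketch-oracle-changed-p saved (sketch-variable-stamp '*print-base*))))
```

There is nothing to build here: the sketch can only report that the environment differs from the recorded stamp, which matches the point that oracles are prerequisites and not targets.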

Lisp variable oracles

You can depend on the value of Lisp variables.

Lisp variable oracles are for depending on reader control variables, like *print-base* or *read-default-float-format*. If you want to depend on a variable you defined, you should use defconfig and depend directly on the variable.

Environment variable oracles

You can depend on the value of an OS environment variable, like CC or PATH.

System versions

You can depend on the declared version of an ASDF system.

Quicklisp dist versions

You can depend on the version of a particular Quicklisp dist.

Feature oracles

You can depend on whether a particular feature is present in *features*.

Function oracles

You can wrap any named function as an oracle.

Appendix: defconfig and define-target-config

Configurations are global Lisp bindings. They have some of the qualities of variables (they can be rebound) and some of the qualities of constants (they are evaluated at compile time, not load time).

Configurations are fundamental to Overlord. Most target-defining forms implicitly depend on a configuration that holds a copy of the definition. (This is how Overlord detects redefinitions.)

define-target-config

Configurations with dependencies are evaluated, like simple configurations, at compile time. But they also have dependencies, so if their dependencies are out of date at load time – or whenever the form is re-evaluated – they get rebuilt anyway.

Configurations with dependencies are mostly useful to move expensive computations from load time to compile time.
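
For readers unfamiliar with moving work to compile time, here is a minimal sketch of the general technique in plain Common Lisp. It does not show how defconfig or define-target-config work internally, and it has no dependency tracking; the macro sketch-compile-time-value and the function sketch-precomputed-sum are hypothetical names.

```lisp
;; Hypothetical sketch of compile-time evaluation; NOT Overlord's implementation.
(defmacro sketch-compile-time-value (form)
  "Evaluate FORM once, when the macro is expanded, and embed the result as a literal."
  ;; Under COMPILE-FILE the result must be an object that can be dumped
  ;; into a compiled file (numbers, strings, lists of those, and so on).
  `(quote ,(eval form)))

(defun sketch-precomputed-sum ()
  ;; The sum is computed when this function is compiled (or macroexpanded),
  ;; not each time the function is called.
  (sketch-compile-time-value (reduce #'+ '(1 2 3 4))))
```

What a configuration with dependencies adds to this picture is the ability to notice, at load time, that the embedded value is stale and needs rebuilding.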