ellipses #37

Merged
merged 25 commits into from
Oct 11, 2024


quasarbright
Collaborator

@quasarbright quasarbright commented Sep 18, 2024

This PR introduces ellipsis syntax in binding specs and makes ellipses mandatory. Spec variable references in binding specs must occur at the same ellipsis depth as their syntax spec counterparts.

Motivation

Ellipses help control the binding structure of syntax with sequences. For example:

#lang racket

(require syntax-spec-v2)

(syntax-spec
  (binding-class my-var)
  (nonterminal my-expr
    x:my-var
    n:number
    (my-let ([x:my-var e:my-expr] ...) body:my-expr ...)
    #:binding (scope (bind x) body)
    (my-cond [test:my-expr #:as x:my-var body:my-expr] ...)
    #:binding (scope (bind x) body))
  (host-interface/expression (my-dsl e:my-expr)
                             #`'#,((nonterminal-expander my-expr) #'e)))

; dsl-expansion should error saying x is unbound in second branch
(my-dsl (my-cond [1 #:as x 1]
                 [2 #:as y x]))

In the above example, we're defining a version of cond that binds the result of the test in the branch's body. In the face of sequences of syntax, syntax-spec infers the binding structure automatically. However, it cannot distinguish between this cond-like binding structure and the let-like binding structure. In both cases, there is a list of identifiers being bound in a list of bodies. But should each identifier be bound only in the corresponding body? Or all bodies? The inferred structure binds each variable in all bodies, which is wrong in this case.

To allow users more control over binding structure, we should add ellipses to binding specs.

Here is the example re-written with ellipses:

#lang racket

(require syntax-spec)

(syntax-spec
  (binding-class my-var)
  (nonterminal my-expr
    x:my-var
    n:number
    (my-let ([x:my-var e:my-expr] ...) body:my-expr ...)
    #:binding (scope (bind x) ... body ...)
    (my-cond [test:my-expr #:as x:my-var body:my-expr] ...)
    #:binding [(scope (bind x) body) ...])
  (host-interface/expression (my-dsl e:my-expr)
                             #`'#,((nonterminal-expander my-expr) #'e)))

(my-dsl (my-cond [1 #:as x 1]
                 [2 #:as y x]))

The placement of ellipses in binding specs distinguishes between the two cases.

I have verified that the ellipsis implementation properly treats the reference to x in the preceding example as unbound.
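
The difference between the two binding rules can be sketched in plain Racket. This is a hypothetical model, not syntax-spec's implementation: the clause representation and both function names are assumptions made for illustration. Each clause is a list of the name it binds and the names its body references.

```racket
#lang racket

;; Hypothetical model, not syntax-spec's implementation.
;; Each clause is (list bound-name body-references), mirroring
;; (my-cond [1 #:as x 1] [2 #:as y x]).
(define clauses '((x ()) (y (x))))

;; let-like rule, (scope (bind x) ... body ...):
;; every name lives in one shared scope.
(define (unbound-refs/shared-scope clauses)
  (define bound (map first clauses))
  (for*/list ([c (in-list clauses)]
              [ref (in-list (second c))]
              #:unless (memq ref bound))
    ref))

;; cond-like rule, [(scope (bind x) body) ...]:
;; every clause gets its own scope.
(define (unbound-refs/per-clause-scope clauses)
  (for*/list ([c (in-list clauses)]
              [ref (in-list (second c))]
              #:unless (eq? ref (first c)))
    ref))

(unbound-refs/shared-scope clauses)      ; '()  : x is visible in every body
(unbound-refs/per-clause-scope clauses)  ; '(x) : x is unbound in the second body
```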

Interaction with nesting

Ellipses also change the syntax for nest:

(syntax-spec
  (binding-class my-var)
  (nonterminal my-expr
    n:number
    x:my-var
    (my-let* (b:binding-pair ...) body:my-expr)
    #:binding (nest b ... body))
  (nonterminal/nesting binding-pair (nested)
    [x:my-var e:my-expr]
    #:binding (scope (bind x) nested)))

Note that nest has internal ellipses for the binding pair sequence. This declares that the sequence is to be folded over.

nest-one no longer exists. Just use nest with no ellipses on the first argument. So (nest-one b e) is now written as (nest b e).
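
As a rough intuition for "folded over", the following plain-Racket sketch models a binding pair as a function that wraps whatever is nested inside it. The names `binding-pair`, `nest-sequence`, and `nest-single` are hypothetical and only illustrate the shape of the resulting scopes; they are not part of syntax-spec.

```racket
#lang racket

;; Hypothetical model: a binding pair maps the nested spec to a scope
;; around it, mirroring #:binding (scope (bind x) nested).
(define ((binding-pair x) nested)
  `(scope (bind ,x) ,nested))

;; (nest b ... body): fold the sequence from the right so that the
;; first pair ends up outermost.
(define (nest-sequence pairs body)
  (foldr (lambda (pair nested) (pair nested)) body pairs))

;; (nest b body) with a single pair, formerly written with nest-one.
(define (nest-single pair body)
  (pair body))

(nest-sequence (list (binding-pair 'x) (binding-pair 'y)) 'body)
;; '(scope (bind x) (scope (bind y) body))
(nest-single (binding-pair 'x) 'body)
;; '(scope (bind x) body)
```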

Interaction with -syntaxes binding forms

The export-syntaxes and bind-syntaxes forms create simultaneous syntax bindings for a number of identifiers, using the multiple values returned from a single compile-time expression, similar to Racket's define-syntaxes.

These forms now have internal ellipses for the sequence of identifiers:

(syntax-spec
  (nonterminal/exporting block-form
    #:allow-extension racket-macro

    ((~literal define-values) (x:racket-var ...) e:racket-expr)
    #:binding [(export x) ...]

    ((~literal define-syntaxes) (x:racket-macro ...) e:expr)
    #:binding (export-syntaxes x ... e)

    ((~literal begin) d:block-form ...+)
    #:binding [(re-export d) ...]

    e:racket-expr))
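
For readers unfamiliar with the Racket form these operators are compared to, here is a small standalone use of define-syntaxes. It is ordinary Racket and does not involve syntax-spec; the macro names `first-of` and `second-of` are made up for the example.

```racket
#lang racket

;; One compile-time expression returns two transformers;
;; define-syntaxes binds both names simultaneously.
(define-syntaxes (first-of second-of)
  (values
   (syntax-rules () [(_ a b) a])
   (syntax-rules () [(_ a b) b])))

(first-of 1 2)  ; 1
(second-of 1 2) ; 2
```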

Drawbacks

Syntaxes like my-cond where the current implicit behavior is incorrect have been rare in practice.

Requiring ellipses everywhere makes many binding specs somewhat more verbose. For example:

(let* (b:binding-group ...) body:action ...)
#:binding (nest b ... [body ...]))

vs the previous

(let* (b:binding-group ...) body:action ...)
#:binding (nest b body))

Or,

(letrec-syntaxes ([(v:racket-macro ...) e:expr] ...) b:racket-like-expr)
#:binding (scope (bind-syntaxes v ... e) ... b)

vs

(letrec-syntaxes ([(v:racket-macro ...) e:expr] ...) b:racket-like-expr)
#:binding (scope (bind-syntaxes v e) b)

An alternative: optional ellipses

Should ellipses be mandatory? This PR includes a static check that ensures all variables are properly ellipsized. However, it would also be possible to make ellipses optional and default to the current inferred behavior when they are not present. This would ensure all binding structures are possible to specify without requiring ellipses all the time. We could still statically check that there are not too many ellipses, since that would cause problems, but with too few ellipses, the old behavior of inferred binding structure would be used.
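
To make the depth requirement concrete, here is a simplified sketch of such a check over quoted s-expressions. It is an assumption-laden model rather than the PR's code: pattern variables are written without their `:class` annotations, the helper names `ellipsis-depths` and `depth-mismatches` are hypothetical, and only a single `...` after a form is handled.

```racket
#lang racket

;; Collect (name . depth) for every symbol from `names` occurring in `form`.
;; A form immediately followed by `...` sits one ellipsis level deeper.
(define (ellipsis-depths form names [depth 0])
  (cond
    [(symbol? form)
     (if (memq form names) (list (cons form depth)) '())]
    [(pair? form)
     (define followed? (and (pair? (cdr form)) (eq? (cadr form) '...)))
     (append
      (ellipsis-depths (car form) names (if followed? (add1 depth) depth))
      (ellipsis-depths (if followed? (cddr form) (cdr form)) names depth))]
    [else '()]))

;; Names whose depth in the binding spec differs from the syntax pattern.
(define (depth-mismatches pattern bspec names)
  (define expected (make-immutable-hasheq (ellipsis-depths pattern names)))
  (for/list ([ref (in-list (ellipsis-depths bspec names))]
             #:unless (equal? (cdr ref) (hash-ref expected (car ref) #f)))
    (car ref)))

(define pattern '(my-let ([x e] ...) body ...))

(depth-mismatches pattern '(scope (bind x) ... body ...) '(x e body)) ; '()
(depth-mismatches pattern '(scope (bind x) body) '(x e body))         ; '(x body)
```

Under the optional-ellipses alternative, the second call would fall back to inference instead of being reported, while a spec with too many ellipses would still be rejected.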

One subtle issue with this more implicit design is nest. In the implicit design, it seems most consistent with the behavior of the other binding forms that we be able to write what are now nest and nest-one in exactly the same way, relying on the ellipsis depth of the pattern variable binding to choose between the meanings. So a let* with a sequence of binding pairs:

(my-let* (b:binding-pair ...) body:my-expr)
#:binding (nest b body)

would have the same binding rule as a form with a single such binding pair:

(my-let*1 b:binding-pair body:my-expr)
#:binding (nest b body)

In the first case, nest is folding over a sequence, whereas in the second case it is nesting a single element. This implicit difference might be confusing.

@quasarbright quasarbright mentioned this pull request Sep 18, 2024
@mfelleisen

No question. Make ellipses mandatory.

Adding contracts to exports makes the spec more verbose than an identifier but clarifies things at the same time.

Adding else to cond makes it more verbose than #t but is a better signal that nothing should follow.

You name it, these small verbosities (seems to be an okay word) are good signals to the future reader.

(I have no code base that depends on syntax-spec, so if someone has a 100,000-line program in syntax-spec and needs the old behavior, speak up now.)

(struct rec with-stx [depth pvar] #:transparent)
; no surface syntax, just a mechanism to combine recs
; something like [(import x) (import y)] ~> (recs (list (rec x) (rec y)))
(struct recs [specs] #:transparent)

The field of recs could simply contain pvars; they'll get there from the pass that combines together import forms into a single recs form, which will happen after things like depth checking.

@michaelballantyne
Owner

Great! Seems like everybody is happy with explicit ellipses. @quasarbright I think it then definitely makes sense to try and do #33 and get rid of for/pv-state-tree from the runtime.

@quasarbright
Collaborator Author

> Great! Seems like everybody is happy with explicit ellipses. @quasarbright I think it then definitely makes sense to try and do #33 and get rid of for/pv-state-tree from the runtime.

I agree. For now, to get ellipses merged, I'll address your review comments first, and then I can do ellipsized implicits afterwards, since that should be a non-breaking behavioral change.

Comment on lines 37 to 38
(define bspec-distributed-ellipses (bspec-flatten-groups (bspec-distribute-ellipses bspec-flattened)))
(define bspec-absorbed-ellipses (bspec-absorb-ellipses-into-imports bspec-distributed-ellipses))
Collaborator Author

I don't like this, and I'm not sure whether there is a case that can break it, but I couldn't come up with a better idea.

(host-interface/expression
(many-local ([d:defn ...] ...) body:racket-expr)
; this group is unnecessary, but we want to test the behavior of ellipsized groups with imports
#:binding (scope [[[[(import d)]] ...] ...] body)
Collaborator Author

This tests that weird ellipsized grouped import case.

@michaelballantyne
Owner

michaelballantyne commented Sep 27, 2024

Discussing today, we realized there are several further design issues with ellipses and the operational order of expansion.

Ellipsized groups mixing binding and reference

Consider this production for a letrec form that is possible to write with ellipses:

(letrec ([x:var e:expr] ...) b:expr)
#:binding (scope [(bind x) e] ... b)

The design intent in syntax-spec is that the operational order of expansion matches the left-to-right reading of the binding rule. Extending that to ellipses, we would want the order to follow the reading of the binding rule expanded out with concrete instances. For example, for this letrec instance:

(letrec ([f (lambda () (g))]
         [g (lambda () (f))])
  (g))

we would read the order of bindings and references as:

(scope (bind f) g (bind g) f)

Notice that we refer to g before we bind it! But in our denotational understanding of this binding rule as representing a scope graph, g is bound in the scope so the reference should be valid. The operational interpretation is not properly implementing the intended denotation.

Previously our solution has been a static restriction on the order of forms in a scope: all binds must come before all subexpressions and references. We need to extend this approach to catch this error in specs with ellipses.
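
The left-to-right reading described above can be sketched in plain Racket. This is a hypothetical model of the reading, not the checker in syntax-spec: `instantiate-clauses` and `refs-before-bind` are made-up helpers, and each letrec clause is reduced to the name it binds plus the names its right-hand side references.

```racket
#lang racket

;; Instantiate [(bind x) e] ... for concrete clauses: each clause
;; contributes a bind event followed by one ref event per name that its
;; right-hand side mentions.
(define (instantiate-clauses clauses)
  (append*
   (for/list ([c (in-list clauses)])
     (cons `(bind ,(first c))
           (for/list ([r (in-list (second c))]) `(ref ,r))))))

;; Names that are referenced before the event that binds them,
;; reading the events left to right.
(define (refs-before-bind events)
  (for/fold ([bound '()] [early '()] #:result (reverse early))
            ([ev (in-list events)])
    (match ev
      [`(bind ,x) (values (cons x bound) early)]
      [`(ref ,x)  (values bound (if (memq x bound) early (cons x early)))])))

;; (letrec ([f (lambda () (g))] [g (lambda () (f))]) ...)
(define letrec-clauses '((f (g)) (g (f))))

(instantiate-clauses letrec-clauses)
;; '((bind f) (ref g) (bind g) (ref f))
(refs-before-bind (instantiate-clauses letrec-clauses))
;; '(g)
```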

We plan to disallow mixing of different categories of operators within an ellipsized group. The following will be mutually exclusive within such a group:

  • export, export-syntax, export-syntaxes
  • bind, bind-syntax, bind-syntaxes
  • import
  • references and subexpressions (bare pattern variable references) and nested scopes

Then, the order checking rules can operate on these homogeneous ellipsized groups rather than single operators.
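
A minimal sketch of such a homogeneity check, assuming binding specs are represented as quoted s-expressions and ignoring nested groups. The functions `category` and `homogeneous-group?` are hypothetical names, and the four categories follow the list above rather than whatever the final implementation settles on.

```racket
#lang racket

;; Category of one binding-spec form inside an ellipsized group.
(define (category form)
  (match form
    [(list (or 'export 'export-syntax 'export-syntaxes) _ ...) 'export]
    [(list (or 'bind 'bind-syntax 'bind-syntaxes) _ ...) 'bind]
    [(list 'import _ ...) 'import]
    ;; bare pattern variable references and nested scopes
    [_ 'reference]))

;; A group is homogeneous when all of its forms share one category.
(define (homogeneous-group? forms)
  (<= (length (remove-duplicates (map category forms))) 1))

(homogeneous-group? '((bind x) e))             ; #f, the letrec rule above
(homogeneous-group? '((import d) (import d2))) ; #t
```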

Order of import operations

Even within imports alone, the order can matter. In a Racket definition context, the resolution of syntactic form names or macros in the first pass of expansion can depend on bindings established earlier in that first pass. For example, this expands:

(define-syntax-rule (my-define x e) (define x e))
(my-define y 5)

If we allow multiple imports in a single scope, the order of first-pass expansion can determine whether examples like this work.

Consider this strange form with two groups of definitions:

(my-block [defs1 ...] [defs2 ...])
#:binding (scope [(import defs1) (import defs2)] ...)

This binding rule suggests that the first definition in defs1 will expand, then the first in defs2, then the second in defs1, and so on. So this should expand successfully:

(my-block
 [(define x 1)
  (my-define y 2)]
 [(define-syntax-rule
    (my-define x e)
    (define x e))
  (define z 3)])

On the other hand, if we wrote the spec this way it would error:

(my-block [defs1 ...] [defs2 ...])
#:binding (scope (import defs1) ... (import defs2) ...)

With this latter spec all the defs1 would pass1-expand before any of the defs2.

We think that since we do currently allow multiple imports in a scope, we should be careful to match the apparent operational order in this way.

We can realize this operational order in the compilation of imports:

#:binding (scope (import a) ... (import b) ...)
;; compiles as
(group (group (ellipsis (subexp a-pass1)) (ellipsis (subexp b-pass1)))
       (group (ellipsis (subexp a-pass2)) (ellipsis (subexp b-pass2))))
   
#:binding (scope [(import a) (import b)] ...)
;; compiles as
(group (ellipsis (group (subexp a-pass1) (subexp b-pass1)))
       (ellipsis (group (subexp a-pass2) (subexp b-pass2))))
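
The two first-pass orders can be modeled with ordinary lists. This is only an illustration of the intended order, with hypothetical function names and symbols standing in for the definitions of the my-block example; it is not the compiled output shown above.

```racket
#lang racket

;; (scope (import a) ... (import b) ...): all of a, then all of b.
(define (sequential-order as bs)
  (append as bs))

;; (scope [(import a) (import b)] ...): one a, one b, one a, and so on.
;; As with an ellipsized group, both sequences must have the same length.
(define (interleaved-order as bs)
  (append* (map list as bs)))

;; Does the macro definition expand before its use in this order?
(define (macro-defined-before-use? order)
  (< (index-of order 'define-my-define) (index-of order 'use-my-define)))

(define defs1 '(define-x use-my-define))
(define defs2 '(define-my-define define-z))

(macro-defined-before-use? (interleaved-order defs1 defs2)) ; #t
(macro-defined-before-use? (sequential-order defs1 defs2))  ; #f
```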

Alternative options:

  • Allow only one import in a scope.
  • Forbid ellipsized groups from containing more than one import. Only the second version of the my-block binding rule would be expressible.

@mfelleisen

My personal preference would be to start with a restrictive version and check how often it fails with existing code. Then relax in the simplest way possible until we reach equilibrium or decide that we don't want to support what people have already written.

@michaelballantyne
Owner

When a binding spec contains an import in an invalid position (such as outside a scope) that is in a group, the system breaks instead of producing a nice error:

image

Making groups a with-syntax structure might be involved in the solution.

@michaelballantyne
Owner

michaelballantyne commented Oct 2, 2024

image

Putting a group w/o ellipsis here shouldn't be a problem but is.

Mike's thought: in order check, treat imports the same way as groups.

@michaelballantyne
Owner

michaelballantyne commented Oct 2, 2024

Can we add this as a test? It checks that the different options for combining ellipses with multiple imports produce the expected order of expansion in the first pass.

#lang racket

(require syntax-spec
         (for-syntax (only-in syntax/parse expr)))

(syntax-spec
  (nonterminal/exporting def
    #:allow-extension racket-macro
    (mylet (d:def ...) (d2:def ...))
    #:binding (scope (import d) ... (import d2) ...)
    (mylet2 (d:def ...) (d2:def ...))
    #:binding (scope [(import d) (import d2)] ...)

    (mydef x:racket-var e:racket-expr)
    #:binding (export x)

    (mydefsyntax x:racket-macro e:expr)
    #:binding (export-syntax x e)

    (myexpr e:racket-expr))
  (host-interface/expression
    (mylang d:def)
    #:binding (scope (import d))
    #''d))

(mylang
 (mylet
  [(myexpr 1)
   (mydefsyntax mydef2 (syntax-rules () [(_ x e) (mydef x e)]))]
  [(mydef2 x 5)
   (myexpr 2)]))

(mylang
 (mylet2
  [(myexpr 2)
   (mydef2 x 5)]
  [(mydefsyntax mydef2 (syntax-rules () [(_ x e) (mydef x e)]))
   (myexpr 1)]))

Could also add error-case tests where the macro use is in the wrong order wrt the def and isn't bound.

@quasarbright
Collaborator Author

For the homogeneity error, highlight the group instead of the ellipses, and say "cannot mix imports or exports with other kinds of binding specs".

@michaelballantyne
Owner

Seems like ...+ is broken:

image

@quasarbright
Collaborator Author

To fix ...+ not working, match on it as a datum instead of as a literal in compile/syntax-spec.rkt.
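
As a rough illustration of what matching on the datum means: a datum comparison looks only at the symbol's name, whereas a literal comparison also depends on the identifier's binding. The predicate below is a hypothetical standalone helper, not the code in compile/syntax-spec.rkt.

```racket
#lang racket

;; Datum-style check: compare the symbol name only, ignoring bindings.
(define (ellipsis-plus-datum? stx)
  (and (identifier? stx)
       (eq? (syntax-e stx) '...+)))

(ellipsis-plus-datum? (datum->syntax #f '...+)) ; #t
(ellipsis-plus-datum? (datum->syntax #f '...))  ; #f
```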

@michaelballantyne michaelballantyne merged commit 1319303 into michaelballantyne:main Oct 11, 2024
1 check passed
3 participants