
# Type declarations and type elaboration

Enso offers one keyword for defining new types:

```
type Foo x y z
```

This creates a type Foo with three type arguments. For example:

```
main = 42:(Foo Float Integer Text)
```

fails with `Type error: expected expression to be Foo, but got Integer.`, as `42` isn't a value of type `Foo`.

One can add any number of atom constructors to such a type. For example, the following code snippet

```
type Foo x y z
    Zero
    One u:x
    Two u:x v:y
    Three u:x v:y w:z
```

adds four atom constructors named `Foo.Zero`, `Foo.One`, etc., with fields `u`, `v`, and `w` initialized to the provided arguments when these constructors are used. The type `Foo` automatically offers getters for the fields `u`, `v`, and `w`. However, some of these fields may yield an undefined error value when a constructor other than `Foo.Three` (which assigns all the field values) is used. With the updated `Foo` type definition, the following two versions of the `main` function type-check fine:

```
main = Foo.Zero:(Foo Float Integer Text)       # yields Zero
main = (Foo.One 3.14):(Foo Float Integer Text) # yields (One 3.14)
```

Arguments to atom constructors can have defaults and explicit type signatures.
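To make the field-and-getter behaviour concrete, here is a minimal Python sketch of the `Foo` atoms above. Python is used only for illustration; the class layout and the `AttributeError` standing in for Enso's "undefined error value" are assumptions, not Enso semantics.

```python
# Illustrative analogue of the Enso `Foo` atoms (Zero/One/Two/Three).
# Every atom exposes getters for u, v, and w, but reading a field the
# constructor never assigned raises, mirroring the undefined error value.

class Foo:
    _fields = ("u", "v", "w")

    def __init__(self, **kwargs):
        self._values = kwargs

    def __getattr__(self, name):
        if name in Foo._fields:
            if name in self._values:
                return self._values[name]
            raise AttributeError(f"field {name!r} is undefined on this atom")
        raise AttributeError(name)

def Zero():         return Foo()
def One(u):         return Foo(u=u)
def Two(u, v):      return Foo(u=u, v=v)
def Three(u, v, w): return Foo(u=u, v=v, w=w)

print(Three(1, 2, 3).w)  # 3: Three assigns all field values
print(One(3.14).u)       # 3.14; reading One(3.14).v would raise
```

`Three` is the only constructor for which all three getters succeed, which is the behaviour the prose above describes.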

> [!WARNING]
> The following seems to be off a bit:
>
> Due to two things that feel like 'primes' popping up in the conversation below, we've chosen to denote one with 'x.type' and the other with 'Code x'.

However, there are some limitations on type signatures that will be discussed later. There is a distinction between a type and its atoms. Types themselves are effectively represented as atoms, and type constructors are atom constructors as well.

List plays multiple roles here. It acts as a higher-kinded type constructor which can be fed type arguments to build a fully saturated type:

```
Int : Type
List : Type -> Type
List Int : Type
```

Given a type declaration, we desugar it into atoms and atom constructors and define subtyping relationships in a two-step process.

First, we gather all of the type variables used in its definition and build a new type with those type variables expressed in sequence. Then if certain conditions are met (e.g. the two have the exact same shape), we allow punning between the basic type constructor and this elaborated type constructor.

e.g.

```
type Book author title year
```

is equivalent to defining

```
type Book.type author_type title_type year_type
  type Book (author : author_type) (title : title_type) (year : year_type)
```

whereupon we notice that Book.type and Book are the same shape, and allow punning between Book.type and Book.

```
Book "Jane Austen" "Pride and Prejudice" 1813 : Book.type Text Text Int
Book "Jane Austen" "Pride and Prejudice" 1813 : Book Text Text Int
```

The eagle-eyed among you might note that author_type and author are different field names, so this would actually elaborate out to

```
type Book.type author title year
  type Book (author : author) (title : title) (year : year)
```

where we are careful with lexical scopes to distinguish between the term level author and the type level author, enabling this pun.

## The List example

```
type List a
  Nil
  Cons (head : a) (tail : List a)
```

```
main =
    List.Cons 3
        List.Cons 14
            List.Nil
```

constructs a two-item list and yields `(Cons 3 (Cons 14 Nil))`.
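Under the assumption that atoms behave like ordinary tagged records, the same two-item list can be sketched in Python (the `Nil` singleton and `repr` formatting are illustrative choices, not Enso behaviour):

```python
# Illustrative analogue of the Enso `List`: Nil is a singleton atom,
# Cons pairs a head with a tail.
from dataclasses import dataclass
from typing import Any

class Nil:
    def __repr__(self):
        return "Nil"

NIL = Nil()

@dataclass
class Cons:
    head: Any
    tail: Any

    def __repr__(self):
        return f"(Cons {self.head} {self.tail})"

two_items = Cons(3, Cons(14, NIL))
print(two_items)  # (Cons 3 (Cons 14 Nil))
```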

> [!WARNING]
> This is not at all what we currently do:

This elaborates each of these type declarations and finds a couple of puns:

```
type List.type a = List a
type Nil.type = Nil
```

and one definition that emerges as not a pun:

```
type Cons.type a
  type Cons (head : a) (tail : List a)
```

## Static methods and type formation

List acts as a container holding its static methods.

> [!WARNING]
> Type constructors aren't really implemented in Enso as of now:
>
> List serves double duty in the above. It acts as the type constructor for lists, so that `List Int` is a well-formed Type, but it also ...

`List Int` is currently erased to `List`, and the additional element type information of `Int` is lost during execution.

> [!WARNING]
> Enso supports just static methods.

In the long term, there are two forms of static methods that we are interested in. We'll classify them as 'normal' static methods and 'explicit' static methods. For now, we're going to support 'normal' static methods, as they are needed for 'from' conversions. Still, there is an opportunity to allow more explicit statics later on for some uses involving meta-programming.

```
type Vector a
  MkVec (unsafe_to_array: Array a)

  from (that : List x): Vector x = MkVec (Array.from that)
```

> [!WARNING]
> This is not how it works:

This internally elaborates to

```
type Vector a
  type MkVec.type a
    type MkVec (unsafe_to_array: Array a)
  ...
```

with the obvious subtyping relationship

```
MkVec.type a <: Vector a
```

Similarly to the pun story above, when you have a single type statement contained in a parent type statement and the shapes align, we should be willing to allow punning between them, eliminating the distinct `MkVec.type` here. This results in a nice "obvious" usage pattern, where we aren't distinguishing between several vector types. The type of the module lives in the type constructor, as does the type of its individual constructor, which is limited to only holding arrays.

Static methods live in the type. Using Vector itself as the place where the statics live allows for a convenient usage pattern:

```
Vector.from_polyglot_array [1,2,3] : Vector Int
```

where 'Vector' is playing double duty as a type constructor (currently erased just to raw type Vector) and as the bag of static methods.
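The idea of statics living "in the type" has a direct analogue in many languages. Below is a hedged Python sketch: `from_list` is an invented stand-in for Enso's `from` conversion, and the class `Vector` doubles as the bag of static methods, just as the prose describes.

```python
# Illustrative only: a type whose class object also hosts its statics.
class Vector:
    def __init__(self, unsafe_to_array):
        self.unsafe_to_array = unsafe_to_array

    @classmethod
    def from_list(cls, that):
        # the conversion is reached through the type itself,
        # e.g. Vector.from_list(...), not through an instance
        return cls(list(that))

v = Vector.from_list([1, 2, 3])
print(v.unsafe_to_array)  # [1, 2, 3]
```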

> [!WARNING]
> Another example with .type that's unlikely to be valid:

```
type Book.type
  type Book (author: String) (title: String) (year: Int)
  static whatever = ...
```

where `whatever` is a method on `Book.type`, not `Book`, which is not a pun. Like Java, static methods don't appear as static methods inside subtypes, nor, in our case, on the terms that inhabit our types. This is a deliberate design choice: by choosing not to have static methods push down into the atom constructors that build terms in a type for now, we retain the option to revisit this later. There's a tension here between the 'ease of use' of getting list methods off of a proxy list you happen to have in hand as a 'prototype', and the ability to allow subtypes to have their own statics later on, which would be useful if we want to meaningfully subtype non-empty lists and lists, but allow different conversions 'to' non-empty lists. We don't have to choose now, so we prefer to leave the option open until more sophisticated usage scenarios and users force a choice.

## Case objects

If we read the above rules carefully, they allow us to construct module-like objects via types:

```
type Bucket
  static wat : Foo -> Bar
```

which at first seems not to give us any way to construct a term of type `Bucket`, but recall that it elaborates to

```
type Bucket.type
  type Bucket
  static wat : Foo -> Bar
```

and then puns `Bucket.type` = `Bucket`, so we can talk about `Bucket.wat` exactly as if `Bucket` were a qualified module import. This supplies us with an implementation semantics for what 'modules' mean. They are just types with no parameters, where all their top-level methods are lifted into statics, and where we cut the subtyping relationship between any types you define inside the module and the module itself.
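A Python sketch of the same idea: a zero-argument type acting as a module-like namespace. `Bucket.wat` here is entirely invented (the source only gives its signature `Foo -> Bar`); the point is only that a type with nothing but statics reads exactly like a qualified module access.

```python
# Illustrative only: a parameterless type used as a module.
class Bucket:
    @staticmethod
    def wat(foo):
        # hypothetical body; the source only declares wat : Foo -> Bar
        return foo + 1

print(Bucket.wat(41))  # 42 — reads like a qualified module import
```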

## Future growth directions

### Explicit static methods

Normal static methods get attached to the type definition. This means they do not have access to any of the type's type arguments themselves.

> [!WARNING]
> There are no plans for explicit static methods.

A syntax for this would look something like

```
type Vector a
  type MkVec (unsafe_to_array : Array a)
  explicit static from (that : Json) = map (x -> x.to this.a) Json.expectArray
```

Notice how this "explicit" static method is accessing the type arguments of `Vector` through `this`.

Note: This isn't something we should do in general as it requires explicitly passing type arguments everywhere, and would encumber from conversions unduly if it was the only implementation technique available to us. This would be analogous to building a type system without any "erasure" if we were forced to pass explicit types everywhere, and would come at an unacceptable performance tax.

Returning to the view of case objects and modules above, ML-style functors would correspond to having 'explicit' statics rather than static methods in the case object, which carries with it the concerns of whether modules have applicative or generative semantics. By punting on this for now we don't have to decide such things, and don't have to pay a universal performance tax, while still leaving the door to that becoming a feature we build in the future into the convenient semantic 'dead space' we leave undefined.

## Future Direction: Normalization-by-evaluation for partial evaluation, behavioral reflection, and an improved editor experience

Pedantically, normalization-by-evaluation is an implementation technique commonly used in dependent type checkers or evaluators, where you have a starting surface syntax (terms) that you compile down into internal semantic domain objects (values), but where you want the ability to reify all values back into terms that are in some sense canonical (e.g. have been reduced to a so-called "normal" form).

This is made complicated by the fact that normal forms walk underneath functions, so you need to be able to evaluate functions with arguments they 'don't know yet' to reach this normal form. We call such unknown arguments 'stuck' or 'neutral' terms. Any attempt to eliminate one (extract information, apply it, case on it, etc.) builds a bigger neutral term that captures what you attempted to do to it, and we then convert these stuck terms back into full expressions in the surface language.

The magic is in slightly extending the rules of the interpreter we are using to allow these stuck terms to be passed, and to build up these larger stuck terms when elimination forms are applied to them.

### Scenarios

#### IDE Usage

One such usage scenario is when dealing with how to show the user the 'result' of a computation that has not yet been wired up to all of its inputs. Consider a world where we have stuck or out-of-scope `a`, `b`, and `c`. In this setting, when we go to evaluate

```
foldr (+) 0 [a,b,c]
```

we get the stuck term

```
a + (b + (c + 0))
```

and can show this to users even in the absence of known `a`, `b`, and `c`. Then, as they wire those up to concrete inputs, it'll gradually pop in complexity level by level until they have it fully wired up and can see the final answer.
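The stuck-term behaviour above can be sketched in a few lines of Python. This is not Enso's evaluator; it is a toy model where `add` is the one "extended" elimination form: applied to a neutral term, it builds a bigger neutral term instead of computing.

```python
# Toy model of neutral terms in NBE (illustrative, not Enso's evaluator).
class Stuck:
    """A neutral term: evaluation got stuck on an unknown variable."""
    def __init__(self, text):
        self.text = text
    def __repr__(self):
        return self.text

def add(x, y):
    # the NBE extension: eliminating a stuck term builds a bigger stuck term
    if isinstance(y, Stuck):
        return Stuck(f"{x} + ({y})")
    if isinstance(x, Stuck):
        return Stuck(f"{x} + {y}")
    return x + y

def foldr(f, z, xs):
    acc = z
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

a, b, c = Stuck("a"), Stuck("b"), Stuck("c")
print(foldr(add, 0, [a, b, c]))  # a + (b + (c + 0))
```

Replacing any of `a`, `b`, `c` with a concrete number collapses the corresponding layer of the stuck term, which is exactly the level-by-level "popping" the IDE scenario describes.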

#### LINQ

Another direct analogue can be found in LINQ in the .NET ecosystem. There, Erik Meijer and Anders Hejlsberg allowed Visual Basic and C# lambdas to be inspected to see their contents, so that LINQ providers could transcode them, where possible, into things like SQL FROM or HAVING clauses for SELECT statements.

This opens the door towards, say, an efficient library for Apache Spark calculations in Enso, where we can marshal lambdas out closer to the data rather than having to bring the data to the lambda.

#### "Template Enso"

Morally this looks a lot like BER MetaOCaml, where we're deliberately being "evil" and allowing you to use naked terms in our language through NBE (like LINQ does above) instead of using an explicit syntax or a typed Template Haskell-style 'Code' type. This places a stronger burden on users of the advanced reflection behaviors to get things right, but allows them to express the full array of LINQ-like options in the language.

The LINQ scenarios are predicated on being able to feed stuck values into programs and reify the result as a syntax tree for the partially evaluated result. The meta-programming or "template Enso" scenario is predicated on being able to reflect that syntax tree back into an Enso term or function.
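Both directions can be sketched with LINQ-style expression trees in Python. All names below (`Expr`, `Var`, `lower`, etc.) are invented for illustration: feeding a stuck `Var` into an ordinary-looking lambda yields its syntax tree, and `lower` reflects the tree back into a callable.

```python
# Illustrative expression-tree sketch of "feed stuck values in, reify a
# syntax tree out, then reflect it back into a function".
import operator

class Expr:
    def __add__(self, other): return BinOp("+", self, lift(other))
    def __mul__(self, other): return BinOp("*", self, lift(other))

class Var(Expr):
    def __init__(self, name): self.name = name
    def show(self): return self.name
    def eval(self, env): return env[self.name]

class Lit(Expr):
    def __init__(self, value): self.value = value
    def show(self): return str(self.value)
    def eval(self, env): return self.value

class BinOp(Expr):
    ops = {"+": operator.add, "*": operator.mul}
    def __init__(self, op, lhs, rhs): self.op, self.lhs, self.rhs = op, lhs, rhs
    def show(self): return f"({self.lhs.show()} {self.op} {self.rhs.show()})"
    def eval(self, env):
        return BinOp.ops[self.op](self.lhs.eval(env), self.rhs.eval(env))

def lift(x):
    return x if isinstance(x, Expr) else Lit(x)

def lower(tree, name):
    # reflect the syntax tree back into an ordinary function
    return lambda v: tree.eval({name: v})

program = lambda x: x * 2 + 1  # an ordinary-looking lambda
tree = program(Var("x"))       # fed a stuck value: we obtain its syntax tree
print(tree.show())             # ((x * 2) + 1)
print(lower(tree, "x")(20))    # 41
```

A LINQ provider would translate the tree into SQL instead of evaluating it; the "Template Enso" scenario corresponds to the `lower` direction.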

### Open design questions

This isn't a corner of the design space we need to fully realize at this point, and there is more room for expansion. There are some open questions we'd like to answer.

How should we represent the creation and feeding of stuck terms?

One could envision supplying a syntax for

```
foo :: A -> B
neutral "x" foo
```

which feeds foo a stuck term of type A, and lifts the resulting term of type B to Code B, where it is explicitly known you can manipulate it as a syntax tree of the partially evaluated result. But this becomes a bit awkward to invoke.

More pragmatically, this is probably built on top of a richer programming interface like

```
type Code a
  # has a side-effect of producing a fresh variable
  # morally this lives in a variable supply monad like TH's 'Q'
  static fresh : String -> Name

  type Var (name: Name)
  static map : (a -> b) -> Code a -> Code b

  # other constructors for Code
  type ...

  lower : a
```

or

```
type Code a
  static fresh : String -> Name
  type Var (name: Name)
  static pure : a -> Code a # this is where the magic happens
  static apply : Code (a -> b) -> Code a -> Code b
  static map (f : a -> b) (self : Code a): Code b = apply (pure f) self
  static join (x : Code (Code a)) : Code a = x.lower
  # other constructors for Code
  type ...

  lower : a
```

where we can start to observe that the behavior of this LINQ-like `Code` construct looks like a monad/comonad in Enso, due to all the things NBE let us identify here.

There remain other design options to explore here. Do we limit ourselves to terms like BER MetaOcaml? Do we allow elaboration to statements and other complex expressions like template-haskell? None of that needs to be resolved today.