Type Inference
+Garter, aka Guaranteed Type Enforced Reliability, is an extensnion of FDL with statically inferred types.
+garter
starts with fer-de-lance
and makes one major addition and a minor deletion, in particular, we
-
+
- add static types, +
- replace unbounded tuples, with pairs. +
That is, we now have a proper type system and the Checker
is extended to infer types for all sub-expressions.
The code proceeds to compile
(i.e. Asm
generation) only if it type checks.
This lets us eliminate a whole bunch of the dynamic tests
+-
+
assertType TNumber
+assertType TBool
+assertType TTuple
+assertBound
+
etc. as code that typechecks is guaranteed to pass the checks at run time.
+Strategy
+Lets start with an informal overview of our strategy for type inference; as usual we will then formalize and implement this strategy.
+The core idea is this:
+-
+
- Traverse the
Expr
…
+ - Generating fresh variables for unknown types… +
- Unifying function input types with their arguments … +
- Substituting solutions for variables to infer types. +
Lets do the above procedure informally for a couple of examples!
+Example 1: Inputs and Outputs
+let incr = (lambda(x): add1(x)) in
+ incr(7)
Example 2: Polymorphism
+let id = (lambda(x): x)
+ , a1 = id(7)
+ , a2 = id(true)
+in
+ true
Example 3: Higher-Order Functions
+let f = (lambda(it, x): it(x) + 1)
+ , incr = lambda(z): add1(z)
+in
+ f(incr, 10)
Strategy Recap
+-
+
- Traverse the
Expr
…
+ - Fresh variables for unknown types… +
- Unifying function input types with their arguments … +
- Substituting solutions for variables to infer types … +
- Generalizing types into polymorphic functions … +
- Instantiating polymorphic type variables at each use-site. +
Plan
+-
+
- Types +
- Expressions +
- Variables & Substitution +
- Unification +
- Generalize & Instantiate +
- Inferring Types +
- Extensions +
Syntax
+First, lets see how the syntax of our garter
changes to enable static types.
Syntax of Types
+A Type
is one of:
data Type
+ = TInt -- Int
+ | TBool -- Bool
+ | [Type] :=> Type -- (t1,...,tn) => t
+ | TVar TVar -- a, b, c
+ | TPair Type Type -- (t0, t1)
+ | TCtor Ctor [Type] -- e.g. C[t1,...,tn]
+ deriving (Eq, Ord)
here Ctor
and TVar
are just string names:
newtype Ctor
+ = CT String -- e.g. "List", "Tree"
+ deriving (Eq, Ord)
+
+newtype TVar
+ = TV String -- e.g. a, b, c
+ deriving (Eq, Ord)
Finally, a polymorphic type is represented as:
+data Poly = Forall [TVar] Type -- e.g. forall a. (a, a) => Bool
Example: Monomorphic Types
+A function that
+-
+
- takes two input
Int
+ - returns an output
Int
+
Has the monomorphic type (Int, Int) => Int
Which we would represent as a Poly
value:
(Forall [] ([TInt, TInt] :=> TInt)) :: Poly
+Note: If a function is monomorphic (i.e. not polymorphic), we can just use the empty list of TVar
.
Example: Polymorphic Types
+Similarly, a function that
+-
+
- takes a value of any type and +
- returns a value of the same type +
Has the polymorphic type forall a. (a) => a
Which we would represent as a Poly
value:
Forall [TV "a"] ([TVar (TV "a")] :=> (TVar (TV "a")))
+Note: Haskell allows OverloadedStrings
which lets us treat "a"
as both a TVar
and a Type
as needed. So we can rewrite the above as:
Forall ["a"] (["a"] :=> "a")
+Similarly, a function that takes two values and returns the first, can be given a Poly
type forall a, b. (a, b) => a
which is represented as:
Forall ["a", "b"] (["a", "b"] :=> "a")
+Syntax of Expressions
+To enable inference garter
simplifies the language a little bit.
-
+
Dynamic tests
isNum
andisBool
are removed,
+Tuples always have exactly two elements; you can represent
(x, y, z)
as(x, (y, z))
.
+Tuple access is limited to the fields
Zero
andOne
(instead of arbitrary expressions).
+
data Expr a
+ = ...
+ | Tuple (Expr a) (Expr a) a -- Tuples have 2 elems
+ | GetItem (Expr a) Field a -- Access 0-th or 1-st elem
+ | Fun (Bind a) Sig [Bind a] (Expr a) a -- named functions
+
+data Field
+ = Zero -- 0-th field
+ | One -- 1-st field
+
+data Sig
+ = Infer -- no specification, infer scheme
+ | Check Poly -- check that function has given (poly-) type
+ | Assume Poly --
Plan
+-
+
- Types +
- Expressions +
- Variables & Substitution +
- Unification +
- Generalize & Instantiate +
- Inferring Types +
- Extensions +
Substitutions
+Our informal algorithm proceeds by
+-
+
- Generating fresh type variables for unknown types, +
- Traversing the
Expr
to unify the types of sub-expressions,
+ - By substituting a type variable with a whole type. +
Lets formalize substitutions, and use it to define unification.
+Representing Substitutions
+We represent substitutions as a record with two fields:
+data Subst = Su
+ { suMap :: M.Map TVar Type -- ^ hashmap from type-var := type
+ , suCnt :: !Int -- ^ counter used to generate fresh type vars
+ }
-
+
suMap
is a map from type variables to types,
+suCnt
is a counter used to generate new type variables.
+
For example, exSubst
is a substitution that maps a
, b
and c
to Int
, Bool
and (Int, Int) => Int
respectively.
exSubst :: Subst
+exSubst = Su m 0
+ where
+ m = (M.fromList [ ("a", TInt)
+ , ("b", TBool)
+ , ("c", [TInt, TInt] :=> TInt) ])
Applying Substitutions
+The main use of a substition is to apply it to a type, which has the effect of replacing each occurrence of a type variable with its substituted value (or leaving it untouched if it is not mentioned in the substitution.)
+We define an interface for “things that can be substituted” as:
+class Substitutable a where
+ apply :: Subst -> a -> a
and then we define how to apply
substitutions to Type
, Poly
, and lists and maps of Type
and Poly
.
For example,
+apply exSubst (["a", "z"] :=> "b")
returns
+[TInt, "z"] :=> TBool
by replacing "a"
and "b"
with TInt
and TBool
and leaving “z” alone.
QUIZ
+Recall that exSubst = ["a" := TInt, "b" := TBool]
…
What should be the result of
+apply exSubst (Forall "a" (["a"] :=> "a"))
+-
+
Forall ["a"] ([TInt] :=> TBool)
+Forall ["a"] (["a" ] :=> "a" )
+Forall ["a"] (["a"] :=> TBool)
+Forall ["a"] ([TInt] :=> "a")
+Forall [ ] ([TInt] :=> TBool)
+
Bound vs. Free Type Variables
+Indeed, the type
+forall a. (a) => a
+is identical to
+forall z. (z) => a
+-
+
A bound type variable is one that appears under a
forall
.
+A free type variable is one that is not bound.
+
We should only substitute free type variables.
+Applying Substitutions
+Thus, keeping the above in mind, we can define apply
as a recursive traversal:
apply _ TInt = TInt
+ apply _ TBool = TBool
+ apply su (TVar a) = M.findWithDefault (TVar a) a (suMap su)
+ apply su (ts :=> t) = apply su ts :=> apply su t
+ apply su (Forall as t) = Forall as $ apply (unSubst as su) t
where unSubst
removes the mappings for as
from su
Creating Substitutions
+We can start with an empty substitution that maps no variables
+empSubst :: Subst
+empSubst = Su M.empty 0
Extending Substitutions
+we can extend the substitution by assigning a variable a
to type t
extSubst :: Subst -> TVar -> Type -> Subst
+extSubst su a t = su { suMap = M.insert a t su' }
+ where
+ su' = apply (mkSubst [(a, t)]) (suMap su)
Telescoping
+Note that when we extend [b := a]
by assigning a
to Int
we must take care to also update b
to now map to Int
. That is why we:
-
+
apply [a := Int]
to update the old substitution to getsu'
+insert
the new assignmenta := Int
intosu'
.
+
Plan
+-
+
- Types +
- Expressions +
- Variables & Substitution +
- Unification +
- Generalize & Instantiate +
- Inferring Types +
- Extensions +
Unification
+Next, lets use Subst
to implement a procedure to unify
two types, i.e. to determine the conditions under which the two types are the same.
T1 | +T2 | +Unified | +Substitution | +
---|---|---|---|
Int |
+Int |
+Int |
+empSubst |
+
a |
+Int |
+Int |
+a := Int |
+
a |
+b |
+b |
+a := b |
+
a -> b |
+a -> d |
+a->d |
+b := d |
+
a -> Int |
+Bool -> b |
+Bool->Int |
+a := Bool, b:=Int |
+
Int |
+Bool |
+Error | +Error | +
Int |
+a -> b |
+Error | +Error | +
a |
+a -> Int |
+Error | +Error | +
-
+
- The first few cases: unification is possible, +
- The last few cases: unification fails, i.e. type error in source! +
Occurs Check
+-
+
The very last failure:
a
in the first type occurs inside free inside the second type!
+If we try substituting
a
witha -> Int
we will just keep spinning forever! Hence, this also throws a unification failure.
+
Exercise
+Can you think of a program that would trigger the occurs check failure? (You’ve probable written several such programs in 130 and 131!)
+Implementing Unification
+We implement unification as a function:
+unify :: Subst -> Type -> Type -> Subst
such that
+unify su t1 t2
returns a substitution su'
that extends su
with with the assignments needed to make t1
the same as t2
, i.e. such that apply su' t1 == apply su' t2
.
The code is pretty much the table above:
+unify :: SourceSpan -> Subst -> Type -> Type -> Subst
+unify _ su TInt TInt = su
+unify _ su TBool TBool = su
+unify sp su (TVar a) t = varAsgn sp su a t
+unify sp su t (TVar a) = varAsgn sp su a t
+unify sp su (is :=> o) (is' :=> o') = unify (unifys sp su is is') (apply s1 o) (apply s1 o')
+unify sp _ t1 t2 = abort (errUnify sp t1 t2)
The helpers
+-
+
unifys
recursively callsunify
on lists of types:
+varAsgn
extendssu
with[a := t]
if required and possible!
+
varAsgn :: SourceSpan -> Subst -> TVar -> Type -> Subst
+varAsgn sp su a t
+ | t == TVar a = su -- assigned to itself, do nothing
+ | a `elem` freeTvars t = abort (errOccurs sp a t) -- oops, fails occurs check!
+ | otherwise = extSubst su a t -- extend su with [a := t]
We can test out the above table:
+λ> :set -XOverloadedStrings
+
+λ> unify junkSpan empSubst TInt TInt
+(fromList [], 0)
+
+λ> unify junkSpan empSubst (["a"] :=> TInt) ([TBool] :=> "b")
+(fromList [(a,Bool), (b,Int)], 0)
+
+λ> unify junkSpan empSubst ("a") (["a"] :=> TInt)
+*** Exception: "Type error: occurs check fails: a occurs in (a) => Int"...
+
+λ> unify junkSpan empSubst TInt TBool
+*** Exception: "Type error: cannot unify Int and Bool" ...
Plan
+-
+
- Types +
- Expressions +
- Variables & Substitution +
- Unification +
- Generalize & Instantiate +
- Inferring Types +
- Extensions +
Generalize and Instantiate
+Ok, the next step is an easy one. Recall the example:
+let id = (lambda(x): x)
+ , a1 = id(7)
+ , a2 = id(true)
+in
+ true
For the expression lambda(x): x
we inferred the type (a0) => a0
We needed to generalize the above:
+-
+
- to assign
id
thePoly
-type:forall a0. (a0) => a0
+
We needed to instantiate the above Poly
-type at each use
-
+
- at
id(7)
the functionid
hasType
(Int) => Int
+ - at
id(true)
the functionid
hasType
(Bool) => Int
+
Lets see how to implement those two steps.
+Type Environments
+To generalize
a type, we
-
+
- Compute its (free) type variables, +
- Remove the ones that may still be constrained by other in-scope program variables. +
We represent the types of in scope program variables as type environment
+data TypeEnv = TypeEnv (M.Map Id Poly)
i.e. a Map
from program variables Id
to their (inferred) Poly
type.
Generalize
+We can now implement generalize
as:
generalize :: TypeEnv -> Type -> Poly
+generalize env t = Forall as t -- 4. slap a `Forall` on the unconstrained vars
+ where
+ tvs = freeTvars t -- 1. compute tyVars of `t`
+ evs = freeTvars env -- 2. compute tyVars of `env`
+ as = L.nub (tvs L.\\ evs) -- 3. compute unconstrained vars: (1) minus (2)
The helper freeTvars
computes the set of variables that appear inside a Type
, Poly
and TypeEnv
:
-- | type-vars of a Type
+ freeTvars TInt = []
+ freeTvars TBool = []
+ freeTvars (TVar a) = [a]
+ freeTvars (ts :=> t) = freeTvars ts ++ freeTvars t
+
+ -- | type-vars of a Poly (we remove the "bound" forall-vars)
+ freeTvars (Forall as t) = freeTvars t L.\\ as
+
+ -- | type-vars of an TypeEnv
+ freeTvars (TypeEnv m) = freeTvars [s | (x, s) <- M.toList m]
Instantiate
+Next, to instantiate a Poly
of the form
Forall [a1,...,an] t
+we:
+-
+
- Generate fresh type variables,
b1,...,bn
for each “parameter”a1...an
+ - Substitute
[a1 := b1,...,an := bn]
in the “body”t
.
+
For example, to instantiate
+ Forall ["a"] (["a"] :=> "a")
we
+-
+
- Generate a fresh variable e.g.
"a66"
,
+ - Substitute
["a" := "a66"]
in the body["a"] :=> "a"
+
to get
+["a66"] :=> "a66"
Implementing Instantiate
+We implement the above as:
+instantiate :: Subst -> Poly -> (Subst, Type)
+instantiate su (Forall as t) = (su', apply suInst t)
+ where
+ (su', as') = freshTVars su n
+ suInst = mkSubst (zip as as')
+ n = length as
Question Why does instantiate
return a Subst
?
Lets run it and see what happens!
+λ> :set -XOverloadedStrings
+
+λ> let tId = Forall ["a"] (["a"] :=> "a")
+
+λ> let (su0, id0) = instantiate empSubst tId
+
+λ> let (su1, id1) = instantiate su0 tId
+
+λ> id0
+(a0) => a0
+
+λ> id1
+(a1) => a1
As in anf
and label
-
+
- The
fresh
calls bump up the counter …
+ - But must use the returned
Subst
for future calls.
+
Plan
+-
+
- Types +
- Expressions +
- Variables & Substitution +
- Unification +
- Generalize & Instantiate +
- Inferring Types +
- Extensions +
Inference
+Finally, we have all the pieces to implement the actual type inference procedure ti
ti :: TypeEnv -> Subst -> Expr a -> (Subst, Type)
The function ti
takes as input:
-
+
- A
TypeEnv
mapping in-scope variables to their inferred (Poly
)-types,
+ - A
Subst
containing the current substitutions and fresh-variable-counter,
+ - An
Expr
whose type we want to infer.
+
and returns as output a pair of:
+-
+
- The inferred type for the given
Expr
and
+ - The extended
Subst
resulting from +-
+
- generating fresh type variables and +
- doing the unification needed to check the
Expr
.
+
+
Lets look at how ti
is implemented for the different cases of expressions.
Inference: Literals
+For numbers and booleans, we just return the respective type and the input Subst
without any modifications.
ti _ su (Number {}) = (su, TInt)
+ti _ su (Boolean {}) = (su, TBool)
Inference: Variables
+For identifiers, we
+-
+
- lookup their type in the
env
and
+ - instantiate type-variables to get different types at different uses. +
ti env su (Id x l) = instantiate su (lookupTypeEnv x env)
Why do we instantiate? Recall the id
example!
Inference: Let-bindings
+Next, lets look at let-bindings:
+ti env su (Let x e1 e2 _) = ti env'' su1 e2 -- (5)
+ where
+ (su1, t1) = ti env su e1 -- (1)
+ env' = apply su1 env -- (2)
+ s1 = generalize env' t1 -- (3)
+ env'' = extTypeEnv x s1 env' -- (4)
In essence,
+-
+
- Infer the type
t1
fore1
,
+ - Apply the substitutions from (1) to the
env
,
+ - Generalize
t1
to make it aPoly
types1
, +-
+
- why? recall the id example +
+ - Extend the env to map
x
tos1
and,
+ - Infer the type of
e2
in the extended environment.
+
Inference: Function Definitions
+Next, lets see how to infer the type of a function i.e. Lam
ti env su (Lam xs body) = (su3, apply su3 (tXs :=> tOut)) -- (5)
+ where
+ (su1, tXs :=> tOut) = freshFun su (length xs) -- (1)
+ env' = extTypesEnv env (zip xs tXs) -- (2)
+ (su2, tBody) = ti env' su1 body -- (3)
+ su3 = unify su2 tBody (apply su2 tOut) -- (4)
Inference works as follows:
+-
+
- Generate a function type with fresh variables for the unknown inputs (
tXs
) and output (tOut
),
+ - Extend the
env
so the parametersxs
have typestXs
,
+ - Infer the type of
body
under the extendedenv'
astBody
,
+ - Unify the expected output
tOut
with the actualtBody
+ - Apply the substitutions to infer the function’s type
tXs :=> tOut
.
+
Inference: Function Calls
+Finally, lets see how to infer the types of a call i.e. App
ti env su (App f es) = (su4, apply su4 tOut) -- (6)
+ where
+ (su1, tF) = ti env su f -- (1)
+ env' = apply su1 env -- (2)
+ (su2, tEs) = L.mapAccumL (ti env') su1 es -- (3)
+ (su3, tOut) = freshTVar su2 -- (4)
+ su4 = unify su3 tF (tEs :=> tOut) -- (5)
The code works as follows:
+-
+
- Infer the type of the function
f
astF
,
+ - Apply resulting substitutions to
env
(as before),
+ - Infer the types of the inputs
es
astEs
,
+ - Generate a variable for the unknown output type, +
- Unify the actual input-output
tEs :=> tOut
with the expectedtF
+ - Return the (substituted)
tOut
as the inferred type of the expression.
+
Note: In your starter code we factor out steps 3-6, i.e. the code that checks the arguments and infers the output type into tiApp
to let us reuse it to handle extensions
Plan
+-
+
- Types +
- Expressions +
- Variables & Substitution +
- Unification +
- Generalize & Instantiate +
- Inferring Types +
- Extensions +
Extensions
+The above gives you the basic idea, now you will have to implement a bunch of extensions.
+-
+
- Primitives +
- (Recursive) Functions +
- Type Checking +
Extensions: Primitives
+What about primitives ?
+-
+
add1(e)
,print(e)
,e1 + e2
etc.
+
What about branches ?
+-
+
if cond: e1 else: e2
+
What about tuples ?
+-
+
(e1, e2)
ande[0]
ande[1]
+
All of the above can be handled as applications to special functions.
+For example, you can handle add1(e)
by treating it as passing a parameter e
to a function with type:
(Int) => Int
+Similarly, handle e1 + e2
by treating it as passing the parameters [e1, e2]
to a function with type:
(Int, Int) => Int
+Can you figure out how to similarly account for branches, tuples, etc. by filling in suitable implementations of:
+-
+
prim2Poly
+ifPoly
+tupPoly
,fieldPoly
+
HINT: Use the implementation for prim1Poly
(and instApp
, tiApp
) in the starter code as a guide.
Extensions: (Recursive) Functions
+Extend or modify the code for handling Lam xs e
so that you can handle Fun f xs e
. This is mostly straightforward:
-
+
- You can basically reuse the code as is +
- Except if
f
appears in the body ofe
+
Can you figure out how to modify the environment under which e
is checked to handle the above case?
Extensions: Type Checking
+While inference is great, it is often useful to specify the types.
+While inference is great, it is often useful to specify the types.
+-
+
- They can describe behavior of untyped code +
- They can be nice documentation, e.g. when we want a function to have a more restrictive type. +
Assuming Specifications for Untyped Code
+For example, we can implement lists as tuples and tell the type system to:
+-
+
- trust the implementation of the core list library API, but +
- verify the uses of the list library. +
We do this by:
+def null() as forall a. () => List[a]:
+ false
+in
+
+def link(h, t) as forall a. (a, List[a]) => List[a]:
+ (h, t)
+in
+
+def tail(l) as forall a. (List[a]) => List[a]:
+ l[1]
+in
+
+def length(l):
+ if l == null():
+ 0
+ else:
+ 1 + length(tail(l))
+in
+
+let l0 = link(0, link(1, link(2, null())))
+in
+ length(l0)
+The as
keyword tells the system to trust the signature, i.e. to assume it is ok, and to not check the implementations of the function (see how ti
works for Assume
.)
However, the signatures are used to ensure that null
, link
and tail
are used properly, for example, if we tried the following
link(1, link(true, null()))
+we should get an error:
+tests/input/err-list.gtr:9:1-24: Type error: cannot unify Int and Bool
+
+ 9| link(1, link(true, null()))
+ ^^^^^^^^^^^^^^^^^^^^^^^
+Checking Specifications
+Finally, sometimes we may want to restrict a function be used to some more specific type than what would be inferred.
+garter
allows for specifications on functions using the ::
operator. For example, you may want a special function that just compares two Int
for equality:
def eqInt(x, y) :: (Int, Int) => Bool :
+ x == y
+in
+ eqInt(17, 19)
+As another example, you might write a swapList
function that swaps pairs of lists The same code would swap arbitrary pairs, but lets say you really want it work just for lists:
def null() as forall a. () => List[a]:
+ false
+in
+
+def link(h, t) as forall a. (a, List[a]) => List[a]:
+ (h, t)
+in
+
+def swapList(p) :: forall a.((List[a], List[b])) => (List[b], List[a]) :
+ (p[1], p[0])
+in
+
+let l0 = link(1, null())
+ , l1 = link(true, null())
+in
+ swapList((l0, l1))
+Can you figure out how to extend the ti
procedure to handle the case of Fun f (Check s) xs e
, and thus allow for checking type specifications?
HINT: You may want to factor out steps 2-5 in the ti
definition for Lam xs body
— i.e. the code that checks the body
has type tOut
when xs
have type tXs
— into a separate function to implement the ti
cases for
-
+
Fun f (Infer) xs e
+Fun f (Check) xs e
+
This is a bit tricky, and so am leaving it as Extra Extra Credit.
++