Skip to content

Latest commit

 

History

History
769 lines (657 loc) · 36.9 KB

README.md

File metadata and controls

769 lines (657 loc) · 36.9 KB

aa

Cliff Click Language Hacking

To-the-metal performance: every abstraction used can be peeled away, yielding code that you would expect from a tightly written low-level program. This is a primary driver for me, the guy who's been writing compilers for high performance computing (including Java) all my life. Ease-of-programming is a close second, but only after you've declared that performance is not your main goal. Hence I support syntactic sugar on expensive operations (e.g. accidental auto-boxing in a hot loop can cost you 10x in speed) but only if the type-system cannot remove it and you've not declared that you are programming for speed.

Modern: Statically typed. Functional programming with full-strength type inference - types are everywhere optional. Minimal syntax. REPL (i.e. incremental static typing) and separate compilation. Typed C-style macros: most syntax errors in dead code are typed.

My intent is a modern language that can be used where C or C++ is used: low-level high-performance work and a systems implementation language (e.g. OS's, device drivers, GC's), but bring in all the goodness of the last 20 years of language improvements.

The language is intended to "look like" C or Java, but with all types optional and all keywords removed. The whole no-keywords thing is an experiment I may back away from; a primary goal is to be readable - sometimes keywords feel like clutter and sometimes they are an code-anchor for your scanning eye.

I specifically intend to look at real-time constraints, and the notion of Time in a language.

Part of modern coding is the use of Garbage Collection, and the structured use of malloc/free - so I intend to add Rust-style memory management.

Part of modern coding is the use of multiple cores, so I intend to explore a couple of different concurrency models. A notion of a true 'final' field, made with a final-store (as opposed to declared as a final type) - you can make cyclic final structures, and once final all future reads will see the final value. No odd exceptions for "early publishing" a pointer.

And of course, this is a Work-In-Progress!!!!

GRAMMAR

BNF Comment
prog = stmts END
stmts= [tstmt or stmt][; stmts]*[;]? multiple statments; final ';' is optional
tstmt= tvar = :type type variable assignment
stmt = [id[:type] [:]=]* ifex ids are (re-)assigned, and are available in later statements.
stmt = ^ifex Early function exit
ifex = apply
ifex = apply ? stmt trinary logic; the else-clause will default to 0
ifex = apply ? stmt : stmt trinary logic
apply = expr+ Lisp-like application-as-adjacent, e.g. str 5
expr = term [binop term]* gather all the binops and sort by prec
term = uniop term Any number of uniops
term = id++ post-inc/dec operators
term = id-- post-inc/dec operators
term = tfact bop+ stmts bop- Balanced/split operator arity-2, e.g. array lookup, ary [ idx ]
term = tfact bop+ stmts bop- stmt Balanced/split operator arity-3, e.g. array assign, ary [ idx ]:= val
term = tfact post A term is a tfact and some more stuff...
post = empty A term can be just a plain 'tfact'
post = (tuple) post Application argument list
post = .field post Field and tuple lookup
post = .field [:]= stmt Field (re)assignment. Plain '=' is a final assignment
post = .field++ OR .field-- Field reassignment.
tfact= fact[:type] Optionally typed fact
fact = id variable lookup
fact = num number
fact = "string" string
fact = bop+ stmts bop- Balanced/split operator, arity-1, e.g. array decl with size: [ 17 ]
fact = bop+ [stmts,[stmts,]*] bop- Balanced/split operator, arity-N, e.g. array decl with elements: [ "Cliff", "John", "Lisa" ]
fact = (stmts) General statements parsed recursively
fact = tuple Tuple builder
fact = func Anonymous function declaration
fact = @{ stmts } Anonymous struct declaration; assignments declare fields
fact = {binop} Special syntactic form of binop; no spaces allowed; returns function constant
fact = {uniop} Special syntactic form of uniop; no spaces allowed; returns function constant
tuple= ( stmts,[stmts,] ) Tuple; final comma is optional
binop= +-*%&/<>!= [] ]= etc; primitive lookup; can determine infix binop at parse-time, also pipe but GFM screws up
uniop= -!~ [] etc; primitive lookup; can determine infix uniop at parse-time
bop+ = [ Balanced/split operator open
bop- = ] ]= ]:= Balanced/split operator close
func = { [id[:type]* ->]? stmts } Anonymous function declaration, if no args then the -> is optional
str = [.\%]* String contents; \t\n\r% standard escapes
str = %[num]?[.num]?fact Percent escape embeds a 'fact' in a string; "name=%name\n"
type = tcon OR tvar OR tfun[?] OR tstruct[?] OR ttuple[?] // Types are a tcon or a tfun or a tstruct or a type variable. A trailing ? means 'nilable'
tcon = int, int[1,8,16,32,64], flt, flt[32,64], real, str[?] Primitive types
tfun = { [[type]* ->]? type } Function types mirror func decls
ttuple = ( [[type],]* ) Tuple types are just a list of optional types; the count of commas dictates the length, zero commas is zero length. Tuple fields are always final.
tstruct = @{ [id [tfld];]* } Struct types are field names with optional types or values.
tfld = ! : type ! = ifex Fields are untyped or typed or final-assigned (with computed expression type)
tvar = id Type variable lookup

SIMPLE EXAMPLES

Code Comment
1 1:int
Prefix unary operator ---
-1 -1:int application of unary minus to a positive 1
!1 0:int convert 'truthy' to false
Infix binary operator ---
1+2 3:int
1-2 -1:int
1<=2 1:int Truth === 1
1>=2 0:int False === 0
Binary operators have precedence ---
1+2*3 7:int standard precedence
1+2 * 3+4 *5 27:int
(1+2)*(3+4)*5 105:int parens overrides precedence
1// some comment
+2
3:int with bad comment
1 < 2 1:int true is 1, 1 is true
1 > 2 0:int false is 0, 0 is false
Float ---
1.2+3.4 4.6:flt
1+2.3 3.3:flt standard auto-widening of int to flt
1.0==1 1:int True; int widened to float
Simple strings ---
"Hello, world" "Hello, world":str
str(3.14) "3.14":str Overloaded str(:flt)
str(3) "3":str Overloaded str(:int)
str("abc") "abc":str Overloaded str(:str)
Variable lookup ---
math_pi 3.141592653589793:flt Should be math.pi but name spaces not implemented
primitive function lookup ---
+ "Syntax error; trailing junk" unadorned primitive not allowed
_+_ {{+:{int int -> int}, +:{flt flt -> flt}} returns a union of '+' functions. The underscores indicate a binary op
{_+_} Same as above
{!_} !:{int -> int1} function taking an int and returning a bool. The under indicates a prefix unary op
Function application, traditional paren/comma args ---
_+_(1,2) 3:int
_-_(1,2) -1:int binary version
-_(1) -1:int unary version
Errors; mismatch arg count ---
!() Call to unary function '!', but missing the one required argument
math_pi(1) A function is being called, but 3.141592653589793 is not a function type
_+_(1,2,3) Passing 3 arguments to +{flt64 flt64 -> flt64} which takes 2 arguments
Arguments separated by commas and are full statements ---
_+_(1, 2 * 3) 7:int
_+_(1 + 2 * 3, 4 * 5 + 6) 33:int
(1;2 ) 2:int just parens around two statements
(1;2;) 2:int final semicolon is optional
_+_(1;2 ,3) 5:int full statements in arguments
Syntax for variable assignment ---
x=1 1:int assignments have values
x:=1 1:int Same thing, not final
x=y=1 1:int stacked assignments ok
x=y= Missing ifex after assignment of 'y'
x=1+y Unknown ref 'y'
x=2; y=x+1; x*y 6:int
1+(x=2*3)+x*x 43:int Re-use ref immediately after def; parses as: x=(2*3); 1+x+x*x
x=(1+(x=2)+x) Cannot re-assign ref 'x'
x=(1+(x:=2)+x) 5:int RHS executes first, so parses as: x:=2; x=1+x+x
x++ 0:int Define new variable as 0, and return it before the addition
x:=0; x++; x 1:int Define new variable as 0, then add 1 to it, then return it
x++ + x-- 1:int Can be used in expressions
Conditionals ---
0 ? 2 : 3 3:int Zero is false
2 ? 2 : 3 2:int Any non-zero is true; "truthy"
math_rand(1)?(x=4):(x=3);x :int8 x defined on both arms, so available after but range bound
math_rand(1)?(x=2): 3 ;4 4:int x-defined on 1 side only, but not used thereafter
math_rand(1)?(y=2;x=y*y):(x=3);x :int8 x defined on both arms, so available after, while y is not
math_rand(1)?(x=2): 3 ;x 'x' not defined on false arm of trinary No partial-defs
math_rand(1)?(x=2): 3 ;y=x+2;y 'x' not defined on false arm of trinary More complicated partial-def
math_rand(1)?1 :int1 Can skip last arm, will default to 0
x:=0; math_rand(1) ? x++; x :int1 Side effects in the one arm
0 ? (x=2) : 3;x 'x' not defined on false arm of trinary
2 ? (x=2) : 3;x 2:int Off-side is constant-dead, so missing x-assign is ignored
2 ? (x=2) : y 2:int Off-side is constant-dead, so "Unknown ref 'y'" is ignored
x=1;2?(x=2):(x=3);x Cannot re-assign ref 'x' Re-assignment not allowed
x=1;2? 2 :(x=3);x 1:int Re-assign allowed & ignored in dead branch
math_rand(1)?1:int:2:int :int8 no ambiguity between conditionals and type annotations
math_rand(1)?1::2:int missing expr after ':'
math_rand(1)?1:"a" Cannot mix GC and non-GC types
math_rand(1)?(x:=4):(x:=3);x :int8 x mutably defined on both arms, so available after
math_rand(1)?(x:=4):(x:=3);x:=x+1 :int64 x mutably defined on both arms, so mutable after
x:=0; math_rand(1) ? (x:=4):3; x:=x+1 :int8 x partially updated, remains mutable after
x:=0; 1 ? (x =4):; x 4:int8 x final on 1 arm, dead on other arm, so final after
x:=0; math_rand(1) ? (x =4):3; x 'x' not final on false arm of trinary Must be fully final after trinary
Short circuit operators ---
0 && 2 0 Returns nil
1 && 2 2 Returns the non-nil result
0 && 1 || 2 && 3 3 || has lower precedence than &&
x:=y:=0; z=x++ && y++; (x,y,z) (1,0,0) increments x, but it starts zero, so y never increments
(x=1;x*x) && x+2 3 New variables defined in the first term available in both terms
1 && (x=2;0) || x+3 && x+4 'x' not defined prior to the short-circuit New variables in the 2nd term are NOT available afterwards
Anonymous function definition ---
{x y -> x+y} Types as a 2-arg function { int int -> int } or { flt flt -> flt }
{5}() 5:int No args nor -> required; this is simply a no-arg function returning 5, being executed
Identity mimics having type-vars via inlining during typing ---
id {A->A} No type-vars yet
id(1) 1:int
id(3.14) 3.14:flt
id(_+_) {{+:{int int -> int}, +:{flt flt -> flt}}
id(_+_)(id(1),id(math_pi)) 4.141592653589793:flt
Function execution and result typing ---
x=3; andx={y -> x & y}; andx(2) 2:int capture external variable
x=3; and2={x -> x & 2}; and2(x) 2:int shadow external variable
plus2={x -> x+2}; x Unknown ref 'x' Scope exit ends lifetime
fun={x -> } Missing function body
mul3={x -> y=3; x*y}; mul3(2) 6:int // multiple statements in func body
x=3; addx={y -> x+y}; addx(2) 5:int Overloaded + resolves to :int
x=3; mul2={x -> x*2}; mul2(2.1) 4.2:flt Overloaded _+_:flt resolves with I->F conversion
x=3; mul2={x -> x*2}; mul2(2.1)+mul2(x) 10.2:flt Overloaded mul2 specialized into int and flt variants
sq={x -> x*x}; sq 2.1 4.41:flt No () required for single args
Type annotations ---
-1:int -1:int
(1+2.3):flt 3.3:flt Any expression can have a type annotation
x:int = 1 1:int Variable assignment can also have one
x:flt = 1 1:int Casts for free to a float
x:flt32 = 123456789 123456789 is not a flt32 Failed to convert int64 to a flt32
1: Syntax error; trailing junk Missing type
-1:int1 -1 is not a int1 int1 is only {0,1}
"abc":int "abc" is not a int64
x=3; fun:{int -> int} = {x -> x*2}; fun(2.1)+fun(x) 2.1 is not a int64
x=3; fun:{real -> real} = {x -> x*2}; fun(2.1)+fun(x) 10.4:flt real covers both int and flt
fun:{real->flt32} = {x -> x}; fun(123 ) 123:int Casts for free to real and flt32
fun:{real->flt32} = {x -> x}; fun(123456789) 123456789 is not a flt32
{x:int -> x*2}(1) 2:int types on parmeters too
{x:str -> x}(1) 1 is not a str
Recursive and co-recursive functions ---
fact = { x -> x <= 1 ? 1 : x*fact(x-1) }; fact(3) 6:int fully evaluates at typing time
fib = { x -> x <= 1 ? 1 : fib(x-1)+fib(x-2) }; fib(4) :int does not collapse at typing time
is_even = { n -> n ? is_odd(n-1) : 1}; is_odd = {n -> n ? is_even(n-1) : 0}; is_even(4) 1:int
is_even = { n -> n ? is_odd(n-1) : 1}; is_odd = {n -> n ? is_even(n-1) : 0}; is_even(5) 0:int
Simple anonymous tuples ---
(1,"abc").0 1:int .n loads from the nth field; only parse-time constants are supported
(1,"abc").1 "abc"
Simple anonymous structures ---
@{x;y} @{x,y} Simple anon struct decl
a=@{x=1.2;y}; x Unknown ref 'x' Field name does not escape structure
a=@{x=1;x=2} Cannot define field '.x' twice
a=@{x=1.2;y;}; a.x 1.2:flt Standard "." field name lookups; trailing semicolon optional
(a=@{x;y}; a.) Missing field name after '.'
a=@{x;y}; a.x=1 Cannot re-assign field '.x' No reassignment yet
a=@{x=0;y=1}; b=@{x=2}; c=math_rand(1)?a:b; c.x :int8 Either 0 or 2; structs can be partially merged and used
a=@{x=0;y=1}; b=@{x=2}; c=math_rand(1)?a:b; c.y Unknown field '.y' Used fields must be fully available
dist={p->p.x*p.x+p.y*p.y}; dist(@{x=1}) Unknown field '.y' Field not available inside of function
dist={p->p.x*p.x+p.y*p.y}; dist(@{x=1;y=2}) 5:int Passing an anonymous struct OK
dist={p->p.x*p.x+p.y*p.y}; dist(@{x=1;y=2;z=3}) 5:int Extra fields OK
dist={p:@{x,y} -> p.x*p.x+p.y*p.y}; dist(@{x=1;y=2}) 5:int Structure type annotations on function args
a=@{x=(b=1.2)*b;y=b}; a.y 1.2:flt Temps allowed in struct def
a=@{x=(b=1.2)*b;y=x}; a.y 1.44:flt Ok to use early fields in later defs
a=@{x=(b=1.2)*b;y=b}; b Unknown ref 'b' Structure def has a lexical scope
dist={p->p//qqq
.//qqq
x*p.x+p.y*p.y}; dist(//qqq
@{x//qqq
=1;y=2})
5:int Some rather horrible comments
Named type variables Named types are simple subtypes
gal=:flt gal{flt -> gal:flt} Returns a simple type constructor function
gal=:flt; {gal} gal{flt -> gal:flt} Operator syntax for the function
gal=:flt; 3==gal(2)+1 1:int Auto-cast-away gal to get a flt
gal=:flt; tank:gal = gal(2) 2:gal
gal=:flt; tank:gal = gal(2)+1 3.0 is not a gal:flt No-auto-cast into a gal
Point=:@{x,y}; dist={p:Point -> p.x*p.x+p.y*p.y}; dist(Point(1,2)) 5:int type variables can be used anywhere a type can, including function arguments
Point=:@{x,y}; dist={p -> p.x*p.x+p.y*p.y}; dist(Point(1,2)) 5:int this dist takes any argument with fields @{x;y}, Point included
Point=:@{x,y}; dist={p:Point -> p.x*p.x+p.y*p.y}; dist(@{x=1;y=2}) @{x:1,y:2} is not a Point:@{x,y} this dist only takes a Point argument
Nilable and not-nil modeled after Kotlin ---
x:str? = 0 nil question-type allows nil or not; zero digit is nil
x:str? = "abc" "abc":str question-type allows nil or not
x:str = 0 "nil is not a str"
math_rand(1)?0:"abc" "abc"? Nil-or-string "abc"
(math_rand(1)?0 : @{x=1}).x Struct might be nil when reading field '.x' Must be provable not-nil
p=math_rand(1)?0:@{x=1}; p ? p.x : 0 :int1 not-nil-ness after a nil-check, so field de-ref is OK
x:int = y:str? = z:flt = 0 0:int nil/0 freely recasts
"abc"==0 0:int Compare vs nil
"abc"!=0 1:int Compare vs nil
nil=0; "abc"!=nil 1:int Another name for 0/nil
a = math_rand(1) ? 0 : @{x=1}; b = math_rand(1) ? 0 : @{c=a}; b ? (b.c ? b.c.x : 0) : 0 int1 Nested nilable structs
Recursive types ---
A= :(A?, int); A((0,2)) A:(nil,2) Simple recursive tuple
A= :(A?, int); A(0,2) A:(nil,2) Same thing using explicit args
A= :(A?, int); A(A(0,2),3) A:(A:(nil,2),3) Simple recursive tuple
A= :@{n:A?, v:flt}; A(@{n=0;v=1.2}).v 1.2:flt Named recursive structure
A= :@{n:A?, v:flt}; A(0,1.2).v 1.2:flt Same thing using explicit args
A= :@{n:B?, v:int}; a = A(0,2); a.n nil Unknown type B is never assigned, so no type error
A= :@{n:B, v:int}; B= :@{n:A, v:flt} B(@{n:A:@{n:B, v:int},v:flt} -> B) Types A and B are mutually recursive
List=:@{next:List?,val} List Linked-list type
List(List(0,1.2),2.3) List:@{next:List:@{next:nil;val:1.2};val:2.3} Sample linked-list, with all types shown
map_sq={x -> x ? (map_sq(x.0),x.1*x.1) : 0}; map_sq((0,1.2)) (nil,1.44) Strongly typed map_sq with a simple tuple
map={tree fun -> tree ? @{l=map(tree.l,fun);r=map(tree.r,fun);v=fun(tree.v)} : 0} Map a function over a tree in postfix order
Final fields are made with a final store not a final declaration ---
x=1 1:int Final local variable
x:=1 1:int Non-final local variable
x:=0; a=x; x:=1; (a,x) (0,1) Reassign local variable
x=1; x:=2 Cannot re-assignal final val 'x'
math_rand(1)?(x:=4):(x:=3);x:=x+1 int x mutable on both arms, so mutable after
x:=0; 1?(x=4):; x 4 x final on 1 arm, dead on other arm
x:=0; math_rand(1) ? (x =4):3; x 'x' not final on false arm of trinary Must be marked final on both arms, or dead on one.
x=@{n:=1;v:=2} @{n:=1;v:=2 Mutable field declaration and initial writes
x=@{n =1;v:=2}; x.n=3 Cannot re-assign read-only field '.n' Field initialized as final/read-only, cannot be changed
ptr0=@{p:=0;v:=1}; ptr1=@{p=ptr0;v:=2}; ptr0.p=ptr1; ptr0.p.v+ptr1.p.v 3:int final pointer-cycle is ok
ptr2rw = @{f:=1}; ptr2final:@{f=} = ptr2rw; ptr2final *@{f:=1} is not a *{f=} Cannot cast-to-final, can only make finals with a store
Early function exit ---
{ ^3; 5}() 3:int Early exit
x:=0; {1 ? ^2; x=3}(); x 0 Following assignment is ignored
x:=0; {^1 ? x=1 ; x=3}(); x 1:int Return of an ifex includes the side effect
x:=1; {1 ? ^; x=2}()+x 1 Early exit defaults to nil
Find: returns 0 if not found, or first element which passes predicate. ---
find={list pred -> !list ? ^0; pred(list.1) ? ^list.1; find(list.0,pred)}; find(((0,3),2),{e -> e&1}) int
Curried functions ---
for={A-> A+3 }; for 2 5:int
for={A->{B->A+B}}; for 2 3 5:int
for={pred->{body->!pred()?^;tmp=body(); tmp?^tmp;7}}; for {1}{0} 7:int
for={pred->{body->!pred()?^;(tmp=body())?^tmp; for pred body}}; for is not a keyword, just an ordinary variable
sum:=0; i:=0; for {i++ < 100} {sum:=sum+i;^}; sum" int Using a for loop to sum 0..99
True closures can make hidden state ---
incA= {cnt:=0; {cnt++} }(); incA();incA()" 1:int Returns a closure, which increments a private counter
cnt:=0; incA={cnt++}; incA();incA()+cnt" 3:int Bumps an upwards exposed variable
tmp = {cnt:=0;({cnt++},{cnt})}();incA=tmp.0;getA=tmp.1;incA();incA()+getA() 3:int Public functions to get & inc a private counter
Arrays ---
[3] [0] Create an array of length 3, typed as being all nils
ary = [3]; ary[0] 0 Get the zeroeth element
[3][0] 0 Get the zeroeth element of a new array
ary = [3]; ary[0]:=2 2:int Set an element
ary = [3]; ary[0]:=0; ary[1]:=1; ary[2]:=2; (ary[0],ary[1],ary[2]) (0,1,2)
[3]:[int] [0] Create an array of length 3, typed as being all nils, and assert that nil isa int
ary=[3];#ary 3 Array length
ary=[math_rand(10)];#ary int8 Unknown array length

LARGER EXAMPLES:

Build a list from tuples; first element is payload and second element is the rest of the list, with nil terminating the list:

  lst = (1,(2,(3,0)))

Find and return the first element of a list passing a predicate:

find = { list pred ->     // find is a 2-arg function
  !list ? ^0;             // if list is nil, return nil
  pred(list.0) ? ^list.0; // if the 0-element passes the predicate, return it
  find(list.1,pred);      // otherwise, recursively find on the rest of the list
}

Find the first odd element:

find(lst,{e -> e&1})

...which returns 1.

Here is a simple map call, mapping fun over the list elements:

map = { list fun ->                // map is a 2-arg function
  list                             // list is nil-checked
  ? (map(fun,list.1),fun(list.0))  // return a list (tuple) composed of apply map to the 1-element and fun to the 0-element
  : 0                              // return a nil
}

Double the list elements:

map(lst,{x -> x+x})

Returns (2,(4,(6,0))).

Both map and + calls are generic, so a list of strings work as well:

map( ("abc", ("def", 0)), {x -> x+x} )

Returns ("abcabc", ("defdef", 0)).

Full closures.

Here gen reduces a pair of functions to inspect or increment the private counter.

gen = {cnt:=0;({cnt++},{cnt})};

Calling gen twice makes two counters (and two sets of get/inc functions).

tmp:=gen(); incA=tmp.0;getA=tmp.1;
tmp:=gen(); incB=tmp.0;getB=tmp.1;

The two different sets of functions work on independent counters:

incA();incB();incA(); getA()*10+getB()

Returns: 2*10+1 or 21.

while and for are ordinary variables

while takes a pred predicate function and a body function. pred is tested on every iteration, and looping stops when false. body is executed for side-effects.

  while = { pred ->   // while is assigned to be a function which takes a predicate
    { body ->         // and a body
      pred() ?        // The predicate is evaluated; if false exit with 0
    (body();          // Else evalute the body for side effects
     while pred body) // And loop
    }
  };

Computing an array of squares:

ary=[100];
i:=0;
while {i++ < #ary} {
  ary[i]:=i*i
};
ary

Returns [int64], with the elements filled with the squares from 0 to 99.

for also takes a pred a body function. The predicate is executed and if it returns nil the for-loop returns nil. Then the body is executed, and if it returns a truthy value, that is the for-loop's result, otherwise the loop repeats. Early function exit works in the normal way for both pred and body. To continue, do an early return from the body with nil: ^. To break, do an early return from the body with not-nil: ^1.

  for = { pred->      // for is assigned to be a function which takes a predicate
    { body ->         // and a body
      pred() ?        // predicate is evaluated; if false, exit with 0
      // Else evaluate body.  If true, exit with that value
      ((tmp=body()) ? ^tmp; 
       for pred body) // Else loop
    }
  };

Return the index of the element matching e, or -1 if not found:

find = { ary e ->
  i:=0;
  idx = for { i++ < #ary } {
    ary[i]==e ? i+1     // if found, exit index+1 (so non-zero)
  };
  idx-1                 // if nil exit, then not-found so -1. 
}

This can be tightened by realizing that for is not a special form, but simply a function with a return value:

find = { ary e ->
  i:=0;
  for { i++ < #ary } {
    ary[i]==e ? i+1     // if found, exit index+1 (so non-zero)
  } -1                  // if nil exit, then not-found so -1. 
}

The AA Type System

AA uses a combined Hindley-Milner and Global Constant Propagation typing.

Extensions to Hindley-Milner

AA treats HM as a Monotone Analysis Framework; converted to a worklist style. The type variables are monotonically unified, gradually growing over time - and this is treated as the MAF lattice. Some normal Algo-W work gets done in a pre-pass; e.g. discovering identifier sources (SSA form), and building the non-generative set. Because of the non-local unification behavior type variables include a "dependent" set; a set of elements put back on the worklist if this type unifies, beyond the expected graph neighbors.

The normal HM unification steps are treated as the MAF transfer "functions", taking type variables as inputs and producing new, unified, type variables. Because unification happens in-place (normal Disjoint-set Union, the transfer "functions" are executed for side effects only, and return a progress flag. The transfer functions are virtual calls. Some steps are empty because of the pre-pass (Let,Con).

HM base (ground) types include anything from the GCP lattice, and are generally sharper than e.g. 'int'. Bases with values of '3' and the string "abc" are fine.

HM includes polymorphic structures and fields (structural typing not duck typing), polymorphic nil-checking and an error type variable. Both HM and GCP types fully support recursive types, e.g.{ f -> (f f) } is well typed with the recursive type A:{ A -> B }.

HM errors keep all the not-unifiable types, all at once. Further unifications with the error either add a new not-unifiable type, or unify with one of the prior types. These means that types can ALWAYS unify, including nonsensical unifications between e.g. the constant 5 and a struct @{ x,y }. The errors are reported when a type prints.

Because of the error handling, dead errors are not reported and not considered a typing error. It is OK to have type errors in dead code.

Unification typically makes many temporary type variables and immediately unifies them. For efficiency, this algorithm checks to see if unification requires an allocation first, instead of just "allocate and unify". The major place this happens is identifiers, which normally make a "fresh" copy of their type variable, then unify. I use a combined "make-fresh-and-unify" unification algorithm there. It is a structural clone of the normal unify, except that it lazily makes a fresh-copy of the left-hand-side on demand only; typically discovering that no fresh-copy is required.

To engineer and debug the algorithm, the unification step includes a flag to mean "actually unify, and report a progress flag" vs "report if progress". The report-only mode is aggressively asserted for in many places; all program points that can make progress are asserted as on the worklist.

Extensions to Global Constant Propagation

Global Constant Propagation (GCP) is an extension to the algorithm with the same name, applied to typing. GCP is another Monotone Analysis Framework with types from a Lattice. The lattice is a symmetric complete bounded (ranked) lattice, the meet is commutative and associative. The lattice has a dual (symmetric), and join is defined using meet and dual: ~(~x meet ~y).

The lattice contains the usual presentation for integers (a Top, constants, and a Bottom), extended with ranges (e.g. int1, int8, uint8, int32, int64). The lattice also contains a representation for IEEE754 numbers (flt32, flt64), for pointers and structures, and for functions. The lattice contains unique indices for each function (internally called a fidx), which are used to build a precise Call Graph at typing time. Similarly, the lattice contains unique indices for new allocation sites (internally called an alias) which are used to determine aliasing relationships up to equivalence-class precision. All types in the lattice understand nil exactly, e.g. there are nil-able and not-nil variants of all lattice elements. There are several other extensions not mentioned here.

Any value which can fit in a machine register (or a small count of them) is typed as a Scalar. This includes integers, floats, pointers and code pointers, and excludes structures and the code itself.

All types which contain other types (e.g. structures or functions) fully understand recursive types.

GCP is normally a forwards flow algorithm, flowing precise types forwards from the initial program point to the exit. This GCP also computes liveness, as a backwards flow algorithm with only slightly less precision than the forwards variant. As mentioned above in the HM presentation, errors in dead (not live) code are ignored.

GCP gets the normal MAF treatment, no surprises here. GCP may be run in two modes: optimistic vs pessimistic. In the pessimistic variant, the program always correct; the algorithm can be stopped at any point. However, locally correct transformations can be made (such as folding "3+5" into "8", or removing dead code). This pessimistic version is run as a pre-pass before the main combined algorithm to cleanup "easy" things. In the optimistic variant, the analysis must run to completion before the typing is correct, types are not incrementally correct. However, the optimistic variant delivers a more precise type (allows typing strictly more programs).

Combining Hindley-Milner and Global Constant Propagation

The combined algorithm includes transfer functions taking facts from both MAF lattices, producing results in the other lattice.

For the GCP → HM direction, the HM 'if' has a custom transfer function instead of the usual one. Unification looks at the GCP value, and unifies either the true arm, or the false arm, or both or neither. In this way GCP allows HM to avoid picking up constraints from dead code.

Also for GCP → HM, the HM ground terms or base terms include anything from the GCP lattice. E.g. both '3' and 'int64' are valid HM base terms.

For the HM → GCP direction, the 'apply' step has a customer GCP transfer function where the result from a call gets lifted (JOINed) based on the matching GCP inputs - and the match comes from using the same HM type variable on both inputs and outputs. This allows e.g. "map" calls which typically merge many GCP values at many applies (call sites) and thus end up being weakly typed as a Scalar to Scalar, to improve the GCP type on a per-call-site basis.

Test case aa/src/test/java/com/cliffc/aa/HM/TestHM.java:test45 demonstrates this combined algorithm, with a program which can only be typed using the combination of GCP and HM.

Since GCP is a forwards flow algorithm, functions that escape at the top level (e.g. module level or whole typing-event level) have to decide how they are called. GCP might assume they are not called (and uncalled functions are dead), or might assume they are called with the worst possible arguments (which would typically lead to type errors). Instead GCP uses the HM type variables, converted to the GCP lattice, to get initial types.

Core AA

There is a highly restricted subset of AA (really a plain lambda calculus) in aa/src/main/java/com/cliffc/aa/HM/HM.java to demonstrate this type system.

BNF for the "core AA" syntax:

   e  = number | string | primitives | (fe0 fe1*) | { id* -> fe0 } | id | id = fe0; fe1 | @{ (label = fe0)* }
   fe = e | fe.label                 // optional field after expression

BNF for the "core AA" pretty-printed types:

   T = base | { T* -> T } | @{ (label = T)* } | T? | X:T | X | (Error T+)
   base = any GCP lattice element, all are nilable
   Multiple stacked T????s collapse

Done Stuff

  • Static typing; types optional & fully inferred at every place.
  • Nil-ptr distinguished; nil/notnil types (e.g. Kotlin)
  • Structural-typing (duck typing with strong types). Interfaces.
  • Anonymous (and named) structure types.
  • Functional; 1st class functions. All functions are anonymous.
  • REPL
  • Dynamic code-gen; no seperate compilation step. Load Source & Go.
  • Limited overloads.
  • Overloading ops. No ambiguity / easy-to-read rules.
  • By default multi-arg ops are overloaded.
  • Direct SSA style code writing; no 'let' keyword.
  • default "x=1" makes a "val" until scope runs out (cannot re-assign)
  • Reassignment "x:=1" for mutable variables
  • Primitive values; int overflow OK;
  • Sub-byte ranges. Julia-like math typing.
  • Can type-name primitives, but no e.g. physical-unit-type math

Ideas, Desirables

  • H-M style typing.
  • JIT'ing.
  • {GC,Ref-Counting}: Ponder both vs requiring e.g. lifetime management (easy by just raising scope).
  • No exceptions?!!? Same as Elm: allow tagged union returns with an error return vs a non-error return. Force user to handle errors up&down the stack.
  • Lexical scope destructors.
  • Can ask for Int for BigInteger; unboxed arrays.
  • Pattern-matching too handy looking, need to have it
  • Parallel (and distributed) but also deterministic
  • "eval"
  • Monads? i/o, side-effect monads.
  • "back tick" computing, meta-computing

lifetime:

  • Distributed ref-cnting? (or Dist-GC?)
  • Ref-Counting does NOT given "immediate" destructor execution, but "soon".
  • Guaranteed to count down & release before the next instance of exact same constructor constructs?
  • Guaranteed to count down & release before the next loop backedge? Before base of containing loop?
  • Built-in "pools" for bulk-remove? Same as ref-counting.
  • Rust-style memory lifetime management; linear logic owner; borrowing; guaranteed death

concurrency:

  • Pony-style concurrency management
  • CAS built-in lang primitive: 'res := atomic(old,new)'; JMM
  • CPS for threads/concurrency;
  • not really actors but spawn/fork worker threads, run until they 'join' with parent.
  • Transactions-for-shared-memory-always (Clojure style)

types and name spaces and nils:

  • OOP namespaces (distinguished "this"; classes; single-inheritance vs interfaces)
  • Modules: independent shipping of code.
  • Elm-style union types
  • Single inheritance (or none; composition also works).
  • physical-unit-types, esp for arrays; e.g. "fueltank_size:gallon", and "speed:mile/hour", with auto-conversions

performance types:

  • performance types: "no autoboxing, no big-int, no dynamic dispatch, no serial loops?"
  • also: No GC allocations (only ref-counting & rust-style lifetime management).
  • Runs in "O(1) time"? Runs in "O(N) time"?
  • associated affine-value types: "this int is equal to that int, plus or minus a constant".
`fun copyInt2Dbl( src:[]int32, dst:[src.len+0]d64 )...`

OR

`fun copyInt2Dbl( len:int32, src:[len]int32, dst:[len]d64 )...`

OR

`fun slide( len:int32, off:int32, src:[>=len]a, dst[>=len+off]a )...`

maps & parallel loops:

  • maps-over-collections; default to parallel
  • parallel/distributed collections; deterministic
  • maps-with-folds require a associative fold op (and will run log-tree style)
  • maps-without-assoc-folds are serial loops: code it as a loop
  • user-spec'd iter types; for-loop syntax-sugar

serial loops:

  • For-loops with early-exit and Python else-clause
for( foo in foos )
  if( isAcceptable(foo) )
    break;
  else return DidNotFindItError()
  • To detect never-ran vs ran-but-not-exited:
    if( foos.empty() ) return foos_was_empty
    else
      for( foo in foos )
        if( isAcceptable(foo) )
          break;
      else return no_acceptable_in_foos()

misc:

  • embed 'fact' in string: "milage=%(dist/gals) mph". The expression (dist/gals) is a standard paren-wrapped 'fact' from the grammar.
  • multi-value-returns OK, sugar over returning a temp-tuple
  • Extra "," in static struct makers OK: "{'hello','world',}"
  • Tail-recursion optimization.
  • "var" can be reassigned but requires type keyword: "x:= 1"

Getting started

Download dependencies:

make lib

Build:

make

Run checks:

make check

Launch the REPL:

java -jar build/aa.jar