README.html

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<title>README.html</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>

</head>

<body>

<h1 id="macquarieuniversitydepartmentofcomputing">Macquarie University, Department of Computing</h1>

<h2 id="comp332programminglanguages2018">COMP332 Programming Languages 2018</h2>

<h2 id="lintillaasimplefunctionalprogramminglanguage.">Lintilla, a simple functional programming language.</h2>

<h3 id="introduction">Introduction</h3>

<p><a href="http://hitchhikers.wikia.com/wiki/Lintilla">Lintilla</a> is a language that contains elements from mainstream functional languages such as <a href="https://en.wikipedia.org/wiki/Standard_ML">ML</a>, <a href="https://www.haskell.org/">Haskell</a>, <a href="https://www.scala-lang.org/">Scala</a> and <a href="https://en.wikipedia.org/wiki/Scheme_(programming_language)">Scheme</a>. It uses a syntax that has been borrowed from Dom&#8217;s favourite systems programming language <a href="https://www.rust-lang.org/">Rust</a>.</p>

<p>The description here provides a brief overview of the Lintilla language and its syntax. Further detail about aspects such as checking the validity of names or types and translating a program into an executable form will be added when we work on the Lintilla code generator in assignment 3.</p>

<h3 id="thelintillalanguage">The Lintilla language</h3>

<p>The basic unit of a Lintilla program is the <em>expression</em>. There are no statements, their place is taken by expressions whose type is the special type <code>unit</code>. You can think of unit as a type which has only one value, called <code>()</code>, and it is this value that is returned by those constructs which otherwise might be regarded as returning nothing (like a procedure or a <code>while</code> loop).</p>

<p>A program is essentially just a single expression. For example, here is a simple program that returns the result of a simple arithmetic operation:</p>

<pre><code>2 + 3 * 4
</code></pre>

<p>When this program is run it will print the value of the expression: the number
14.</p>

<p><em>Block expressions</em> are used to build programs out of smaller expressions. A block expression is a pair of curly braces containing zero or more expressions separated by semicolons (<code>;</code>). The idea is that when a block is executed each expression in that block is executed in turn, and the value computed by the last expression becomes the value of that block. For example, here is a program consisting of a block expression that declares two variables and uses their values in the expression whose values if returned from the block:</p>

<pre><code>{
   let a = 5;
   let b = a + 1;
   a * b
}
</code></pre>

<p>This program will print the result of multiplying <code>a</code> by <code>b</code>, so 30 will be printed. Here the name <code>a</code> can be used in the definition of <code>b</code> since <code>b</code> is defined later, but that is a name analysis issue, so we don&#8217;t need to worry about it for the moment. Notice also that the semicolon is used as an expression separator, not a line terminator; so the last line in this block isn&#8217;t followed by a semicolon. An empty block <code>{}</code> returns the unique value <code>()</code> of type <code>unit</code>.</p>

<p>A Lintilla program is a file with name ending <code>.lin</code> which we can think of as a single block. It isn&#8217;t surrounded by curly braces, because the beginning and end of the file serve the same purpose, but each expression in the file must be separated from the next by a semicolon. It is the value computed by the last expression in the file that is returned as the result of the program, and printed to the terminal.</p>

<p><em>Variables</em> come in two flavours <em>mutable</em> and <em>immutable</em>. The default flavour of variable is immutable, these are bound to a value when they are declared and they maintain the same value from there to the point where they go out of scope. An immutable variable is declared and bound to a value in a <code>let</code> expression:</p>

<pre><code>let doms_var = 10;
</code></pre>

<p>We can also declare a mutable variable; think of these as being bound to a memory cell that contains a value that can be changed. Mutable variables are bound and their values are initialised in <code>let mut</code> expressions:</p>

<pre><code>let mut doms_mutable_var = 20;
</code></pre>

<p>The value stored in the memory cell bound to a mutable variable can be changed using an assignment expression (<code>:=</code>):</p>

<pre><code>doms_mutable_var := doms_mutable_var + 1;
</code></pre>

<p>The scope of a variable extends from the point where it is declared to the end of the smallest enclosing block. Here again, however, this is a matter to handled by the semantic analysis phase of the compiler, which we will consider in a later assignment.</p>

<p><em>Functions</em> may also be declared, both at top level or within a block. For example here is the familiar factorial function given as a recursive function in Lintilla:</p>

<pre><code>fn factorial(n: int) -&gt; int {
  if n = 0 { 1 } else { n * factorial(n-1) }
}
</code></pre>

<p>This declaration defines a new function which takes a single parameter of type <code>int</code> and returns an <code>int</code> as its result. It introduces a new immutable variable called <code>factorial</code> which, as a first order approximation, we can think of as pointing to the code generated by compiling the function body. The result returned by a function is the value returned when its body, a block, is executed.</p>

<p>Functions may take more than one parameter:</p>

<pre><code>fn mod(n: int, m: int) -&gt; int {
  n - (n / m) * m
};
    
mod(23,10)  // program prints `3`, the remainder when `23` is divided by `10`.
</code></pre>

<p>A <em>procedure</em> is a function which returns no value, or more precisely it returns the unique value <code>()</code> of type <code>unit</code>. Procedures are declared by omitting the return type specification in a <code>fn</code> declaration:</p>

<pre><code>let mut x = 20;

fn add_n_to_x(n : int) {
  x := x + n
};

add_n_to_x(23);    // now the mutable variable `x` contains the value `43`
    
x                  // program prints the value `43`
</code></pre>

<p>The same effect can be gained by using the the explicit return type <code>unit</code>:</p>

<pre><code>fn add_n_to_x(n : int) -&gt; unit {
  x := x + n
}
</code></pre>

<p>The scope of a function name extends <strong>throughout</strong> the smallest block enclosing its declaration. This seems like a strange rule, but it allows us to declare (mutually) recursive functions</p>

<pre><code>fn f(n : int) -&gt; int {
  g(n-3)
};
    
fn g(n : int) -&gt; int {
  if 0 &lt; n { f(2*n) } else { n }
}
    
// Even though `g` is declared at a point after the declaration of `f` we can
// still refer to it in the body of `f`.
</code></pre>

<p>We do not distinguish declarations from other kinds of expressions for the purposes of parsing. That doesn&#8217;t mean that it will always make sense to use a declaration wherever any other kind of expression is appropriate, but just that this isn&#8217;t a decision that the parser will make. The places where declarations can legally occur will be determined, and enforced, by the Lintilla type checker.</p>

<p><em>Control expressions</em> we have already seen the principle control expression provided by Lintilla, the <code>if</code> expression. We might note that the language specifies that <code>if</code> expressions must have both a <em>then</em> and an <em>else</em> clause and that these must both be blocks enclosed in curly braces.</p>

<p>The language also provides <code>while</code> loops:</p>

<pre><code>fn fibonacci(n : int) -&gt; int {
  let mut res1 = 0;
  let mut res2 = 1;
  let mut count = n;
      
  while 0 &lt; count {
    let temp = res1;
    res1 := res2;
    res2 := temp + res1;
    count := count - 1
  };
      
  res1
}
</code></pre>

<p>Here again the body of a <code>while</code> must be a block enclosed in curly braces.</p>

<p>Finally Lintilla also has a <code>return</code> expression, which can be used to exit early from a block. A <code>return</code> may optionally take a parameter, and it is the value of that parameter which is returned as the result of the block expression when that <code>return</code> is executed. For example, in our Fibonacci function we might want to exit from the function early if the parameter passed to that function is negative:</p>

<pre><code>fn fibonacci(n : int) -&gt; int {
  if n &lt; 0 { return -1 } else {}; // If `n` is negative return the error value -1.
      
  let mut res1 = 0;
  let mut res2 = 1;
  let mut count = n;
      
  while 0 &lt; count {
    let temp = res1;
    res1 := res2;
    res2 := temp + res1;
    count := count - 1
  };
      
  res1
}
</code></pre>

<p><em>Expression</em> forms are interchangeable as long as they have the correct type. E.g., anywhere we can put a number can also take a block or some other kind of expression that evaluates to a number. For example, here is an artificial program that uses blocks nested inside an arithmetic operation:</p>

<pre><code>{
  let a = 3;
  a + 1
} *
{
  let b = 10;
  b - 1
}
</code></pre>

<p>Or more concisely:</p>

<pre><code>{ let a = 3; a + 1 } * { let b = 10; b - 1 }
</code></pre>

<p>We&#8217;ve seen a few different forms of expression: numbers, addition expressions, multiplication expressions and function call expressions. There are also other arithmetic operations, boolean values, boolean literals, relational and logical operators, and conditional expressions. The complete syntax of Lintilla is given below.</p>

<p>Finally Lintilla allows programmers to insert comments into their code. These begin with two slashes <code>//</code> and extend from there to the end of the line. Comments are ignored by the compiler, which treats them just like white space.</p>

<h3 id="lintillasyntax">Lintilla syntax</h3>

<p>Here is a complete context-free grammar for the Lintilla language, upon which the parser of a Lintilla compiler should be based:</p>

<pre><code>program : (exp &quot;;&quot;)* exp.
    
exp : assign
    | pexp
    | ifexp
    | whileexp
    | returnexp
    | letdecl
    | fndecl.

pexp : pexp &quot;=&quot; pexp
     | pexp &quot;&lt;&quot; pexp
     | pexp &quot;+&quot; pexp
     | pexp &quot;-&quot; pexp
     | pexp &quot;*&quot; pexp
     | pexp &quot;/&quot; pexp
     | &quot;-&quot; pexp
     | &quot;false&quot;
     | &quot;true&quot;
     | integer
     | app
     | &quot;(&quot; exp &quot;)&quot;.
     | block. 

app : app &quot;(&quot; ((exp &quot;,&quot;)* exp)? &quot;)&quot;
    | idnuse.

assign : idnuse &quot;:=&quot; exp.

block : &quot;{&quot; ((exp &quot;;&quot;)* exp)? &quot;}&quot;.

ifexp : &quot;if&quot; exp block &quot;else&quot; block.
    
whileexp : &quot;while&quot; exp block.
    
returnexp : &quot;return&quot; exp?.
    
letdecl : &quot;let&quot; &quot;mut&quot;? idndef &quot;=&quot; exp.
    
fndecl : &quot;fn&quot; idndef &quot;(&quot; ((paramdecl &quot;,&quot;)* paramdecl)? &quot;)&quot; (&quot;-&gt;&quot; tipe)? block.
    
paramdecl : idndef &quot;:&quot; tipe.
</code></pre>

<p>Finally, the syntax of types:</p>

<pre><code>tipe : &quot;unit&quot;
     | &quot;bool&quot;
     | &quot;int&quot;
     | &quot;fn&quot; &quot;(&quot; ((tipe &quot;,&quot;)* tipe)? &quot;)&quot; (&quot;-&gt;&quot; tipe)?
     | &quot;(&quot; tipe &quot;)&quot;.
</code></pre>

<p>We use the word <code>tipe</code> instead of <code>type</code> since the latter is a Scala keyword which prevents us from using it as the name of a parser in our code. A function type is specified using the syntax</p>

<pre><code>fn(type_1, type_2, ..., type_n) -&gt; res_type
</code></pre>

<p>which denotes the type of a function that takes <code>n</code> parameters of types <code>type_1</code>,&#8230;, <code>type_n</code> and returns a result of type <code>res_type</code>. We might note that a procedure has a type of the form</p>

<pre><code>fn(type_1, type_2, ..., type_n) -&gt; unit
</code></pre>

<p>and that Lintilla does not allow variables or parameters to have type <code>unit</code>. This, however, is a matter for the type analysis phase of the compiler which we don&#8217;t (yet) consdier here.</p>

<h4 id="precedenceandassociativity">Precedence and associativity</h4>

<p>The grammar above is not immediately suitable for encoding as a parser. The <code>pexp</code> non-terminal is ambiguous since it makes no allowance for precedence and associativity of the operators. You should rewrite that grammar production to implement the following precedence and associativity rules:</p>

<ul>
<li><p>The following expression constructs have precedence as shown from
lowest to highest with constructs on the same line having the same
precedence:</p>

<ol>
<li>equal and less than</li>
<li>addition and subtraction</li>
<li>multiplication and division</li>
<li>all other kinds of expression</li>
</ol>

<p>The constructs in the highest precedence category include unary negation, blocks in curly braces and expressions in parentheses.</p></li>
<li><p>All binary expression operators are left associative, except for the
relational operators (equality and less than) which are not
associative.</p></li>
</ul>

<hr />

<p><a href="http://orcid.org/0000-0002-4137-6982">Dominic Verity</a><br />
Last modified: 12 September 2018<br />
<a href="http://mozilla.org/MPL/2.0/">Copyright (c) 2018 by Dominic Verity and Anthony Sloane. Macquarie University. All rights reserved.</a></p>

</body>
</html>