part-1.html

<!doctype html>
<html>
	<head>
		<meta charset="utf-8">
		<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">

		<title>Compilers - Frontend</title>

		<link rel="stylesheet" href="css/reveal.css">
		<link rel="stylesheet" href="css/theme/solarized.css">

		<!-- Theme used for syntax highlighting of code -->
		<link rel="stylesheet" href="lib/css/zenburn.css">

		<!-- Printing and PDF exports -->
		<script>
			var link = document.createElement( 'link' );
			link.rel = 'stylesheet';
			link.type = 'text/css';
			link.href = window.location.search.match( /print-pdf/gi ) ? 'css/print/pdf.css' : 'css/print/paper.css';
			document.getElementsByTagName( 'head' )[0].appendChild( link );
		</script>
	</head>
	<body>
		<div class="reveal">
			<div class="slides">
				<section>
					<small>Compilers Crash Course &mdash; Part 1:</small>
					<h1>The Frontend</h1>
					<p>Lexical analysis, parsing, AST analysis, optimisations, interpreters, transpilation</p>
					<p><a href="http://github.com/lowjoel"><svg class="social-logo github" style="width: 1em; height: 1em; vertical-align: text-bottom; padding-bottom: 0.05em" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><g><path d="M12 2C6.477 2 2 6.477 2 12c0 4.42 2.865 8.166 6.84 9.49.5.09.68-.22.68-.485 0-.236-.008-.866-.013-1.7-2.782.603-3.37-1.34-3.37-1.34-.454-1.156-1.11-1.464-1.11-1.464-.908-.62.07-.607.07-.607 1.004.07 1.532 1.03 1.532 1.03.89 1.53 2.34 1.09 2.91.833.09-.647.348-1.086.634-1.337-2.22-.252-4.555-1.112-4.555-4.944 0-1.09.39-1.984 1.03-2.682-.104-.254-.448-1.27.096-2.646 0 0 .84-.27 2.75 1.025.8-.223 1.654-.333 2.504-.337.85.004 1.705.114 2.504.336 1.91-1.294 2.748-1.025 2.748-1.025.546 1.376.202 2.394.1 2.646.64.7 1.026 1.59 1.026 2.682 0 3.84-2.337 4.687-4.565 4.935.36.307.68.917.68 1.852 0 1.335-.013 2.415-.013 2.74 0 .27.18.58.688.482C19.138 20.16 22 16.416 22 12c0-5.523-4.477-10-10-10z"></path></g></svg> @lowjoel</a></p>
				</section>

				<section>
					<h2>Target Language</h2>
					<p>Building a toolchain for parts of JavaScript (ES5 first)</p>
					<p class="fragment">Not precisely the ES5 spec, because some simplifications might be necessary</p>

					<aside class="notes">
						<p>
							Sounds crazy? I won't implement <em>every</em> part, just small subsets to demonstrate
							concepts.
						</p>
						<p>
							Also, I'll appeal to your intuition of what the semantics ought to be. I will clarify if
							there is a need.
						</p>
					</aside>
				</section>

				<section>
					<h2>Overview</h2>
				</section>

				<section>
					<h3>Parallels with Natural Languages</h3>

					<table>
						<thead>
						<tr>
							<th>English</th>
							<th>Programming</th>
						</tr>
						</thead>
						<tbody>
						<tr>
							<td>Letter</td>
							<td>Byte</td>
						</tr>
						<tr>
							<td>Word</td>
							<td>Token</td>
						</tr>
						<tr>
							<td>Sentence</td>
							<td>Statement/Program</td>
						</tr>
						<tr class="fragment">
							<td>Grammar</td>
							<td>Grammar</td>
						</tr>
						</tbody>
					</table>

					<aside class="notes">
						<ul>
							<li>
								The point here is that the study of natural languages influenced the theoretical
								development of programming language theory.
							</li>
							<li>Humans process sentences via a set of grammar rules, so do computers.</li>
						</ul>
					</aside>
				</section>

				<section>
					<h3>Code as Data</h3>

					<p>Typically code and the data that is being operated on are treated separately</p>
					<p>It is <strong>important</strong> to break the distinction between code and data</p>
					<p>This is not new; look at languages like Lisp</p>
				</section>

				<section>
					<h3>Process</h3>

					<section>
						Input: Source file (byte stream)
					</section>
					<section>
						Lexical Analysis
					</section>
					<section>
						Parsing
					</section>
					<section>
						Abstract Syntax Tree
					</section>
				</section>

				<section>
					<h2>Lexical Analysis</h2>
				</section>

				<section>
					<h3>Lexical Analysis</h3>

					<section>
						<ul>
							<li>Bytes to tokens</li>
							<li>Main tool: Regular Expressions</li>
							<li>Historically, <code>flex</code> was used quite often</li>
						</ul>
					</section>

					<section>
						<pre><code class="flex">[A-Za-z_][A-Za-z0-9_]*                        return 'Identifier'

// 3.1, 3.1e-7
[0-9]+("."[0-9]+)?([eE][\-+]?[0-9]+)?\b       return 'FLOAT_NUMBER'
[0-9]+\b                                      return 'INT_NUMBER'</code></pre>

						<aside class="notes">
							This specifies several regexes to match identifiers and numbers. The terms which is
							returned can be used when writing the parsing rules, discussed next.
						</aside>
					</section>

					<section>
						The result of lexical analysis is a stream of tokens which can be consumed by a parser.
					</section>
				</section>

				<section>
					<h2>Parsing</h2>
				</section>

				<section>
					<h3>Parsing</h3>

					<section>
						<h3>Chomsky Hierarchy of Languages</h3>
						<table style="font-size: 75%">
							<thead>
							<tr>
								<th>Grammar</th>
								<th>Language</th>
								<th>Automaton</th>
							</tr>
							</thead>
							<tbody>
							<tr>
								<th>Type-0</th>
								<td>Recursively Enumerable</td>
								<td>Turing Machine</td>
							</tr>
							<tr>
								<th>Type-1</th>
								<td>Context Sensitive</td>
								<td>Linear-bounded non-deterministic Turing Machine</td>
							</tr>
							<tr>
								<th>Type-2</th>
								<td>Context Free</td>
								<td>Non-deterministic pushdown automaton</td>
							</tr>
							<tr>
								<th>Type-3</th>
								<td>Regular</td>
								<td>Finite State automaton</td>
							</tr>
							</tbody>
						</table>

						<small>*Table in <a href="https://en.wikipedia.org/wiki/Chomsky_hierarchy">Wikipedia</a></small>

						<aside class="notes">
							We've already covered regular expressions previously. That actually is the simplest
							possible language.

							The bulk of programming languages are context free languages as far as possible to ease
							the writing of parsers. Some (harder to parse) languages are context sensitive languages.

							You can use a more powerful automaton to parse a language, but not the other way around.
							This is why you should never try to check if a bunch of HTML markup is valid using a regex.
						</aside>
					</section>

					<section>
						<h3>Context free languages</h3>

						<p>We can try to write a context-free grammar for JavaScript</p>
						<p>Typically, grammars are written using Backus-Naur Form (BNF)</p>
					</section>

					<section>
						<h3>Backus-Naur Form</h3>

						<p>Yep. The dudes behind Fortran, and ALGOL.</p>
						<pre><code>&lt;expr&gt; ::= &lt;number&gt; |
           &lt;expr&gt;+&lt;expr&gt; |
           &lt;expr&gt;-&lt;expr&gt; |
           &lt;expr&gt;*&lt;expr&gt; |
           &lt;expr&gt;/&lt;expr&gt; |
           '('&lt;expr&gt;')'
           ;</code></pre>
					</section>

					<section>
						<h3>Parser Generators</h3>

						<p>After defining the grammar in BNF, it is possible to generate a parser</p>
						<p>The parser will take tokens as input, and return an Abstract Syntax Tree</p>
						<p><a href="http://localhost:63342/api/file/demo/calculator/calculator.jison" target="_blank">Demo</a></p>

						<aside class="notes">
							<p>
								The syntax tree is "abstract" because syntactic elements, such as grouping parentheses,
								have been removed.
							</p>

							<p>
								You can hand-write parsers, but usually hand-written parsers do not have the same parsing
								power as generators because generators can usually handle backtracking better than you
								would hand-write.
							</p>

							<p>
								Several parsers are hand-written, however. For example, gcc is a handwritten parser.
							</p>
						</aside>
					</section>

					<section>
						<h3>Esprima</h3>
						<p>Because we are using JavaScript, there is already a ES6 parser written for us!</p>
						<p><a href="https://astexplorer.net/">Demo</a></p>
					</section>
				</section>

				<section>
					<h2>AST Analysis</h2>
				</section>

				<section>
					<h3>AST Analysis</h3>
					<section>
						<h3>Type Inference</h3>
						<p>Type Inference is the process of deducing the type of values in a given program</p>
						<p>Types can be deduced from constants, and propagated towards the end of the function</p>
						<p>An algorithmic example for type inference is the Hindley-Milner Algorithm W</p>

						<aside class="notes">
							<p>I know JavaScript doesn't care, but I wanted to illustrate a point.</p>
						</aside>
					</section>

					<section>
						<h3>Type Inference</h3>
						<pre><code class="javascript">function square(x) { // 'A -> ...
    return x * x;    // 'A must support operator*: 'A -> 'A -> 'B
}                    // 'A -> 'B
let x = 3;           // x: Number
let y = square(x);   // y: Number (`square`: Number -> Number)
console.log(y);      // console.log: 'A -> ()
let z = 'test';      // z: String
console.log(y + z);  // operator+: String -> Number -> String</code></pre>
						<p class="fragment">Flow works by the same principles!</p>
					</section>

					<section>
						<h3>Optimisations</h3>
						<p>Constant folding: finding values which can be resolved at compile-time</p>
						<pre><code class="javascript">var answer = 6 * 7; // parses as:
{
  "type": "VariableDeclarator",
  "id": { "type": "Identifier", "name": "answer" },
  "init": {
    "type": "BinaryExpression", "operator": "*",
    "left": { "type": "Literal", "value": 6, "raw": "6" },
    "right": { "type": "Literal", "value": 7, "raw": "7" }
  }
}</code></pre>
					</section>

					<section>
						<h3>Optimisations</h3>
						<p>Common subexpression elimination: removing duplicated AST nodes</p>
						<pre><code class="javascript">let y = x * x + x * x; // parses as
{
  "type": "BinaryExpression", "operator": "+",
  "left": {
    "type": "BinaryExpression", "operator": "*",
    "left": { "type": "Identifier", "name": "x" },
    "right": { "type": "Identifier", "name": "x" }
  },
  "right": {
    "type": "BinaryExpression", "operator": "*",
    "left": { "type": "Identifier", "name": "x" },
    "right": { "type": "Identifier", "name": "x" }
  }
}
</code></pre>
					</section>
				</section>

				<section>
					<h2>Interpreters</h2>

					<section>
						<h3>Principle</h3>
						<p>If the AST represents the program, we can run the program</p>
						<p class="fragment">
							Observation: If we traverse the tree using DFS, it should give the same effect of running
							the program
						</p>
					</section>

					<section>
						<h3>Evaluate-and-Apply</h3>
						<p>We can come up with evaluation rules</p>
						<ul>
							<li>Primitive literals evaluate to themselves</li>
							<li>Identifiers evaluate to the value stored in the environment</li>
							<li>Recursively look up the enclosing environments if not found in current one</li>
						</ul>
					</section>

					<section>
						<h3>Evaluate-and-Apply</h3>
						<ul>
							<li>
								Binary operators are the result of the left and right operands evaluated, followed
								by the operation
							</li>
							<li>
								Unary operators are the result of the operand evaluated, followed by the operation
							</li>
						</ul>
					</section>

					<section>
						<h3>Evaluate-and-Apply</h3>
						<ul>
							<li>Sequences of statements are evaluated sequentially</li>
							<li>Expression statements evaluate to the expression result*</li>
							<li>Variable declaration statements evaluate the initial values, then are added to the
								environment**</li>
							<li>Return statements return a special result which stops subsequent statements</li>
							<li>Real return result is unwrapped at function call site</li>
						</ul>

						<aside class="notes">
							<p>* Doesn't handle hoisting</p>
							<p>** Not really true in JavaScript, but this makes the toplevel work like the console.</p>
						</aside>
					</section>

					<section>
						<h3>Evaluate-and-Apply</h3>
						<ul>
							<li>
								Function definitions capture its enclosing environment, and evaluate to a function
								object
							</li>
							<li>
								Applying a function is creating a new environment which points to the captured
								environment as a parent, and associating the parameters with the provided values
							</li>
						</ul>
					</section>

					<section>
						<h3>The Meta-Circular Evaluator</h3>
						<p><a href="http://localhost:63342/api/file/demo/simple/run.js" target="_blank">Walkthrough</a></p>
					</section>
				</section>

				<section>
					<h2>Transpilation</h2>

					<section>
						<h3>Principle</h3>
						<p>If the AST represents the program, we can change the program</p>
						<p class="fragment">This process can involve changing the semantics of the program!</p>
					</section>

					<section>
						<h3>Example: Arrow function transpilation</h3>
						<pre><code class="javascript">(a) => a * 2;
// equivalent to:
function(a) { return a * 2; }</code></pre>
						<p>and</p>
						<pre><code class="javascript">(a) => { return a * 2 };
// equivalent to:
function(a) { return a * 2; }</code></pre>

						<p><a href="http://localhost:63342/api/file/demo/arrow/run.js" target="_blank">Demo</a></p>
					</section>
				</section>

				<section>
					<h2>Questions?</h2>
				</section>
			</div>
		</div>

		<script src="lib/js/head.min.js"></script>
		<script src="js/reveal.js"></script>

		<script>
			// More info https://github.com/hakimel/reveal.js#configuration
			Reveal.initialize({
				history: true,

				// More info https://github.com/hakimel/reveal.js#dependencies
				dependencies: [
					{ src: 'plugin/markdown/marked.js' },
					{ src: 'plugin/markdown/markdown.js' },
					{ src: 'plugin/notes/notes.js', async: true },
					{ src: 'plugin/highlight/highlight.js', async: true, callback: function() { hljs.initHighlightingOnLoad(); } }
				]
			});
		</script>
	</body>
</html>