UQ-PAC · katrinafyi · Sep 2, 2024 · Sep 5, 2024 · Sep 5, 2024 · Oct 10, 2024
diff --git a/docs/offline.md b/docs/offline.md
@@ -0,0 +1,86 @@
+# Offline partial evaluation
+
+## Motivation
+
+The offline partial evaluation is motivated by improving the experience
+for users seeking to integrate ASLp within other tools.
+With the original online method,
+only one opcode could be processed at a time, and
+each new opcode would require a complete traversal and simplification
+of the ASL specification.
+This is obviously inefficient
+and necessitates a tight coupling between ASLp
+and programs looking to use it.
+
+## Introduction
+
+Offline partial evaluation aims to improve this
+by performing much of the partial evaluation _ahead-of-time_
+instead of once per-instruction.
+In the offline method,
+partial evaluation operates with only the knowledge
+that the opcode is constant,
+but without knowledge of the _value_ of the opcode.
+
+Further,
+we desire a separation of the "decoding" phase
+(where opcode bits are examined to determine _which_ instruction is represented),
+and the "execution" phase (where opcodes are executed and their actions performed).
+This delineation is implemented by hand-written lifters,
+where the language of the semantics IR is separate from the language used to implement the lifter itself.
+
+ASL itself has no such separation, but we can compute it with a _binding-time analysis_.
+In this analysis,
+constants and the instruction opcode are marked as *lift-time*, then the analysis
+traverses the AST in this way:
+- if an expression uses only lift-time values, it is also marked as *lift-time*, otherwise
+- the expression is marked as *run-time*.
+Lift-time values are simply emitted into the lifter's language, and
+will not be visible within the final semantics.
+Run-time-marked values are translated to a `gen_` prefixed function,
+indicating that this should be emitted into the semantics and deferred until run-time.
+This gives us an AST for a program which takes an opcode and then constructs a residual program
+representing the semantics of the instruction.
+
+In particular, this representation
+within an AST
+enables efficient translation of the lifter to arbitrary lifter languages and semantics languages.
+This is desirable, entirely eliminating the need to (de)serialise
+the semantics and communicate across language boundaries.
+
+## Overview
+
+The entry-point for the offline transformation process is the `run` function,
+near the end of [symbolic_lifter.ml](/libASL/symbolic_lifter.ml).
+Here, the process is divided into stages:
+- **Stage 1** - mock decoder & instruction encoding definitions:
+  Converts the ASL decoder tree into separate functions making up a proper ASL program.
+  Makes use of [decoder_program.ml](/libASL/decoder_program.ml).
+- **Stage 2** - call graph construction:
+  Identifies reachable functions, using [call_graph.ml](/libASL/call_graph.ml).
+- **Stage 3** - simplification:
+  Minor cleanup of temporary dynamic-length bit-vectors and unsupported structures.
+- **Stage 4** - specialisation:
+  Performs requirement analysis to ensure bit-vector lengths
+  and loop iterations are statically known.
+  Inserts splits to branch on conditions determining these quantities. [req_analysis.ml](/libASL/req_analysis.ml).
+- **Stage 5** - disassembly:
+  Invokes the online partial evaluation to reduce and inline structures, then
+  simplifies with BDDs (an opcode-sensitive value analysis to, e.g., eliminate always-true assertions),
+- **Stage 6** - cleanup:
+  Remove unused variables within each instruction function.
+- **Stage 7** - offline transform:
+  Performs binding-time analysis and transformations,
+  then BDD-based copy-propagation transform and one final cleanup. [offline_opt.ml](/libASL/offline_opt.ml).
+
+After the lifter is generated, it is passed to a "backend"
+which performs the (mostly syntactic) translation
+to a targeted lifter language.
+
+## Backends
+
+TODO: mechanism of a backend and how to implement one.
+
+- given a set of instruction functions
+- instruction-building interface
+- required: mutable variables, bitvector, bool, and int representations
diff --git a/docs/online.md b/docs/online.md
@@ -0,0 +1,157 @@
+# Online partial evaluation
+
+## Background
+
+Online partial evaluation traverses the specification,
+closely mirroring concrete evaluation,
+and specialises expressions as it encounters them.
+It does this "online" by recording,
+for each variable,
+whether that variable is a concrete, symbolic, or unknown value
+(in order of simplification potential).
+
+- Concrete values are simple literals which can almost always be immediately evaluated
+  in expressions.
+- Pure symbolic expressions are pure transformations of a combination
+  of concrete and symbolic/unknown values.
+  For example, this includes "(x + 1) - 1" and this can be used to simplify
+  using algebraic properties of integers and bitvectors.
+- The last type, unknown, is any computation which is opaque at partial-evaluation time.
+  Most often, this is register and memory accesses.
+  When encountered in an expression, this will emit code to perform the computation and store
+  it into a read-only variable, transforming it into a pure symbolic expression.
+
+## Implementation
+
+Within ASLp, the partial evaluation features are implemented by three files
+with different responsibilities.
+These are: symbolic.ml, dis.ml, and transforms.ml.
+This corresponds to the order from lower-level concepts (expressions)
+to higher-level ones (statements and programs).
+
+### symbolic.ml
+
+In the implementation, this begins at the [symbolic.ml](/libASL/symbolic.ml)
+file which decribes expression-level symbolic operations.
+Here, the type of `sym` is defined simply as either
+a concrete value or a pure symbolic expression
+(the concept of "unknown" values lives outside this file).
+
+Moreover, this file implements symbolic analogues
+of existing primitive operations on concrete values
+([value.ml](/libASL/value.ml) and [primops.ml](/libASL/primops.ml])).
+Essentially, these functions perform a concrete evaluation
+if all (or sufficiently many) operands are concretely known, and
+return a symbolic expression otherwise.
+The expression may be simplified through the
+expression-level simplification whenever
+its structure allows.
+Some notable instances are:
+- identities of bitwise "and" and "or" and integer "mul", "add", and "sub",
+- slices of slices,
+- (sign/zero) extension of extensions.
+
+### dis.ml
+
+[dis.ml](/libASL/dis.ml) performs a recursive traversal of the ASL AST,
+calling functions from symbolic.ml as needed.
+This file is responsible for inter-statement and inter-procedural effects,
+including the collection of the residual program.
+
+The structure in this file is designed to mirror that of the interpreter's
+evaluation methods ([eval.ml](/libASL/eval.ml)).
+For instance, there are corresponding functions for expressions:
+```ocaml
+val dis_expr : AST.l -> AST.expr -> sym rws
+val eval_expr : AST.l -> Env.t -> AST.expr -> value
+```
+Here, the _'a rws_ return value is a computation inside the reader-writer-state monad (discussed below).
+
+The entry-point to the entire partial evaluator is the _dis\_core_ method.
+This invokes the appropriate "dis" functions then performs a number of post-processing
+transformations, described next.
+
+#### LocalEnv / DisEnv
+
+The LocalEnv structure stores and manipulates location-sensitive information during the
+partial evaluation.
+Most importantly, this includes the stack 
+mappings of variables to their concrete value or pure symbolic expression.
+To facilitate easier debugging, this also manually records a stack of the
+AST nodes currently being evaluated
+(the ordinary stack trace is not useful in the presence of the monad abstraction).
+
+The DisEnv contains the _'a rws_ type &mdash; a reader-writer-state monad:
+```ocaml
+type 'a rws = Eval.Env.t -> LocalEnv.t -> ('a, LocalEnv.t, stmt list)
+```
+This implements an abstraction for:
+- a read-only view of globals and constants ("eval env"),
+- a mutable LocalEnv state ("local env"), and
+- a write-only list of statements making up the residual program.
+
+This mechanism is invoked by the `let@` and `let+` syntaxes
+which composes RWS computations in a sensible way.
+As an example, with many type annotations, consider
+```ocaml
+val int_is_zero : int rws -> bool rws =
+  fun (x : int rws) ->
+    let@ y : int = x in
+    DisEnv.pure (y = 0)  
+```
+Essentially, the `let@` notation has the function of "unwrapping" a rws computation
+and placing it within another rws computation.
+`let+` can be used where the final result (at the bottom of the let expression)
+is a `DisEnv.pure` computation.
+By using `let+`, the `DisEnv.pure` can be omitted.
+Other functions exist to manipulate particular rws computations in particular ways
+(e.g. a list of rws can become a rws of list).
+
+The DisEnv struct also contains methods to retrieve variables
+(searching the local bindings and then the global bindings),
+all returning rws computations.
+To emit a statement into the residual program, it provides a `write` function.
+
+### transforms.ml
+
+[transforms.ml](/libASL/transforms.ml) implements transforms which are executed after the
+main partial evaluation phase.
+Some of these are conventional compiler transformations
+(e.g. removing unused variables, common subexpression factoring, and copy-propagation).
+The majority, however, are designed to address deficiencies
+(verbosity or unsupported constructs) in the residual program.
+We list a few transformations which are of particular interest.
+
+- StatefulIntToBits / IntToBits:
+  This transformation converts expressions involving arbitrary-precision integers
+  into expressions of fixed-length bitvectors
+  by way of an interval analysis.
+
+  The StatefulIntToBits modules implements this through abstract interpretation
+  with a domain of $\text{Width}\times \text{IsSigned} \times \mathbb Z^2$,
+  recording bit-width, signedness, and lower and upper bounds.
+
+- RedundantSlice: Eliminates slice operations which do not reduce the bit-width.
+  Crucially, this is also responsible for converting slices with non-static indices
+  into shift/truncate operations.
+- bits_coerce_narrow: Pushes slices into sub-expressions where possible.
+- CaseSimp: Removes unreachable assertions placed after exhaustive case checks.
+- RemoveRegisters: Replaces expressions of ASL's register type with ordinary bit-vectors.
+- CopyProp
+- CommonSubExprElim
+- RemoveUnused
+
+## Grammar
+
+The ASL grammar is defined in [asl.ott](/libASL/asl.ott).
+The ott program translates this into a Menhir grammar
+which then generates a parser and an associated Ocaml AST.
+The generated code modules include Asl_ast, Asl_parser, and Asl_parser_pp.
+
+
+For printing AST nodes and other structures, functions beginning with `pp_` are available
+in [asl_utils.ml](/libASL/asl_utils.ml) and [utils.ml](/libASL/utils.ml).
+These will produce an indented human-readable string.
+To emit AST nodes in the structured "aslt" format,
+use `Utils.to_string (Asl_parser_pp.pp_raw_stmt stmt)` or similar.
+
diff --git a/docs/readme.md b/docs/readme.md
@@ -0,0 +1,14 @@
+# ASLp documentation
+
+This folder will provide documentation for ASLp,
+primarily of interest to developers of ASLp and
+people seeking to integrate ASLp.
+
+ASLp is a partial evaluator for the ARM specification language.
+The most established functionality is in the [online partial evaluation](./online.md).
+The new [offline partial evaluation](./offline.md), building on the online features,
+is available for easier integration with projects.
+
+The project is validated by a number of [tests](./testing.md) which are run
+on pushes and merge requests to the repository.
+
diff --git a/docs/testing.md b/docs/testing.md
@@ -0,0 +1,7 @@
+# Testing infrastructure
+
+test files are stored within the [tests/](/tests) directory.
+
+- unit testing (very limited)
+- differential tests within coverage folder (bespoke infrastructure to implement pipeline, then dune diff)
+- cram expect tests ending in .t (golden tests, also dune based)
diff --git a/docs/vectoriser.md b/docs/vectoriser.md
@@ -0,0 +1,3 @@
+# vectoriser
+
+todo