UQ-PAC · ailrst · Jan 24, 2024 · Nov 7, 2023
diff --git a/docs/il-cfg.md b/docs/il-cfg.md
@@ -0,0 +1,143 @@
+CFG Iterator Implementation
+===========================
+
+This file explains the in-place CFG representation on top of the IL.
+
+Motivations
+-----------
+
+We want a unified IL and CFG representation so that we do not have to keep two data structures in sync, 
+and so that we do not have to define a correspondence between the static analysis state domain and the IL 
+before the analysis results can be used to transform the IL.  
+
+It also reduces the number of places where refactors need to be applied, and hopefully reduces the memory overhead 
+of static analyses. 
+
+
+Interpreting the CFG from the IL
+--------------------------------
+
+The IL has two structural interpretations:
+
+1. Its syntax tree; expressions have subexpressions and so on.
+    - This can be traversed using Visitors
+    - The traversal order is defined by the order of terms in the language with a depth-first traversal of sub-terms.
+2. Its control flow graph; this is part of the language's semantics, and is inferred from the Jump and Call statements.
+    - This is traversed using the control flow iterator, or by constructing the separate Tip-style CFG and traversing that.
+      From here on we describe the 'control-flow iterator'.
+    - The traversal order is defined by the `Dependency` structure and `Worklist` solvers and the predecessor/successor
+      relation between pairs of nodes
+
+We need to derive the predecessor/successor relation on CFG nodes from the IL.
+
+1. CFG positions are defined as 
+    - The entry to a procedure
+    - The single return point from a procedure
+    - The beginning of a block within a procedure
+    - A statement command within a block
+    - A jump or call command within a block
+
+For example, we can describe the language with Horn clauses. (`A :- B` means that B implies A, with `,` indicating 
+conjunction and `;` indicating disjunction.)
+
+First we have basic blocks belonging to a procedure. 
+
+    Procedure(id)
+    Block(id, procedure) 
+    EntryBlock(block_id, procedure)
+    ReturnBlock(block_id, procedure) 
+    Block(id, procedure) :- EntryBlock(id, procedure); ReturnBlock(id, procedure)
+
+Each block contains a list of sequential statements.
+
+    Statement(id, block, index)
+
+A list of jumps (either Calls or GoTos) belonging to a block, which occur after the statements. GoTos form the 
+intra-procedural edges, and Calls form the inter-procedural edges. 
+
+    GoTo(id, block, destinationBlock)  // multiple destinations
+    Call(id, block, destinationProcedure, returnBlock), count {Call(id, block, _, _)} == 1 
+    Jump(id, block) :- GoTo(id, block, _) ; Call(id, block, _, _)
+
+Statements and Jumps are both considered commands. Commands, blocks, and procedures are all considered IL terms,
+and every IL term has a unique identifier.
+
+    Command(id) :- Statement(id, _, _) ; Jump(id, _)
+    ILTerm(id) :- Procedure(id); Block(id, _); Command(id) 
+
+The predecessor/successor relation relates ILTerms to ILTerms, and is defined directly in terms of the nodes:
+
+    pred(i, j) :- succ(j, i)
+
+    succ(block, statement) :- Statement(statement, block, 0)
+    succ(statement1, statement2) :- Statement(statement1, block, i), Statement(statement2, block, i + 1)
+    succ(statement, jump) :- Statement(statement, block, last), Jump(jump, block), last = max i forall Statement(_, block, i)
+
+    succ(goto, targetBlock) :- GoTo(goto, _, targetBlock) 
+
+    succ(call, return_block) :- Call(call, block, dest_procedure, return_block)
+
+For an inter-procedural CFG we also have:
+
+    succ(call, return_block) :- ReturnBlock(return_block, call), Procedure(call)
+    succ(call, targetProcedure) :- Call(call, _, targetProcedure, _) 
+    succ(exit, returnNode) :- ProcedureExit(exit, procedure, call), CallReturn(returnNode, call)
+
+So a sequential application of `succ` might look like
+
+    ProcedureA -> {Block0} -> {Statement1} -> {Statement2} -> {Jump0, Jump1} ->  {Block1} | {Block2} -> ...
+
+Implementation
+--------------
+
+We want to be able to compute `succ(term, _)` and `pred(term, _)` for any given term in the IL in `O(1)` time. 
+Successors are easily derived but predecessors are not stored with their successors. Furthermore `ProcedureExit`, 
+and `CallReturn` are not inherently present in the IL. 
+
+In code we have a set of Calls and GoTos present in the IL: these define the edges from themselves to their targets. 
+
+Then all vertices in the CFG (that is, all Commands, Blocks, and Procedures in the IL) store references to 
+their incoming and outgoing edges. In a sense the 'id's in the formulation above become the JVM object IDs.
+
+For Blocks and Procedures this means a `Set` of incoming jumps: GoTos for Blocks and Calls for Procedures. 
+For Commands this means they are stored in their block in an intrusive linked list. 
+
+Specifically this means we store
+
+    Command:
+        - reference to parent block
+        - method to find the next or previous statement in the block
+        - IntrusiveListElement trait inserts a next() and previous() method forming the linked list
+
+    Block:
+        - reference to parent procedure
+        - list of incoming GoTos
+        - list of Jumps including
+            - Outgoing Calls
+            - Outgoing GoTos
+
+    Procedure:
+        - list of incoming Calls
+        - subroutine to compute the set of all outgoing calls in all contained blocks
+
+This means the IL contains: 
+- Forward graph edges in the form of calls and gotos
+- Forward syntax tree edges in the form of classes containing their children as fields
+- Backwards graph edges in the form of lists of incoming jumps and calls
+    - Procedure has a list of incoming calls
+    - Block has a list of incoming gotos
+- Backwards syntax tree edges in the form of a parent field
+    - Implementation of the `HasParent` trait.
+
+To maintain the backwards edges it is necessary to make the actual data structures private, and only allow 
+modification through interfaces which maintain the graph/tree.  
+
+Jumps:
+- Must implement an interface to allow adding or removing edge references (references to themselves) to and from 
+  their targets 
+
+Blocks and Procedures:
+- Implement an interface for adding and removing edge references 
+
+Furthermore:
+- Reparenting Blocks and Commands in the IL must keep the parent field consistent; this is not really implemented yet
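Note on `docs/il-cfg.md`: the rule that backward edges stay private and are only changed through an edge-maintaining interface can be sketched as below. This is a minimal illustration, not the code in this PR: `CfgSketch`, `Block`, `GoTo`, `addTarget`, `removeTarget`, and `incomingJumps` are hypothetical stand-ins for the real `ir.*` classes, and only GoTo edges are modelled.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for the IL classes; not the real ir.* API.
object CfgSketch:
  final class Block(val label: String):
    // Backward graph edges: private, so only the methods below can change them.
    private val incoming = mutable.LinkedHashSet.empty[GoTo]
    def incomingJumps: Set[GoTo] = incoming.toSet
    private[CfgSketch] def addIncoming(g: GoTo): Unit = incoming += g
    private[CfgSketch] def removeIncoming(g: GoTo): Unit = incoming -= g

  final class GoTo(val parent: Block):
    // Forward graph edges: also private, so every change updates the target too.
    private val _targets = mutable.LinkedHashSet.empty[Block]
    def targets: Set[Block] = _targets.toSet
    def addTarget(t: Block): Unit =
      if _targets.add(t) then t.addIncoming(this)
    def removeTarget(t: Block): Unit =
      if _targets.remove(t) then t.removeIncoming(this)

  // Both directions are a field lookup; no graph traversal is needed.
  def succ(g: GoTo): Set[Block] = g.targets
  def pred(b: Block): Set[GoTo] = b.incomingJumps

@main def cfgSketchDemo(): Unit =
  import CfgSketch.*
  val b0 = Block("b0")
  val b1 = Block("b1")
  val g = GoTo(b0)
  g.addTarget(b1)
  println(pred(b1).map(_.parent.label)) // labels of the blocks that jump to b1
  g.removeTarget(b1)
  println(pred(b1).isEmpty)
```

Because the classes do not override `equals` or `hashCode`, set membership uses object identity, which matches the remark in the doc that the ids become JVM object IDs.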
diff --git a/src/main/scala/analysis/Analysis.scala b/src/main/scala/analysis/Analysis.scala
@@ -320,4 +320,4 @@ class MemoryRegionAnalysisSolver(
     constantProp: Map[CfgNode, Map[Variable, FlatElement[BitVecLiteral]]]
 ) extends MemoryRegionAnalysis(cfg, globals, globalOffsets, subroutines, constantProp)
     with IntraproceduralForwardDependencies
-    with SimpleMonotonicSolver[CfgNode, Set[MemoryRegion], PowersetLattice[MemoryRegion]]
+    with SimpleMonotonicSolver[CfgNode, Set[MemoryRegion], PowersetLattice[MemoryRegion]]
diff --git a/src/main/scala/analysis/BasicIRConstProp.scala b/src/main/scala/analysis/BasicIRConstProp.scala
@@ -0,0 +1,91 @@
+package analysis
+import ir.*
+import analysis.solvers.*
+
+trait ILValueAnalysisMisc:
+  val valuelattice: ConstantPropagationLattice = ConstantPropagationLattice()
+  val statelattice: MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice] = MapLattice(valuelattice)
+
+  def eval(exp: Expr, env: statelattice.Element): valuelattice.Element =
+    import valuelattice._
+    exp match
+      case id: Variable   => env(id)
+      case n: BitVecLiteral     => bv(n)
+      case ze: ZeroExtend => zero_extend(ze.extension, eval(ze.body, env))
+      case se: SignExtend => sign_extend(se.extension, eval(se.body, env))
+      case e: Extract     => extract(e.end, e.start, eval(e.body, env))
+      case bin: BinaryExpr =>
+        val left = eval(bin.arg1, env)
+        val right = eval(bin.arg2, env)
+        bin.op match
+          case BVADD  => bvadd(left, right)
+          case BVSUB  => bvsub(left, right)
+          case BVMUL  => bvmul(left, right)
+          case BVUDIV => bvudiv(left, right)
+          case BVSDIV => bvsdiv(left, right)
+          case BVSREM => bvsrem(left, right)
+          case BVUREM => bvurem(left, right)
+          case BVSMOD => bvsmod(left, right)
+          case BVAND  => bvand(left, right)
+          case BVOR   => bvor(left, right)
+          case BVXOR  => bvxor(left, right)
+          case BVNAND => bvnand(left, right)
+          case BVNOR  => bvnor(left, right)
+          case BVXNOR => bvxnor(left, right)
+          case BVSHL  => bvshl(left, right)
+          case BVLSHR => bvlshr(left, right)
+          case BVASHR => bvashr(left, right)
+          case BVCOMP => bvcomp(left, right)
+          case BVCONCAT => concat(left, right)
+
+          //case BVULE => bvule(left, right)
+          //case BVUGE => bvuge(left, right)
+          //case BVULT => bvult(left, right)
+          //case BVUGT => bvugt(left, right)
+
+          //case BVSLE => bvsle(left, right)
+          //case BVSGE => bvsge(left, right)
+          //case BVSLT => bvslt(left, right)
+          //case BVSGT => bvsgt(left, right)
+
+          //case BVCONCAT => concat(left, right)
+          //case BVNEQ    => bvneq(left, right)
+          //case BVEQ     => bveq(left, right)
+
+      case un: UnaryExpr =>
+        val arg = eval(un.arg, env)
+
+        un.op match
+          case BVNOT => bvnot(arg)
+          case BVNEG => bvneg(arg)
+
+      case _ => valuelattice.top
+
+  val calleePreservedRegisters = Set("R0", "R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9", "R10", "R11")
+
+  /** Transfer function for state lattice elements.
+    */
+  def localTransfer(n: IntraProcIRCursor.Node, s: statelattice.Element): statelattice.Element =
+    n match
+      case la: LocalAssign =>
+        s + (la.lhs -> eval(la.rhs, s))
+      case c: Call => s ++ calleePreservedRegisters.filter(reg => s.keys.exists(_.name == reg)).map(n => Register(n, BitVecType(64)) -> statelattice.sublattice.top).toMap
+      case _ => s
+
+
+type IRNode = IntraProcIRCursor.Node
+
+object IRSimpleValueAnalysis:
+  class Solver(prog: Program) extends ILValueAnalysisMisc
+    with IntraProcDependencies
+    with Dependencies[IRNode]
+    with Analysis[Map[IRNode, Map[Variable, FlatElement[BitVecLiteral]]]]
+    //with SimplePushDownWorklistFixpointSolver[IRNode]
+    with SimplePushDownWorklistFixpointSolver[IRNode, Map[Variable, FlatElement[BitVecLiteral]], MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice]]
+    :
+      /* Worklist initial set */
+      //override val lattice: MapLattice[IRNode, statelattice.type] = MapLattice(statelattice)
+      override val lattice: MapLattice[IRNode, Map[Variable, FlatElement[BitVecLiteral]], MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice]] = MapLattice(statelattice)
+
+      override val domain : Set[IRNode] = computeDomain(prog).toSet
+      def transfer(n: IRNode, s: statelattice.Element): statelattice.Element = localTransfer(n, s)
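Note on `BasicIRConstProp.scala`: `eval` lifts each bitvector operation to the flat constant-propagation lattice, and `localTransfer` rebinds the assigned variable. The sketch below shows that pattern on a cut-down expression type. Everything in it is hypothetical (`Flat`, `lift`, the `Expr` enum, `assign`), it uses `Long` instead of `BitVecLiteral`, and its treatment of `Bottom` and of unbound variables is an assumption rather than the behaviour of `ConstantPropagationLattice`.

```scala
// Hypothetical flat lattice over Long; the PR uses FlatElement[BitVecLiteral].
enum Flat:
  case Top, Bottom
  case Const(value: Long)

// Lift a binary operation on constants to the flat lattice.
// Assumption: Bottom is absorbing, and any other non-constant input gives Top.
def lift(op: (Long, Long) => Long)(a: Flat, b: Flat): Flat =
  (a, b) match
    case (Flat.Bottom, _) | (_, Flat.Bottom) => Flat.Bottom
    case (Flat.Const(x), Flat.Const(y))      => Flat.Const(op(x, y))
    case _                                   => Flat.Top

// Cut-down expression language standing in for ir.Expr.
enum Expr:
  case Var(name: String)
  case Lit(value: Long)
  case Add(l: Expr, r: Expr)
  case Mul(l: Expr, r: Expr)

// Abstract evaluation: recurse over the syntax tree, lifting each operator.
// Assumption: a variable with no binding evaluates to Top.
def eval(e: Expr, env: Map[String, Flat]): Flat = e match
  case Expr.Var(n)    => env.getOrElse(n, Flat.Top)
  case Expr.Lit(v)    => Flat.Const(v)
  case Expr.Add(l, r) => lift(_ + _)(eval(l, env), eval(r, env))
  case Expr.Mul(l, r) => lift(_ * _)(eval(l, env), eval(r, env))

// Transfer for an assignment: rebind the left-hand side, keep everything else.
def assign(lhs: String, rhs: Expr, s: Map[String, Flat]): Map[String, Flat] =
  s + (lhs -> eval(rhs, s))
```

For example, `assign("y", Expr.Add(Expr.Lit(2), Expr.Mul(Expr.Var("x"), Expr.Lit(3))), Map("x" -> Flat.Const(4)))` binds `y` to `Flat.Const(14)`, while the same assignment with `x` unbound binds `y` to `Flat.Top`.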
diff --git a/src/main/scala/analysis/Cfg.scala b/src/main/scala/analysis/Cfg.scala
@@ -1,6 +1,7 @@
 package analysis
 
 import scala.collection.mutable
+import intrusiveList.IntrusiveList
 import ir.*
 import cfg_visualiser.{DotArrow, DotGraph, DotInlineArrow, DotInterArrow, DotIntraArrow, DotNode, DotRegularArrow}
 
@@ -426,7 +427,9 @@ class ProgramCfgFactory:
       cfg.addEdge(funcEntryNode, funcExitNode)
     } else {
       // Recurse through blocks
-      visitBlock(proc.blocks.head, funcEntryNode)
+      visitBlock(proc.entryBlock.get, funcEntryNode)
+      // If it has no entry-block we still visit the exit block because VSA analysis expects everything to have an Exit
+      visitBlock(proc.returnBlock, funcEntryNode)
     }
 
     /** Add a block to the CFG. A block in this case is a basic block, so it contains a list of consecutive statements
@@ -470,12 +473,10 @@ class ProgramCfgFactory:
         *   Statements in this block
         * @param prevNode
         *   Preceding block's end node (jump)
-        * @param cond
-        *   Condition on the jump from `prevNode` to the first statement of this block
         * @return
         *   The last statement's CFG node
         */
-      def visitStmts(stmts: ArrayBuffer[Statement], prevNode: CfgNode): CfgCommandNode = {
+      def visitStmts(stmts: Iterable[Statement], prevNode: CfgNode): CfgCommandNode = {
 
         val firstNode = CfgStatementNode(stmts.head, block, funcEntryNode)
         cfg.addEdge(prevNode, firstNode)
@@ -504,9 +505,6 @@ class ProgramCfgFactory:
         * @param prevNode
         *   Either the previous statement in the block, or the previous block's end node (in the case that this block
         *   contains no statements)
-        * @param cond
-        *   Jump from `prevNode` to this. `TrueLiteral` if `prevNode` is a statement, and any `Expr` if `prevNode` is a
-        *   jump.
         * @param solitary
         *   `True` if this block contains no statements, `False` otherwise
         */

diff --git a/src/main/scala/analysis/Dependencies.scala b/src/main/scala/analysis/Dependencies.scala
@@ -1,4 +1,5 @@
 package analysis
+import ir.IntraProcIRCursor
 
 /** Dependency methods for worklist-based analyses.
   */
@@ -21,11 +22,22 @@ trait Dependencies[N]:
   def indep(n: N): Set[N]
 
 trait InterproceduralForwardDependencies extends Dependencies[CfgNode] {
-  def outdep(n: CfgNode): Set[CfgNode] = n.succInter.toSet
-  def indep(n: CfgNode): Set[CfgNode] = n.predInter.toSet
+  override def outdep(n: CfgNode): Set[CfgNode] = n.succInter.toSet
+  override def indep(n: CfgNode): Set[CfgNode] = n.predInter.toSet
 }
 
 trait IntraproceduralForwardDependencies extends Dependencies[CfgNode] {
-  def outdep(n: CfgNode): Set[CfgNode] = n.succIntra.toSet
-  def indep(n: CfgNode): Set[CfgNode] = n.predIntra.toSet
-}
+  override def outdep(n: CfgNode): Set[CfgNode] = n.succIntra.toSet
+  override def indep(n: CfgNode): Set[CfgNode] = n.predIntra.toSet
+}
+
+
+trait IntraProcDependencies extends Dependencies[IntraProcIRCursor.Node]:
+  override def outdep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcDependencies.outdep(n)
+  override def indep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcDependencies.indep(n)
+
+/** Dependency methods for forward analyses.
+  */
+object IntraProcDependencies extends Dependencies[IntraProcIRCursor.Node]:
+  override def outdep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcIRCursor.succ(n)
+  override def indep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcIRCursor.pred(n)
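Note on `Dependencies.scala`: a worklist solver only needs `outdep` and `indep`, which is why `IntraProcDependencies` can delegate straight to `IntraProcIRCursor.succ` and `pred`. The sketch below shows how a push-down style loop might consume such a trait. It is an assumption-laden illustration: `GraphDeps` and `solve` are hypothetical, the toy graph uses `Int` nodes, and the loop is not a copy of `PushDownWorklistFixpointSolver`.

```scala
import scala.collection.mutable

// Same shape as the trait in Dependencies.scala.
trait Dependencies[N]:
  def outdep(n: N): Set[N]
  def indep(n: N): Set[N]

// Hypothetical toy graph standing in for IntraProcIRCursor.
final class GraphDeps(edges: Set[(Int, Int)]) extends Dependencies[Int]:
  override def outdep(n: Int): Set[Int] = edges.collect { case (`n`, t) => t }
  override def indep(n: Int): Set[Int] = edges.collect { case (s, `n`) => s }

// Push-down style loop: apply the transfer function at n, join the result into
// each successor, and requeue the successors whose state changed.
// Terminates when the lattice has finite height and `transfer` is monotone.
def solve[N, T](deps: Dependencies[N], first: Set[N], bottom: T,
                join: (T, T) => T, transfer: (N, T) => T): Map[N, T] =
  val x = mutable.Map.empty[N, T].withDefaultValue(bottom)
  val worklist = mutable.Queue.from(first)
  while worklist.nonEmpty do
    val n = worklist.dequeue()
    val y = transfer(n, x(n))
    for succ <- deps.outdep(n) do
      val joined = join(x(succ), y)
      if joined != x(succ) then
        x(succ) = joined
        worklist.enqueue(succ)
  x.toMap

@main def solveDemo(): Unit =
  val deps = GraphDeps(Set((1, 2), (2, 3), (3, 2)))
  // State at a node: the set of nodes that can reach it.
  val result = solve[Int, Set[Int]](deps, Set(1), Set.empty, _ ++ _, (n, s) => s + n)
  println(result)
```

Swapping `GraphDeps` for an object backed by the IL cursor changes the graph being walked without touching the loop, which is the design choice the `Dependencies[N]` trait encodes.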
diff --git a/src/main/scala/analysis/UtilMethods.scala b/src/main/scala/analysis/UtilMethods.scala
@@ -25,7 +25,11 @@ def evaluateExpression(exp: Expr, constantPropResult: Map[Variable, FlatElement[
             case BVSUB => Some(BitVectorEval.smt_bvsub(l, r))
             case BVASHR => Some(BitVectorEval.smt_bvashr(l, r))
             case BVCOMP => Some(BitVectorEval.smt_bvcomp(l, r))
-            case _ => throw new RuntimeException("Binary operation support not implemented: " + binOp.op)
+            case x => {
+              Logger.error("Binary operation support not implemented: " + binOp.op)
+              None
+            }
+
           }
         case _ => None
       }

diff --git a/src/main/scala/analysis/solvers/FixPointSolver.scala b/src/main/scala/analysis/solvers/FixPointSolver.scala
@@ -91,7 +91,7 @@ trait ListSetWorklist[N] extends Worklist[N]:
   def add(n: N): Unit =
     worklist += n
 
-  def add(ns: Set[N]): Unit = worklist ++= ns
+  def add(ns: Iterable[N]): Unit = worklist ++= ns
 
   def run(first: Set[N]): Unit =
     worklist = new ListSet[N] ++ first
@@ -191,7 +191,6 @@ trait PushDownWorklistFixpointSolver[N, T, L <: Lattice[T]] extends MapLatticeSo
   }
 
   def process(n: N): Unit =
-    //val y = funsub(n, x, intra)
     val xn = x(n)
     val y = transfer(n, xn)
 

diff --git a/src/main/scala/cfg_visualiser/DotTools.scala b/src/main/scala/cfg_visualiser/DotTools.scala
@@ -12,6 +12,7 @@ object IDGenerator {
 }
 
 def wrap(input: String, width: Integer = 20): String =
+  return input
   if (input.length() <= width) {
     input
   } else {