Skip to content

Commit

Permalink
Merge pull request #141 from UQ-PAC/il-cfg-iterator
Browse files Browse the repository at this point in the history
Implement a simple way of getting the parent, successor and predecessor of any program, block, and command in the IL. 
Adds a TIP-style `Dependencies` trait that uses this instead of the CFG, this works well for intraprocedural analyses but interprocedural iteration is not correct.
  • Loading branch information
ailrst authored Jan 24, 2024
2 parents 03e54a8 + 2b088f3 commit 7ad7a20
Show file tree
Hide file tree
Showing 373 changed files with 3,011 additions and 348 deletions.
146 changes: 146 additions & 0 deletions docs/il-cfg.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
CFG Iterator Implementation
===========================

This file explains the in-place CFG representation on top of the IL.

Motivations
-----------

We want a unified IL and CFG representation to avoid the problem of keeping two datastructures in sync,
and to essentially avoid the problem of defining the correspondence between the static analysis state domain, and
the IL in order to apply a transformation to the IL using the CFG results.

It also reduces the number of places refactors need to be applied, and reduces memory overhead for static analyses
(hopefully).


Interpreting the CFG from the IL
--------------------------------

The IL has two structural interpretations:

1. Its syntax tree; expressions have sub expressions and so on.
- This can be traversed using Visitors
- It can also be traversed down by accessing class fields, and upward using the Parent trait
- The traversal order is defined by the order of terms in the language with a depth-first traversal of sub-terms.
2. Its control flow graph; this is part of the language's semantics, and is inferred from the Jump and Call statements.
- This is traversed using the control flow iterator, or by constructing the separate Tip-style CFG and traversing that.
From here on we describe the 'control-flow iterator'.
- The traversal order is defined by the `Dependency` structure and `Worklist` solvers and the predecessor/successor
relation between pairs of nodes

We need to derive the predecessor/successor relation on CFG nodes IL .

1. CFG positions are defined as
- The entry to a procedure
- The single return point from a procedure
- The block and jump statement that return from the procedure
- The beginning of a block within a procedure
- A statement command within a block
- A jump or call command within a block

For example we define the language as statements for horn clauses. (`A :- B` means B produces A, with `,` indicating
conjunction and `;` indicating disjunction)

First we have basic blocks belonging to a procedure.

Procedure(id)
Block(id, procedure)
EntryBlock(block_id, procedure)
ReturnBlock(block_id, procedure)
Block(id, procedure) :- EntryBlock(id, procedure); ReturnBlock(id, procedure)

A list of sequential statements belonging to a block

Statement(id, block, index)

A list of jumps (either Calls or GoTos) belonging to a block, which occur after the statements. GoTos form the
intra-procedural edges, and Calls form the inter-procedural edges.

GoTo(id, block, destinationBlock) // multiple destinations
Call(id, block, destinationProcedure, returnBlock), count {Call(id, block, _, _)} == 1
Jump(id, block) :- GoTo(id, block, _) ; Call(id, block, _, _)

Statements and Jumps are both considered commands. All IL terms, commands, blocks, and procedures, have a unique
identifier. All of the above are considered IL terms.

Command(id) :- Statement(id, _, _) ; Jump(id, _)
ILTerm(id) :- Procedure(id); Block(id, _); Command(id)

The predecessor/successor relates ILTerms to ILTerms, and is simply defined in terms of the nodes

pred(i, j) :- succ(j, i)

succ(block, statement) :- Statement(statement, block, 0)
succ(statement1, statement2) :- Statement(statement1, block, i), Statement(statement2, block, i + 1)
succ(statement, goto) :- Statement(block, _last), Jump(block, goto), _last = max i forall Statement(block, i)

succ(goto, targetBlock) :- GoTo(goto, _, _, targetBlock)

succ(call, return_block) :- Call(call, block, dest_procedure, return_block)

For an inter-procedural CFG we also have:

succ(call, return_block) :- ReturnBlock(return_block, call), Procedure(call)
succ(call, targetProcedure) :- Call(call, _, _, targetProcedure)

An inter-procedural solver is expected to keep track of call sites which return statements jump back to.

So a sequential application of `succ` might look like

ProcedureA -> {Block0} -> {Statement1} -> {Statement2} -> {Jump0, Jump1} -> {Block1} | {Block2} -> ...

Implementation
--------------

We want it to be possible to define `succ(term, _)` and `pred(term, _)` for any given term in the IL in `O(1)`.
Successors are easily derived but predecessors are not stored with their successors. Furthermore `ProcedureExit`,
and `CallReturn` are not inherently present in the IL.

In code we have a set of Calls, and Gotos present in the IL: these define the edges from themselves to their target.

Then all vertices in the CFG---that is all Commands, Blocks, and Procedures in the IL---store a list of references to
their set of incoming and outgoing edges. In a sense the 'id's in the formulation above become the JVM object IDs.

For Blocks and Procedures this means a `Set` of call statements. For Commands this means they are
stored in their block in an intrusive linked list.

Specifically this means we store

Command:
- reference to parent block
- procedure to find the next or previous statement in the block
- IntrusiveListElement trait inserts a next() and previous() method forming the linked list

Block
- reference to parent procedure
- list of incoming GoTos
- list of Jumps including
- Outgoing Calls
- Outgoing GoTos
Procedure
- list of incoming Calls
- subroutine to compute the set of all outgoing calls in all contained blocks

This means the IL contains:
- Forward graph edges in the forms of calls and gotos
- Forward syntax tree edges in the form of classes containing their children as fields
- Backwards graph edges in the form of lists of incoming jumps and calls
- Procedure has list of incoming calls
- Block has list of incoming gotos
- Backwards syntax tree edges in the form of a parent field
- Implementation of the `HasParent` trait.

To maintain the backwards edges it is necessary to make the actual data structures private, and only allow
modification through interfaces which maintain the graph/tree.

Jumps:
- Must implement an interface to allow adding or removing edge references (references to themself) to and from their
target

Blocks and Procedures:
- Implement an interface for adding and removing edge references

Furthermore;
- Reparenting Blocks and Commands in the IL must preserve the parent field, this is not really implemented yet
2 changes: 1 addition & 1 deletion src/main/scala/analysis/Analysis.scala
Original file line number Diff line number Diff line change
Expand Up @@ -297,4 +297,4 @@ class MemoryRegionAnalysisSolver(
case _ => super.funsub(n, x)
}
}
}
}
76 changes: 76 additions & 0 deletions src/main/scala/analysis/BasicIRConstProp.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
package analysis
import ir.*
import analysis.solvers.*

trait ILValueAnalysisMisc:
val valuelattice: ConstantPropagationLattice = ConstantPropagationLattice()
val statelattice: MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice] = MapLattice(valuelattice)

def eval(exp: Expr, env: Map[Variable, FlatElement[BitVecLiteral]]): FlatElement[BitVecLiteral] =
import valuelattice._
exp match
case id: Variable => env(id)
case n: BitVecLiteral => bv(n)
case ze: ZeroExtend => zero_extend(ze.extension, eval(ze.body, env))
case se: SignExtend => sign_extend(se.extension, eval(se.body, env))
case e: Extract => extract(e.end, e.start, eval(e.body, env))
case bin: BinaryExpr =>
val left = eval(bin.arg1, env)
val right = eval(bin.arg2, env)
bin.op match
case BVADD => bvadd(left, right)
case BVSUB => bvsub(left, right)
case BVMUL => bvmul(left, right)
case BVUDIV => bvudiv(left, right)
case BVSDIV => bvsdiv(left, right)
case BVSREM => bvsrem(left, right)
case BVUREM => bvurem(left, right)
case BVSMOD => bvsmod(left, right)
case BVAND => bvand(left, right)
case BVOR => bvor(left, right)
case BVXOR => bvxor(left, right)
case BVNAND => bvnand(left, right)
case BVNOR => bvnor(left, right)
case BVXNOR => bvxnor(left, right)
case BVSHL => bvshl(left, right)
case BVLSHR => bvlshr(left, right)
case BVASHR => bvashr(left, right)
case BVCOMP => bvcomp(left, right)
case BVCONCAT => concat(left, right)

case un: UnaryExpr =>
val arg = eval(un.arg, env)

un.op match
case BVNOT => bvnot(arg)
case BVNEG => bvneg(arg)

case _ => valuelattice.top

private final val callerPreservedRegisters = Set("R0", "R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9", "R10",
"R11", "R12", "R13", "R14", "R15", "R16", "R17", "R18", "R30")

/** Transfer function for state lattice elements.
*/
def localTransfer(n: CFGPosition, s: statelattice.Element): statelattice.Element =
n match
case la: LocalAssign =>
s + (la.lhs -> eval(la.rhs, s))
case c: Call => s ++ callerPreservedRegisters.filter(reg => s.keys.exists(_.name == reg)).map(n => Register(n, BitVecType(64)) -> statelattice.sublattice.top).toMap
case _ => s



object IRSimpleValueAnalysis:

class Solver(prog: Program) extends ILValueAnalysisMisc
with IRIntraproceduralForwardDependencies
with Analysis[Map[CFGPosition, Map[Variable, FlatElement[BitVecLiteral]]]]
with SimplePushDownWorklistFixpointSolver[CFGPosition, Map[Variable, FlatElement[BitVecLiteral]], MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice]]
:
/* Worklist initial set */
//override val lattice: MapLattice[CFGPosition, statelattice.type] = MapLattice(statelattice)
override val lattice: MapLattice[CFGPosition, Map[Variable, FlatElement[BitVecLiteral]], MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice]] = MapLattice(statelattice)

override val domain : Set[CFGPosition] = computeDomain(IntraProcIRCursor, prog.procedures).toSet
def transfer(n: CFGPosition, s: statelattice.Element): statelattice.Element = localTransfer(n, s)
10 changes: 2 additions & 8 deletions src/main/scala/analysis/Cfg.scala
Original file line number Diff line number Diff line change
Expand Up @@ -428,7 +428,7 @@ class ProgramCfgFactory:
cfg.addEdge(funcEntryNode, funcExitNode)
} else {
// Recurse through blocks
visitBlock(proc.blocks.head, funcEntryNode)
visitBlock(proc.entryBlock.get, funcEntryNode)
}

/** Add a block to the CFG. A block in this case is a basic block, so it contains a list of consecutive statements
Expand Down Expand Up @@ -472,12 +472,10 @@ class ProgramCfgFactory:
* Statements in this block
* @param prevNode
* Preceding block's end node (jump)
* @param cond
* Condition on the jump from `prevNode` to the first statement of this block
* @return
* The last statement's CFG node
*/
def visitStmts(stmts: ArrayBuffer[Statement], prevNode: CfgNode): CfgCommandNode = {
def visitStmts(stmts: Iterable[Statement], prevNode: CfgNode): CfgCommandNode = {

val firstNode = CfgStatementNode(stmts.head, block, funcEntryNode)
cfg.addEdge(prevNode, firstNode)
Expand Down Expand Up @@ -506,9 +504,6 @@ class ProgramCfgFactory:
* @param prevNode
* Either the previous statement in the block, or the previous block's end node (in the case that this block
* contains no statements)
* @param cond
* Jump from `prevNode` to this. `TrueLiteral` if `prevNode` is a statement, and any `Expr` if `prevNode` is a
* jump.
* @param solitary
* `True` if this block contains no statements, `False` otherwise
*/
Expand Down Expand Up @@ -616,7 +611,6 @@ class ProgramCfgFactory:
cfg.addEdge(jmpNode, noReturn)
cfg.addEdge(noReturn, funcExitNode)
}
case _ => assert(false, s"unexpected jump encountered, jump: $jmp")
} // `jmps.head` match
} // `visitJumps` function
} // `visitBlocks` function
Expand Down
32 changes: 27 additions & 5 deletions src/main/scala/analysis/Dependencies.scala
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
package analysis
import ir.{IRWalk, IntraProcIRCursor, InterProcIRCursor, CFGPosition}

/** Dependency methods for worklist-based analyses.
*/
Expand All @@ -21,11 +22,32 @@ trait Dependencies[N]:
def indep(n: N): Set[N]

trait InterproceduralForwardDependencies extends Dependencies[CfgNode] {
def outdep(n: CfgNode): Set[CfgNode] = n.succInter.toSet
def indep(n: CfgNode): Set[CfgNode] = n.predInter.toSet
override def outdep(n: CfgNode): Set[CfgNode] = n.succInter.toSet
override def indep(n: CfgNode): Set[CfgNode] = n.predInter.toSet
}

trait IntraproceduralForwardDependencies extends Dependencies[CfgNode] {
def outdep(n: CfgNode): Set[CfgNode] = n.succIntra.toSet
def indep(n: CfgNode): Set[CfgNode] = n.predIntra.toSet
}
override def outdep(n: CfgNode): Set[CfgNode] = n.succIntra.toSet
override def indep(n: CfgNode): Set[CfgNode] = n.predIntra.toSet
}


trait IRInterproceduralForwardDependencies extends Dependencies[CFGPosition] {
override def outdep(n: CFGPosition): Set[CFGPosition] = InterProcIRCursor.succ(n)
override def indep(n: CFGPosition): Set[CFGPosition] = InterProcIRCursor.pred(n)
}

trait IRIntraproceduralForwardDependencies extends Dependencies[CFGPosition] {
override def outdep(n: CFGPosition): Set[CFGPosition] = IntraProcIRCursor.succ(n)
override def indep(n: CFGPosition): Set[CFGPosition] = IntraProcIRCursor.pred(n)
}

trait IRInterproceduralBackwardDependencies extends IRInterproceduralForwardDependencies {
override def outdep(n: CFGPosition): Set[CFGPosition] = super.indep(n)
override def indep(n: CFGPosition): Set[CFGPosition] = super.outdep(n)
}

trait IRIntraproceduralBackwardDependencies extends IRIntraproceduralForwardDependencies {
override def outdep(n: CFGPosition): Set[CFGPosition] = super.indep(n)
override def indep(n: CFGPosition): Set[CFGPosition] = super.outdep(n)
}
5 changes: 4 additions & 1 deletion src/main/scala/analysis/UtilMethods.scala
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,10 @@ def evaluateExpression(exp: Expr, constantPropResult: Map[Variable, FlatElement[
case BVASHR => Some(BitVectorEval.smt_bvashr(l, r))
case BVCOMP => Some(BitVectorEval.smt_bvcomp(l, r))
case BVCONCAT => Some(BitVectorEval.smt_concat(l, r))
case _ => throw new RuntimeException("Binary operation support not implemented: " + binOp.op)
case x => {
Logger.error("Binary operation support not implemented: " + binOp.op)
None
}
}
case _ => None
}
Expand Down
2 changes: 1 addition & 1 deletion src/main/scala/analysis/solvers/FixPointSolver.scala
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ trait ListSetWorklist[N] extends Worklist[N]:
def add(n: N): Unit =
worklist += n

def add(ns: Set[N]): Unit = worklist ++= ns
def add(ns: Iterable[N]): Unit = worklist ++= ns

def run(first: Set[N]): Unit =
worklist = new ListSet[N] ++ first
Expand Down
Loading

0 comments on commit 7ad7a20

Please sign in to comment.