Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Il cfg iterator #141

Merged
merged 46 commits into from
Jan 24, 2024
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
6e68244
move il classes to separate files
ailrst Nov 7, 2023
0acf9b7
move il classes to separate files
ailrst Nov 7, 2023
c3f63b6
add parent references to il
ailrst Nov 7, 2023
ff30667
initial intrusive list work
ailrst Nov 7, 2023
aeba762
refactor il to use intrusive list and fix intrusive list
ailrst Nov 8, 2023
c270d15
initial work to add backwards intraproc links to IL
ailrst Nov 9, 2023
118ecff
add assertions
ailrst Nov 9, 2023
2e4f9e8
fix jump order issue
ailrst Nov 9, 2023
d6dafeb
add procedure called-by links
ailrst Nov 9, 2023
3a2d6bd
move intra to trait param on dependencies
ailrst Nov 10, 2023
a3b2c88
implement prototype IL constprop
ailrst Nov 10, 2023
cd355dd
cleanup
ailrst Nov 10, 2023
51ed576
explanation
ailrst Nov 10, 2023
1ee3c3f
minor edit
ailrst Nov 10, 2023
b7d9a47
edit
ailrst Nov 10, 2023
a6af61e
output il cfg
ailrst Nov 13, 2023
0dee8b9
undo split up block and procedure
ailrst Nov 21, 2023
688c2d6
merge boogie style control flow
ailrst Nov 27, 2023
c57b73a
update expected
ailrst Nov 27, 2023
b2eb839
cleanup
ailrst Nov 27, 2023
c5c6ed4
add procedure return block and distinct entry block
ailrst Nov 28, 2023
ddb0269
handle call in constprop
ailrst Nov 28, 2023
465a49e
Merge branch 'main' into il-cfg-iterator
ailrst Nov 28, 2023
c04a1f1
fix broken tests & cleanup
ailrst Nov 28, 2023
678e3d3
udpate expected
ailrst Nov 28, 2023
0fe3041
cleanup
ailrst Nov 28, 2023
0fd434d
cleanup output
ailrst Nov 29, 2023
34b6cf4
cleanup
ailrst Nov 29, 2023
8693a29
simple review fixes
ailrst Dec 1, 2023
a816ecb
cleanup entry/exit blocks somewhat
ailrst Dec 1, 2023
f628b00
fix
ailrst Dec 1, 2023
72f3bc8
proc and block graphs
ailrst Dec 1, 2023
ba65e4d
stuff
ailrst Jan 3, 2024
a94fd52
simpler analysis result printer
ailrst Jan 8, 2024
353e1ba
fix result printer and block
ailrst Jan 10, 2024
c9cd919
merge main
ailrst Jan 23, 2024
02aeae6
simplify Dependencies
ailrst Jan 23, 2024
b031e9c
update expected
ailrst Jan 23, 2024
68362ae
cleanup intrusivelist typecasts
ailrst Jan 23, 2024
9ccb025
format
ailrst Jan 23, 2024
45688e6
cleanup
ailrst Jan 23, 2024
4158d49
fix explanation
ailrst Jan 23, 2024
1f78b30
fix clearblocks
ailrst Jan 23, 2024
2851650
Merge branch 'main' into il-cfg-iterator
ailrst Jan 23, 2024
fa65d14
cleanup comments
ailrst Jan 24, 2024
2b088f3
update expected
ailrst Jan 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
145 changes: 145 additions & 0 deletions docs/il-cfg.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
CFG Iterator Implementation
===========================

This file explains the in-place CFG representation on top of the IL.

Motivations
-----------

We want a unified IL and CFG representation to avoid the problem of keeping two datastructures in sync,
and to essentially avoid the problem of defining the correspondence between the static analysis state domain, and
the IL in order to apply a transformation to the IL using the CFG results.

It also reduces the number of places refactors need to be applied, and reduces memory overhead for static analyses
(hopefully).


Interpreting the CFG from the IL
--------------------------------

The IL has two structural interpretations:

1. Its syntax tree; expressions have sub expressions and so on.
- This can be traversed using Visitors
- The traversal order is defined by the order of terms in the language with a depth-first traversal of sub-terms.
2. Its control flow graph; this is part of the language's semantics, and is inferred from the Jump and Call statements.
- This is traversed using the control flow iterator, or by constructing the separate Tip-style CFG and traversing that.
From here on we describe the 'control-flow iterator'.
- The traversal order is defined by the `Dependency` structure and `Worklist` solvers and the predecessor/successor
relation between pairs of nodes

We need to derive the predecessor/successor relation on CFG nodes IL .

1. CFG positions are defined as
- The entry to a procedure
- An exit from a procedure
- The beginning of a block within a procedure
- A statement command within a block
- A jump or call command within a block

For example we define the language as statements for horn clauses. (`A :- B` means B produces A, with `,` indicating
conjunction and `;` indicating disjunction)

First we have basic blocks belonging to a procedure.

Procedure(id)
Block(id, procedure)

A list of sequential statements belonging to a block

Statement(id, block, index)

A list of jumps (either Calls or GoTos) belonging to a block, which occur after the statements. GoTos form the
intra-procedural edges, and Calls form the inter-procedural edges.

GoTo(id, block, index, destinationBlock)
Call(id, block, index, destinationProcedure)
Jump(id, block) :- GoTo(id, block, _) ; Call(id, block, _)

Statements and Jumps are both considered commands. All IL terms, commands, blocks, and procedures, have a unique
identifier. All of the above are considered IL terms.

Command(id) :- Statement(id, _, _) ; Jump(id, _)
ILTerm(id) :- Procedure(id); Block(id, _); Command(id)

The CFG extends this language with the following nodes:

ProcedureExit(id, fromProcedure, fromJump)
CallReturn(id, fromCall)

CFGNode(id) :- ProcedureExit(id,_,_) ; CallReturn(id,_) ; ILTerm(id)

The predecessor/successor relates CFGNodes to CFGNodes, and is simply defined in terms of the nodes

pred(i, j) :- succ(j, i)

succ(block, statement) :- Statement(statement, block, 0)
succ(statement1, statement2) :- Statement(statement1, block, i), Statement(statement2, block, i + 1)
succ(statement, goto) :- Statement(block, _last), Jump(block, goto), _last = max i forall Statement(block, i)

succ(goto, targetBlock) :- GoTo(goto, _, _, targetBlock)

// We always insert nodes for calls to return to
CallReturn(i, call) :- Call(call, _, _, _)
succ(call, callreturn) :- CallReturn(callreturn, call), Procedure(call)

// a 'return' from the procedure is an indirect call to register R30
succ(call, exit) :- Call(call, block, _, "R30"), ProcedureExit(exit, procedure, call), Block(block, procedure)

For an inter-procedural CFG we also have:

succ(call, targetProcedure) :- Call(call, _, _, targetProcedure)
succ(exit, returnNode) :- ProcedureExit(exit, procedure, call), CallReturn(returnNode, call)

So a sequential application of `succ` might look like

ProcedureA -> {Block0} -> {Statement1} -> {Statement2} -> {Jump0, Jump1} -> {Block1} | {Block2} -> ...

Implementation
--------------

We want it to be possible to define `succ(term, _)` and `pred(term, _)` for any given term in the IL in `O(1)`.
Successors are easily derived but predecessors are not stored with their successors. Furthermore `ProcedureExit`,
and `CallReturn` are not inherently present in the IL.

In code we have a set of Calls, and Gotos present in the IL: these define the edges from themselves to their target.

Then all vertices in the CFG---that is all Commands, Blocks, and Procedures in the IL---store a list of references to
their set of incoming and outgoing edges. In a sense the 'id's in the formulation above become the JVM object IDs.

For Blocks and Procedures this means a `Set` of call statements. For Commands this means they are
stored in their block in an intrusive linked list.

Specifically this means we store

Command:
- reference to parent block
- procedure to find the next or previous statement in the block
- IntrusiveListElement trait inserts a next() and previous() method forming the linked list

Block
- reference to parent procedure
- list of incoming GoTos
- list of Jumps including
- Outgoing Calls
- Outgoing GoTos

Procedure
- list of incoming Calls
- subroutine to compute the set of all outgoing calls in all contained blocks

To maintain this superimposed graph it is necessary to make the actual call lists private, and only allow
modification of through interfaces which maintain the graph.

Maintenance of the graph is the responsibility of the Block class: adding or removing jumps must ensure the edge
references are maintained.

Jumps:
- Must implement an interface to allow adding or removing edge references (references to themself) to and from their
target

Blocks and Procedures:
- Implement an interface for adding and removing edge references

Furthermore;
- Reparenting Blocks and Commands in the IL must preserve the parent field, this is not really implemented yet
6 changes: 3 additions & 3 deletions src/main/scala/analysis/Analysis.scala
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ trait Analysis[+R]:

/** Performs the analysis and returns the result.
*/
def analyze(intra: Boolean): R
def analyze(): R

/** A flow-sensitive analysis.
* @param stateAfterNode
Expand Down Expand Up @@ -128,9 +128,9 @@ abstract class ValueAnalysisWorklistSolver[L <: LatticeWithOps](
) extends SimpleValueAnalysis(cfg)
with SimplePushDownWorklistFixpointSolver[CfgNode]
with ForwardDependencies
with Dependencies[CfgNode](true)

object ConstantPropagationAnalysis:

class WorklistSolver(cfg: ProgramCfg) extends ValueAnalysisWorklistSolver(cfg, ConstantPropagationLattice)


Expand Down Expand Up @@ -406,9 +406,9 @@ abstract class IntraprocMemoryRegionAnalysisWorklistSolver[L <: PowersetLattice[
) extends MemoryRegionAnalysis(cfg, globals, globalOffsets, subroutines, constantProp)
with SimpleMonotonicSolver[CfgNode]
with ForwardDependencies
with Dependencies[CfgNode](true)

object MemoryRegionAnalysis:

class WorklistSolver(
cfg: ProgramCfg,
globals: Map[BigInt, String],
Expand Down
87 changes: 87 additions & 0 deletions src/main/scala/analysis/BasicIRConstProp.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
package analysis
import ir.*

import analysis.solvers.*

trait ILValueAnalysisMisc:
val valuelattice: LatticeWithOps
val statelattice: MapLattice[Variable, valuelattice.type] = new MapLattice(valuelattice)

def eval(exp: Expr, env: statelattice.Element): valuelattice.Element =
import valuelattice._
exp match
case id: Variable => env(id)
case n: Literal => literal(n)
case ze: ZeroExtend => zero_extend(ze.extension, eval(ze.body, env))
case se: SignExtend => sign_extend(se.extension, eval(se.body, env))
case e: Extract => extract(e.end, e.start, eval(e.body, env))
case bin: BinaryExpr =>
val left = eval(bin.arg1, env)
val right = eval(bin.arg2, env)
bin.op match
case BVADD => bvadd(left, right)
case BVSUB => bvsub(left, right)
case BVMUL => bvmul(left, right)
case BVUDIV => bvudiv(left, right)
case BVSDIV => bvsdiv(left, right)
case BVSREM => bvsrem(left, right)
case BVUREM => bvurem(left, right)
case BVSMOD => bvsmod(left, right)
case BVAND => bvand(left, right)
case BVOR => bvor(left, right)
case BVXOR => bvxor(left, right)
case BVNAND => bvnand(left, right)
case BVNOR => bvnor(left, right)
case BVXNOR => bvxnor(left, right)
case BVSHL => bvshl(left, right)
case BVLSHR => bvlshr(left, right)
case BVASHR => bvashr(left, right)
case BVCOMP => bvcomp(left, right)

case BVULE => bvule(left, right)
case BVUGE => bvuge(left, right)
case BVULT => bvult(left, right)
case BVUGT => bvugt(left, right)

case BVSLE => bvsle(left, right)
case BVSGE => bvsge(left, right)
case BVSLT => bvslt(left, right)
case BVSGT => bvsgt(left, right)

case BVCONCAT => concat(left, right)
case BVNEQ => bvneq(left, right)
case BVEQ => bveq(left, right)

case un: UnaryExpr =>
val arg = eval(un.arg, env)

un.op match
case BVNOT => bvnot(arg)
case BVNEG => bvneg(arg)

case _ => valuelattice.top


/** Transfer function for state lattice elements.
*/
def localTransfer(n: IntraProcIRCursor.Node, s: statelattice.Element): statelattice.Element =
n match
case la: LocalAssign =>
s + (la.lhs -> eval(la.rhs, s))
case _ => s


type IRNode = IntraProcIRCursor.Node

object IRSimpleValueAnalysis:
class Solver[+L <: LatticeWithOps](prog: Program, val valuelattice: L) extends FlowSensitiveAnalysis(true)
with IntraProcDependencies
with Dependencies[IRNode](true)
with ILValueAnalysisMisc
with SimplePushDownWorklistFixpointSolver[IRNode]
:
/* Worklist initial set */
override val lattice: MapLattice[IRNode, statelattice.type] = MapLattice(statelattice)

override val domain : Set[IRNode] = computeDomain(prog).toSet
def transfer(n: IRNode, s: statelattice.Element): statelattice.Element = localTransfer(n, s)
15 changes: 7 additions & 8 deletions src/main/scala/analysis/Cfg.scala
Original file line number Diff line number Diff line change
Expand Up @@ -688,7 +688,7 @@ class ProgramCfgFactory:
block.statements.size match {
case i if i > 0 =>
// Block contains some statements
val endStmt: CfgCommandNode = visitStmts(block.statements, prevBlockEnd, cond)
val endStmt: CfgCommandNode = visitStmts(ArrayBuffer.from(block.statements), prevBlockEnd, cond)
visitJumps(block.jumps, endStmt, TrueLiteral, solitary = false)
case _ =>
// Only jumps in this block
Expand Down Expand Up @@ -744,7 +744,7 @@ class ProgramCfgFactory:
* @param solitary
* `True` if this block contains no statements, `False` otherwise
*/
def visitJumps(jmps: ArrayBuffer[Jump], prevNode: CfgNode, cond: Expr, solitary: Boolean): Unit = {
def visitJumps(jmps: Iterable[Jump], prevNode: CfgNode, cond: Expr, solitary: Boolean): Unit = {

val jmpNode: CfgJumpNode = CfgJumpNode(data = jmps.head, block = block, parent = funcEntryNode)
var precNode: CfgNode = prevNode
Expand All @@ -759,9 +759,9 @@ class ProgramCfgFactory:
Currently we display these nodes in the DOT view of the CFG, however these could be hidden if desired.
*/
jmps.head match {
case jmp: GoTo =>
// `GoTo`s are just edges, so introduce a fake `start of block` that can be jmp'd to
val ghostNode = CfgGhostNode(block = block, parent = funcEntryNode, data = NOP(jmp.label))
case jmp: DetGoTo =>
// `DetGoTo`s are just edges, so introduce a fake `start of block` that can be jmp'd to
val ghostNode = CfgGhostNode(block = block, parent = funcEntryNode, data = NOP(jmp.label, block))
cfg.addEdge(prevNode, ghostNode, cond)
precNode = ghostNode
visitedBlocks += (block -> ghostNode)
Expand All @@ -774,7 +774,7 @@ class ProgramCfgFactory:
// TODO this is not a robust approach

jmps.head match {
case goto: GoTo =>
case goto: DetGoTo =>
// Process first jump
var targetBlock: Block = goto.target
var targetCond: Expr = goto.condition match {
Expand All @@ -793,7 +793,7 @@ class ProgramCfgFactory:
/* TODO it is not a safe assumption that there are a maximum of two jumps, or that a GoTo will follow a GoTo
*/
if (targetCond != TrueLiteral) {
val secondGoto: GoTo = jmps.tail.head.asInstanceOf[GoTo]
val secondGoto: DetGoTo = jmps.tail.head.asInstanceOf[DetGoTo]
targetBlock = secondGoto.target
// IR doesn't store negation of condition, so we must do it manually
targetCond = negateConditional(targetCond)
Expand Down Expand Up @@ -889,7 +889,6 @@ class ProgramCfgFactory:
cfg.addEdge(jmpNode, noReturn)
cfg.addEdge(noReturn, funcExitNode)
}
case _ => assert(false, s"unexpected jump encountered, jumps: $jmps")
} // `jmps.head` match
} // `visitJumps` function
} // `visitBlocks` function
Expand Down
23 changes: 18 additions & 5 deletions src/main/scala/analysis/Dependencies.scala
Original file line number Diff line number Diff line change
@@ -1,24 +1,25 @@
package analysis
import ir.IntraProcIRCursor

/** Dependency methods for worklist-based analyses.
*/
trait Dependencies[N]:
trait Dependencies[N](val intra: Boolean):

/** Outgoing dependencies. Used when propagating dataflow to successors.
* @param n
* an element from the worklist
* @return
* the elements that depend on the given element
*/
def outdep(n: N, intra: Boolean): Set[N]
def outdep(n: N): Set[N]

/** Incoming dependencies. Used when computing the join from predecessors.
* @param n
* an element from the worklist
* @return
* the elements that the given element depends on
*/
def indep(n: N, intra: Boolean): Set[N]
def indep(n: N): Set[N]

/** Dependency methods for forward analyses.
*/
Expand All @@ -27,8 +28,20 @@ trait ForwardDependencies extends Dependencies[CfgNode]:
/* TODO: add functionality here for distinguishing between Intra / Inter */

// Also add support for getting edges / conditions here?
def outdep(n: CfgNode, intra: Boolean = true): Set[CfgNode] =
override def outdep(n: CfgNode): Set[CfgNode] =
if intra then n.succ(intra).toSet else n.succ(intra).toSet.union(n.succ(!intra).toSet)

def indep(n: CfgNode, intra: Boolean = true): Set[CfgNode] =
override def indep(n: CfgNode): Set[CfgNode] =
if intra then n.pred(intra).toSet else n.pred(intra).toSet.union(n.pred(!intra).toSet)



trait IntraProcDependencies extends Dependencies[IntraProcIRCursor.Node]:
override def outdep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcDependencies.outdep(n)
override def indep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcDependencies.indep(n)

/** Dependency methods for forward analyses.
*/
object IntraProcDependencies extends Dependencies[IntraProcIRCursor.Node](true):
override def outdep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcIRCursor.succ(n)
override def indep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcIRCursor.pred(n)
2 changes: 1 addition & 1 deletion src/main/scala/analysis/SteensgaardAnalysis.scala
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ class SteensgaardAnalysis(program: Program, constantPropResult: Map[CfgNode, Map

/** @inheritdoc
*/
def analyze(intra: Boolean): Unit =
def analyze(): Unit =
// generate the constraints by traversing the AST and solve them on-the-fly
visit(program, ())

Expand Down
6 changes: 5 additions & 1 deletion src/main/scala/analysis/UtilMethods.scala
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,11 @@ def evaluateExpression(exp: Expr, constantPropResult: Map[Variable, ConstantProp
case BVSUB => Some(BitVectorEval.smt_bvsub(l, r))
case BVASHR => Some(BitVectorEval.smt_bvashr(l, r))
case BVCOMP => Some(BitVectorEval.smt_bvcomp(l, r))
case _ => throw new RuntimeException("Binary operation support not implemented: " + binOp.op)
case x => {
Logger.error("Binary operation support not implemented: " + binOp.op)
None
}

}
case _ => None
}
Expand Down
Loading