Il cfg iterator #141
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
CFG Iterator Implementation | ||
=========================== | ||
|
||
This file explains the in-place CFG representation on top of the IL. | ||
|
||
Motivations | ||
----------- | ||
|
||
We want a unified IL and CFG representation to avoid the problem of keeping two data structures in sync, | ||
and to avoid having to define a correspondence between the static analysis state domain and the IL | ||
in order to apply a transformation to the IL using the CFG results. | ||
|
||
It also reduces the number of places refactors need to be applied, and reduces memory overhead for static analyses | ||
(hopefully). | ||
|
||
|
||
Interpreting the CFG from the IL | ||
-------------------------------- | ||
|
||
The IL has two structural interpretations: | ||
|
||
1. Its syntax tree; expressions have sub expressions and so on. | ||
- This can be traversed using Visitors | ||
- The traversal order is defined by the order of terms in the language with a depth-first traversal of sub-terms. | ||
2. Its control flow graph; this is part of the language's semantics, and is inferred from the Jump and Call statements. | ||
- This is traversed using the control flow iterator, or by constructing the separate Tip-style CFG and traversing that. | ||
From here on we describe the 'control-flow iterator'. | ||
- The traversal order is defined by the `Dependency` structure and `Worklist` solvers and the predecessor/successor | ||
relation between pairs of nodes | ||
|
||
We need to derive the predecessor/successor relation on CFG nodes from the IL. | ||
|
||
1. CFG positions are defined as | ||
- The entry to a procedure | ||
- The single return point from a procedure | ||
- The beginning of a block within a procedure | ||
- A statement command within a block | ||
- A jump or call command within a block | ||
|
||
For example, we define the language as a set of Horn clauses. (`A :- B` means B produces A, with `,` indicating | ||
conjunction and `;` indicating disjunction.) | ||
|
||
First we have basic blocks belonging to a procedure. | ||
|
||
Procedure(id) | ||
Block(id, procedure) | ||
EntryBlock(block_id, procedure) | ||
ReturnBlock(block_id, procedure) | ||
Block(id, procedure) :- EntryBlock(id, procedure); ReturnBlock(id, procedure) | ||
|
||
A list of sequential statements belonging to a block | ||
|
||
Statement(id, block, index) | ||
|
||
A list of jumps (either Calls or GoTos) belonging to a block, which occur after the statements. GoTos form the | ||
intra-procedural edges, and Calls form the inter-procedural edges. | ||
|
||
GoTo(id, block, destinationBlock) // multiple destinations | ||
Call(id, block, destinationProcedure, returnBlock), count {Call(id, block, _, _)} == 1 | ||
Jump(id, block) :- GoTo(id, block, _) ; Call(id, block, _, _) | ||
|
||
Statements and Jumps are both considered commands. All IL terms, commands, blocks, and procedures, have a unique | ||
identifier. All of the above are considered IL terms. | ||
|
||
Command(id) :- Statement(id, _, _) ; Jump(id, _) | ||
ILTerm(id) :- Procedure(id); Block(id, _); Command(id) | ||
|
||
The predecessor/successor relates ILTerms to ILTerms, and is simply defined in terms of the nodes | ||
|
||
pred(i, j) :- succ(j, i) | ||
|
||
succ(block, statement) :- Statement(statement, block, 0) | ||
succ(statement1, statement2) :- Statement(statement1, block, i), Statement(statement2, block, i + 1) | ||
succ(statement, jump) :- Statement(statement, block, last), Jump(jump, block), last = max i forall Statement(_, block, i) | ||
|
||
succ(goto, targetBlock) :- GoTo(goto, _, targetBlock) | ||
|
||
succ(call, return_block) :- Call(call, block, dest_procedure, return_block) | ||
|
||
For an inter-procedural CFG we also have: | ||
|
||
succ(call, return_block) :- ReturnBlock(return_block, call), Procedure(call) | ||
succ(call, targetProcedure) :- Call(call, _, targetProcedure, _) | ||
succ(exit, returnNode) :- ProcedureExit(exit, procedure, call), CallReturn(returnNode, call) | ||
|
||
So a sequential application of `succ` might look like | ||
|
||
ProcedureA -> {Block0} -> {Statement1} -> {Statement2} -> {Jump0, Jump1} -> {Block1} | {Block2} -> ... | ||
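
The rules above can be sketched on a toy model. This is only an illustration under stated assumptions: `ToyProgram`, the case classes, and the choice to model a block's jump as a single `GoTo` with several targets are all hypothetical and are not the IL classes from this change. Calls and the inter-procedural rules are left out.

```scala
// Toy model of the IL terms described above; all names are illustrative.
sealed trait ILTerm
final case class Procedure(name: String) extends ILTerm
final case class Block(label: String, parent: Procedure) extends ILTerm
final case class Statement(text: String, block: Block, index: Int) extends ILTerm
final case class GoTo(block: Block, targets: List[Block]) extends ILTerm

final case class ToyProgram(
    entry: Map[Procedure, Block],
    statements: List[Statement],
    jumps: List[GoTo]
):
  /** Intra-procedural successors, following the succ rules above. */
  def succ(t: ILTerm): Set[ILTerm] = t match
    case p: Procedure => entry.get(p).toSet
    case b: Block =>
      // First statement if there is one, otherwise the block's jump.
      val first = statements.filter(s => s.block == b && s.index == 0)
      if first.nonEmpty then first.toSet else jumps.filter(_.block == b).toSet
    case s: Statement =>
      // Next statement in the same block, or the jump after the last one.
      val next = statements.filter(n => n.block == s.block && n.index == s.index + 1)
      if next.nonEmpty then next.toSet else jumps.filter(_.block == s.block).toSet
    case g: GoTo => g.targets.toSet

  /** pred(i, j) :- succ(j, i), computed by scanning a given universe of terms. */
  def pred(t: ILTerm, universe: Iterable[ILTerm]): Set[ILTerm] =
    universe.filter(u => succ(u).contains(t)).toSet
```

Note that `pred` here scans every term, which is exactly the cost the implementation section below aims to avoid by storing backwards edges.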
|
||
Implementation | ||
-------------- | ||
|
||
We want it to be possible to compute `succ(term, _)` and `pred(term, _)` for any given term in the IL in `O(1)`. | ||
Successors are easily derived but predecessors are not stored with their successors. Furthermore `ProcedureExit`, | ||
and `CallReturn` are not inherently present in the IL. | ||
|
||
In code we have a set of Calls and GoTos present in the IL; these define the edges from themselves to their targets. | ||
|
||
Then all vertices in the CFG (that is, all Commands, Blocks, and Procedures in the IL) store references to | ||
their incoming and outgoing edges. In a sense the 'id's in the formulation above become the JVM object IDs. | ||
|
||
For Blocks and Procedures this means a `Set` of call statements. For Commands this means they are | ||
stored in their block in an intrusive linked list. | ||
|
||
Specifically this means we store | ||
|
||
Command: | ||
- reference to parent block | ||
- procedure to find the next or previous statement in the block | ||
- IntrusiveListElement trait inserts a next() and previous() method forming the linked list | ||
|
||
Block | ||
- reference to parent procedure | ||
- list of incoming GoTos | ||
- list of Jumps including | ||
- Outgoing Calls | ||
- Outgoing GoTos | ||
|
||
Procedure | ||
- list of incoming Calls | ||
- subroutine to compute the set of all outgoing calls in all contained blocks | ||
|
||
This means the IL contains: | ||
- Forward graph edges in the forms of calls and gotos | ||
- Forward syntax tree edges in the form of classes containing their children as fields | ||
- Backwards graph edges in the form of lists of incoming jumps and calls | ||
- Procedure has list of incoming calls | ||
- Block has list of incoming gotos | ||
- Backwards syntax tree edges in the form of a parent field | ||
- Implementation of the `HasParent` trait. | ||
|
||
To maintain the backwards edges it is necessary to make the actual data structures private, and only allow | ||
modification through interfaces which maintain the graph/tree. | ||
|
||
Jumps: | ||
- Must implement an interface to allow adding or removing edge references (references to themselves) to and from their | ||
target | ||
|
||
Blocks and Procedures: | ||
- Implement an interface for adding and removing edge references | ||
|
||
Furthermore: | ||
- Reparenting Blocks and Commands in the IL must preserve the parent field; this is not really implemented yet. |
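
One way to keep the backwards edges consistent is sketched below. It is a minimal illustration, assuming hypothetical `Block` and `GoTo` classes inside a `ToyIL` object; none of these names or methods are taken from the change itself. The incoming-edge set is private, and the only code path that modifies it is the jump's own `addTarget`/`removeTarget`.

```scala
import scala.collection.mutable

// Hypothetical names throughout; this is not the actual IL API.
object ToyIL:
  final class Block(val label: String):
    // Private so that callers cannot desynchronise forward and backward edges.
    private val incoming = mutable.LinkedHashSet.empty[GoTo]
    def incomingJumps: Set[GoTo] = incoming.toSet

    // Edge-reference interface, visible only to other classes in ToyIL.
    private[ToyIL] def addIncoming(g: GoTo): Unit = incoming += g
    private[ToyIL] def removeIncoming(g: GoTo): Unit = incoming -= g

  final class GoTo(val parent: Block, initialTargets: Iterable[Block]):
    private val _targets = mutable.LinkedHashSet.empty[Block]
    initialTargets.foreach(addTarget)

    def targets: Set[Block] = _targets.toSet

    /** Adds a forward edge and registers this jump with the target block. */
    def addTarget(b: Block): Unit =
      if _targets.add(b) then b.addIncoming(this)

    /** Removes a forward edge and unregisters this jump from the target block. */
    def removeTarget(b: Block): Unit =
      if _targets.remove(b) then b.removeIncoming(this)
```

With this shape, looking up the predecessors of a block is a read of `incomingJumps` rather than a scan over the whole program.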
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
package analysis | ||
import ir.* | ||
import analysis.solvers.* | ||
|
||
trait ILValueAnalysisMisc: | ||
val valuelattice: ConstantPropagationLattice = ConstantPropagationLattice() | ||
val statelattice: MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice] = MapLattice(valuelattice) | ||
|
||
def eval(exp: Expr, env: statelattice.Element): valuelattice.Element = | ||
import valuelattice._ | ||
exp match | ||
case id: Variable => env(id) | ||
case n: BitVecLiteral => bv(n) | ||
case ze: ZeroExtend => zero_extend(ze.extension, eval(ze.body, env)) | ||
case se: SignExtend => sign_extend(se.extension, eval(se.body, env)) | ||
case e: Extract => extract(e.end, e.start, eval(e.body, env)) | ||
case bin: BinaryExpr => | ||
val left = eval(bin.arg1, env) | ||
val right = eval(bin.arg2, env) | ||
bin.op match | ||
case BVADD => bvadd(left, right) | ||
case BVSUB => bvsub(left, right) | ||
case BVMUL => bvmul(left, right) | ||
case BVUDIV => bvudiv(left, right) | ||
case BVSDIV => bvsdiv(left, right) | ||
case BVSREM => bvsrem(left, right) | ||
case BVUREM => bvurem(left, right) | ||
case BVSMOD => bvsmod(left, right) | ||
case BVAND => bvand(left, right) | ||
case BVOR => bvor(left, right) | ||
case BVXOR => bvxor(left, right) | ||
case BVNAND => bvnand(left, right) | ||
case BVNOR => bvnor(left, right) | ||
case BVXNOR => bvxnor(left, right) | ||
case BVSHL => bvshl(left, right) | ||
case BVLSHR => bvlshr(left, right) | ||
case BVASHR => bvashr(left, right) | ||
case BVCOMP => bvcomp(left, right) | ||
case BVCONCAT => concat(left, right) | ||
|
||
//case BVULE => bvule(left, right) | ||
//case BVUGE => bvuge(left, right) | ||
//case BVULT => bvult(left, right) | ||
//case BVUGT => bvugt(left, right) | ||
|
||
//case BVSLE => bvsle(left, right) | ||
//case BVSGE => bvsge(left, right) | ||
//case BVSLT => bvslt(left, right) | ||
//case BVSGT => bvsgt(left, right) | ||
|
||
//case BVCONCAT => concat(left, right) | ||
//case BVNEQ => bvneq(left, right) | ||
//case BVEQ => bveq(left, right) | ||
|
||
case un: UnaryExpr => | ||
val arg = eval(un.arg, env) | ||
|
||
un.op match | ||
case BVNOT => bvnot(arg) | ||
case BVNEG => bvneg(arg) | ||
|
||
case _ => valuelattice.top | ||
|
||
val calleePreservedRegisters = Set("R0", "R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9", "R10", "R11") | ||
|
||
/** Transfer function for state lattice elements. | ||
*/ | ||
def localTransfer(n: IntraProcIRCursor.Node, s: statelattice.Element): statelattice.Element = | ||
n match | ||
case la: LocalAssign => | ||
s + (la.lhs -> eval(la.rhs, s)) | ||
case c: Call => s ++ calleePreservedRegisters.filter(reg => s.keys.exists(_.name == reg)).map(n => Register(n, BitVecType(64)) -> statelattice.sublattice.top).toMap | ||
case _ => s | ||
|
||
|
||
type IRNode = IntraProcIRCursor.Node | ||
|
||
object IRSimpleValueAnalysis: | ||
class Solver(prog: Program) extends ILValueAnalysisMisc | ||
with IntraProcDependencies | ||
with Dependencies[IRNode] | ||
with Analysis[Map[IRNode, Map[Variable, FlatElement[BitVecLiteral]]]] | ||
//with SimplePushDownWorklistFixpointSolver[IRNode] | ||
with SimplePushDownWorklistFixpointSolver[IRNode, Map[Variable, FlatElement[BitVecLiteral]], MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice]] | ||
: | ||
/* Worklist initial set */ | ||
//override val lattice: MapLattice[IRNode, statelattice.type] = MapLattice(statelattice) | ||
override val lattice: MapLattice[IRNode, Map[Variable, FlatElement[BitVecLiteral]], MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice]] = MapLattice(statelattice) | ||
|
||
override val domain : Set[IRNode] = computeDomain(prog).toSet | ||
def transfer(n: IRNode, s: statelattice.Element): statelattice.Element = localTransfer(n, s) |
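
For readers unfamiliar with the lattice this analysis uses, the following is a minimal, self-contained sketch of constant propagation over a flat lattice. `Flat`, `Expr`, `join`, `eval`, and `assign` are hypothetical stand-ins for `FlatElement`, the IR expression types, and `localTransfer`; treating an unmapped variable as `Top` is an assumption made here for simplicity, not necessarily what `MapLattice` does.

```scala
// Hypothetical stand-ins for the flat lattice and IR expressions.
enum Flat:
  case Top, Bottom
  case Const(value: BigInt)

enum Expr:
  case Lit(value: BigInt)
  case Var(name: String)
  case Add(lhs: Expr, rhs: Expr)

/** Least upper bound on the flat lattice. */
def join(a: Flat, b: Flat): Flat = (a, b) match
  case (Flat.Bottom, x) => x
  case (x, Flat.Bottom) => x
  case (x, y) if x == y => x
  case _ => Flat.Top

/** Abstract evaluation: constants fold, anything unknown becomes Top. */
def eval(e: Expr, env: Map[String, Flat]): Flat = e match
  case Expr.Lit(v) => Flat.Const(v)
  case Expr.Var(n) => env.getOrElse(n, Flat.Top) // assumption: unmapped means unknown
  case Expr.Add(l, r) =>
    (eval(l, env), eval(r, env)) match
      case (Flat.Const(x), Flat.Const(y)) => Flat.Const(x + y)
      case (Flat.Bottom, _) | (_, Flat.Bottom) => Flat.Bottom
      case _ => Flat.Top

/** Transfer function for an assignment, mirroring `s + (lhs -> eval(rhs, s))`. */
def assign(lhs: String, rhs: Expr, s: Map[String, Flat]): Map[String, Flat] =
  s + (lhs -> eval(rhs, s))
```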
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
package analysis | ||
|
||
import scala.collection.mutable | ||
import intrusiveList.IntrusiveList | ||
import ir.* | ||
import cfg_visualiser.{DotArrow, DotGraph, DotInlineArrow, DotInterArrow, DotIntraArrow, DotNode, DotRegularArrow} | ||
|
||
|
@@ -426,7 +427,9 @@ class ProgramCfgFactory: | |
cfg.addEdge(funcEntryNode, funcExitNode) | ||
} else { | ||
// Recurse through blocks | ||
visitBlock(proc.blocks.head, funcEntryNode) | ||
visitBlock(proc.entryBlock.get, funcEntryNode) | ||
// If it has no entry-block we still visit the exit block because VSA analysis expects everything to have an Exit | ||
This comes back to a design issue with the MRA - the MRA as it currently exists just collects memory regions across the entire procedure, with the exit node being assumed to exist for each function and have the entire set of regions for the procedure associated with it at the end of the analysis. All nodes except the exit nodes are ultimately irrelevant to the analysis and there isn't really any good reason that the MRA in its current form is done per-statement instead of per-procedure. When the VSA expects something to have an exit but it doesn't, the actual issue usually is the inability of the analysis to handle loops. |
||
visitBlock(proc.returnBlock, funcEntryNode) | ||
If the procedure has no entry block, an exception will be thrown by entryBlock.get, so this isn't really doing what you want at all anyway. |
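
As a possible way to address the point above, the optional entry block could be handled without `.get`. The sketch below assumes simplified, hypothetical `Block` and `Procedure` types in which `entryBlock` is an `Option[Block]` and `returnBlock` is always present; the real IL classes may differ.

```scala
// Hypothetical, simplified stand-ins for the IL classes.
final case class Block(label: String)
final case class Procedure(name: String, entryBlock: Option[Block], returnBlock: Block)

/** Blocks to visit, in order: the entry block only if it exists, then the return block. */
def blocksToVisit(proc: Procedure): List[Block] =
  proc.entryBlock.toList :+ proc.returnBlock
```

A caller could then iterate over `blocksToVisit(proc)`, so a procedure without an entry block still has its return block visited and no exception is thrown.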
||
} | ||
|
||
/** Add a block to the CFG. A block in this case is a basic block, so it contains a list of consecutive statements | ||
|
@@ -470,12 +473,10 @@ class ProgramCfgFactory: | |
* Statements in this block | ||
* @param prevNode | ||
* Preceding block's end node (jump) | ||
* @param cond | ||
* Condition on the jump from `prevNode` to the first statement of this block | ||
* @return | ||
* The last statement's CFG node | ||
*/ | ||
def visitStmts(stmts: ArrayBuffer[Statement], prevNode: CfgNode): CfgCommandNode = { | ||
def visitStmts(stmts: Iterable[Statement], prevNode: CfgNode): CfgCommandNode = { | ||
|
||
val firstNode = CfgStatementNode(stmts.head, block, funcEntryNode) | ||
cfg.addEdge(prevNode, firstNode) | ||
|
@@ -504,9 +505,6 @@ class ProgramCfgFactory: | |
* @param prevNode | ||
* Either the previous statement in the block, or the previous block's end node (in the case that this block | ||
* contains no statements) | ||
* @param cond | ||
* Jump from `prevNode` to this. `TrueLiteral` if `prevNode` is a statement, and any `Expr` if `prevNode` is a | ||
* jump. | ||
* @param solitary | ||
* `True` if this block contains no statements, `False` otherwise | ||
*/ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
package analysis | ||
import ir.IntraProcIRCursor | ||
|
||
/** Dependency methods for worklist-based analyses. | ||
*/ | ||
|
@@ -21,11 +22,22 @@ trait Dependencies[N]: | |
def indep(n: N): Set[N] | ||
|
||
trait InterproceduralForwardDependencies extends Dependencies[CfgNode] { | ||
def outdep(n: CfgNode): Set[CfgNode] = n.succInter.toSet | ||
def indep(n: CfgNode): Set[CfgNode] = n.predInter.toSet | ||
override def outdep(n: CfgNode): Set[CfgNode] = n.succInter.toSet | ||
override def indep(n: CfgNode): Set[CfgNode] = n.predInter.toSet | ||
} | ||
|
||
trait IntraproceduralForwardDependencies extends Dependencies[CfgNode] { | ||
def outdep(n: CfgNode): Set[CfgNode] = n.succIntra.toSet | ||
def indep(n: CfgNode): Set[CfgNode] = n.predIntra.toSet | ||
} | ||
override def outdep(n: CfgNode): Set[CfgNode] = n.succIntra.toSet | ||
override def indep(n: CfgNode): Set[CfgNode] = n.predIntra.toSet | ||
} | ||
|
||
|
||
trait IntraProcDependencies extends Dependencies[IntraProcIRCursor.Node]: | ||
override def outdep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcDependencies.outdep(n) | ||
override def indep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcDependencies.indep(n) | ||
|
||
/** Dependency methods for forward analyses. | ||
*/ | ||
object IntraProcDependencies extends Dependencies[IntraProcIRCursor.Node]: | ||
override def outdep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcIRCursor.succ(n) | ||
override def indep(n: IntraProcIRCursor.Node): Set[IntraProcIRCursor.Node] = IntraProcIRCursor.pred(n) | ||
Is there really any point to doing this indirectly with an object?
Not quite sure what you mean, but Dependencies is just to allow swapping succ and pred when we want to consider backwards dependencies and decouple the implementation from TIP's expected interface.
The IntraProcDependencies trait calls the methods from the IntraProcDependencies object - why does the object need to exist at all? |
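
To make the discussion concrete, here is a sketch of the trait calling the cursor directly, with a second trait that swaps `succ` and `pred` for backwards analyses. `ToyCursor`, `ForwardDeps`, and `BackwardDeps` are hypothetical names, and the cursor is a stand-in over integer node IDs rather than `IntraProcIRCursor`.

```scala
trait Dependencies[N]:
  def outdep(n: N): Set[N]
  def indep(n: N): Set[N]

// Hypothetical stand-in for the IR cursor, over a fixed toy graph.
object ToyCursor:
  type Node = Int
  private val edges = Set(1 -> 2, 2 -> 3, 2 -> 4)
  def succ(n: Node): Set[Node] = edges.collect { case (`n`, to) => to }
  def pred(n: Node): Set[Node] = edges.collect { case (from, `n`) => from }

// Forward analysis: dependencies follow the cursor directly, no intermediate object.
trait ForwardDeps extends Dependencies[ToyCursor.Node]:
  override def outdep(n: ToyCursor.Node): Set[ToyCursor.Node] = ToyCursor.succ(n)
  override def indep(n: ToyCursor.Node): Set[ToyCursor.Node] = ToyCursor.pred(n)

// Backward analysis: the same interface with succ and pred swapped.
trait BackwardDeps extends Dependencies[ToyCursor.Node]:
  override def outdep(n: ToyCursor.Node): Set[ToyCursor.Node] = ToyCursor.pred(n)
  override def indep(n: ToyCursor.Node): Set[ToyCursor.Node] = ToyCursor.succ(n)
```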
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ object IDGenerator { | |
} | ||
|
||
def wrap(input: String, width: Integer = 20): String = | ||
return input | ||
This method is now completely redundant and should be removed entirely? |
||
if (input.length() <= width) { | ||
input | ||
} else { | ||
|
These are not the callee-preserved registers, which are the ones that must be preserved by a subroutine call. Those are R19-R29 and R31.
What this should actually be called is the caller-preserved registers (which means it's the caller's responsibility to preserve them if it wants to) and it should consist of R0 to R18, and R30.
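
The register sets described in this comment could be written out as follows. This is a sketch only: the set contents are taken from the comment above, while the names `calleeSaved`, `callerSaved`, and `clobber`, and the use of plain strings for registers, are hypothetical.

```scala
// Register sets as described in the review comment; names are illustrative.
val calleeSaved: Set[String] = ((19 to 29).map(i => s"R$i") :+ "R31").toSet
val callerSaved: Set[String] = ((0 to 18).map(i => s"R$i") :+ "R30").toSet

/** After a call, forget what is known about registers the callee may overwrite. */
def clobber[V](state: Map[String, V], top: V): Map[String, V] =
  state.map { case (reg, v) => reg -> (if callerSaved(reg) then top else v) }
```

Under this naming, the `Call` case of the transfer function would reset only the `callerSaved` registers and leave the rest of the state untouched.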