-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-deterministic GoTos for indirect call resolution #132
Conversation
override def equals(obj: Any): Boolean = obj match { | ||
case StackRegion(ri, st, _) => st == start | ||
case s: StackRegion => s.start == start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also consider the region identifier; I'm not clear what the identifier is supposed to represent. This is fine E.g. in an intra-procedural analysis if we make start the offset from SP and only compare regions in the same stack frame. In WYSINWYX they calculate all the stack regions as offsets from the SP in main() so they're comparable through their offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just a stylistic change but the equality does need to be cleaned up, you're right. It's not good that the hashCode and equals methods don't match for these MemoryRegion classes. I'm fixing that up now.
The RegionIdentifier is just the name of the region at the moment. For a StackRegion or HeapRegion it just ends up being 'stack_1' or 'malloc_5' or something like that - just a unique identifier without any real meaning, but for a DataRegion it's the name of whatever function or variable the region points to that is later used to resolve indirect calls, which is not really the most robust way to do that.
Right now the analysis only tracks the stack offsets directly in relation to offsets from R31 - just tracking when there's a load(R31 + x) or store(R31 + x) (although it doesn't check the operator either), and each procedure has its own separate set of stack regions. There are obvious issues with this approach - it's only the offset relative to the current stack pointer, not relative to whatever it was at the start of the procedure or start of main, and it isn't checking the operator either.
Fixing these latter two issues will require a broader overhaul though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a bigger change needed later I probably should have made a clearer issue about this, but its part of #72 and #73. We discussed with Yousif and Nick a while ago and decided we should track SP in abstract through procedure calls in a way that equality is defined for stack pointers originating in different procedures. Its worth noting though that Nick found that boogie verifies a lot faster if SP is given a concrete initial value in main().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, of course we will need a way to handle & identify the same stack location being accessed from different procedures (either via a pointer as a parameter, or parameters being passed on the stack - are there any other ways it could happen?)
Make an issue for giving a concrete SP value, that's good to know. I don't know how practical that will be though, wouldn't it be necessary to set a concrete SP value for every procedure (which is obviously not really similarly practical, necessarily) to truly get the benefit? It's worth making a note of and investigating further once we have the MRA/VSA working properly, especially for programs with multiple procedures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any other ways it could happen?
As far as VSA is concerned a stack pointer could be stored in any region, e.g. global or heap memory.
wouldn't it be necessary to set a concrete SP value for every procedure (which is obviously not really similarly practical, necessarily) to truly get the benefit
Yeah, we can do this if a procedure doesn't have a pointer to its caller's stack and doesn't place constraints on its value, or we can keep track for the offset from main's stack and pass it through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess those cases are theoretically possible but would probably be the result of rather nasty code.
Keeping track of the offset from main's initial SP does not seem like a practical approach for programs that call the same procedure many times from different locations, or even just in cases where we have to overapproximate the number of indirect call targets.
I think the key thing will be to identify which parts of a procedure's stack can be accessed from a different procedure.
# Conflicts: # src/main/scala/ir/Statement.scala # src/main/scala/translating/BAPToIR.scala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the dot output back, the other comments aren't super urgent but would be nice to fix, but this is fine to merge.
src/main/scala/util/RunUtils.scala
Outdated
val procedure = c.parent.data | ||
indirectCall.condition match { | ||
// it doesn't seem like calls can actually have conditions in the ARM64 instruction set | ||
case Some(_) => throw Exception("indirect call has a condition") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do a path-sensitive analysis to resolve indirect calls we might want to use this.
I would also prefer to avoid throwing exceptions, or at least not Exception
, and not handling it somewhere. Even if its only going to be thrown in case of a bad future refactoring, it makes it a pain to test if the tool just falls over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is happening at the point at which we resolve indirect calls, so we wouldn't be using IndirectCall.condition anyway if we were using some sort of path-sensitive analysis.
This was an exception because it does indicate a fundamental issue, but I'll just remove calls having conditions from the IR completely now that I've confirmed it's unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is happening at the point at which we resolve indirect calls
Right of course yeah, then we can remove the indirect call condition.
|
||
Logger.info("[!] Resolving CFG") | ||
val (newIR, modified) = resolveCFG(cfg, vsaResult.asInstanceOf[Map[CfgNode, Map[Variable, Set[Value]]]], IRProgram) | ||
val (newIR, modified): (Program, Boolean) = resolveCFG(cfg, vsaResult, IRProgram) | ||
/* | ||
if (modified) { | ||
Logger.info(s"[!] Analysing again (iter $iterations)") | ||
return analyse(newIR, externalFunctions, globals, globalOffsets) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to remove this? We iterate to also analyse the code now made reachable by resolving the indirect calls, in general it should only run twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it when testing because it was overwriting the logged analysis outputs, but I'll add a better solution there.
…4 instruction set
# Conflicts: # .gitignore # src/main/scala/util/RunUtils.scala # src/test/scala/MemoryRegionAnalysisMiscTest.scala
…rect-calls-nondet # Conflicts: # src/main/scala/util/RunUtils.scala
…files are created for each iteration of the analyses
I've fixed the MemoryRegion equals/hashCode issue, added flags for sending the analysis results to files (and a flag for the dot output), made it so separate results files are created for each analysis iteration, and removed conditions from DirectCall and IndirectCall since BAP really shouldn't produce them given the ARM64 instruction set. |
src/main/scala/util/RunUtils.scala
Outdated
|
||
config.analysisDotPath match { | ||
case Some(s) => writeToFile(cfg.toDot(Output.labeler(constPropResult, constPropSolver.stateAfterNode), Output.dotIder), s"${s}_constprop$iteration.dot") | ||
case None => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick but it might be cleaner to do a .map()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the cleanup stuff is really good to have, just some nit picks.
@@ -62,23 +62,19 @@ class BAPToIR(var program: BAPProgram, mainAddress: Int) { | |||
private def translate(s: BAPStatement) = s match { | |||
case b: BAPMemAssign => MemoryAssign(b.lhs.toIR, b.rhs.toIR, Some(b.line)) | |||
case b: BAPLocalAssign => LocalAssign(b.lhs.toIR, b.rhs.toIR, Some(b.line)) | |||
case _ => throw new Exception("unsupported statement: " + s) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a Pattern Match Exhaustivity Warning here now due to not handling BapAssign
.
override def equals(obj: Any): Boolean = obj match { | ||
case r: HeapRegion => regionIdentifier.equals(r.regionIdentifier) | ||
case _ => false | ||
case h: HeapRegion => h.start == start && h.regionIdentifier == regionIdentifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the analysis makes start the size of the allocation so we don't want to use it in the region equality (different sizes should imply different identifiers)? But we might as well make these case classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were case classes before but I changed them to be classes because it was pointless to make them case classes when the hashCode and equals methods were being overwritten, and they had the mutable extent parameter. I'm just going to remove HeapRegion.start for now since it's misleading and incorrect (nothing meaningful is even done with the HeapRegions at present anyway) but fixing #72 will require a broader overhaul anyway.
yousif-memory-region-analysis
branch, up until before the current incomplete work on adding a lifted lattice