Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deterministic GoTos for indirect call resolution #132

Merged
merged 25 commits into from
Nov 6, 2023

Conversation

l-kent
Copy link
Contributor

@l-kent l-kent commented Nov 1, 2023

  • Implements Non-Deterministic Branching in IR #64
  • Improves how indirect calls are resolved when there are multiple possible call targets using the non-deterministic GoTos, now resulting in more accurate control flow in Boogie
  • Adds a method for pretty-printing analysis results in a usable manner
  • Includes some of the changes from yousif-memory-region-analysis branch, up until before the current incomplete work on adding a lifted lattice
  • Cleans up analysis code slightly

@l-kent l-kent requested a review from ailrst November 1, 2023 04:34
override def equals(obj: Any): Boolean = obj match {
case StackRegion(ri, st, _) => st == start
case s: StackRegion => s.start == start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also consider the region identifier; I'm not clear what the identifier is supposed to represent. This is fine E.g. in an intra-procedural analysis if we make start the offset from SP and only compare regions in the same stack frame. In WYSINWYX they calculate all the stack regions as offsets from the SP in main() so they're comparable through their offset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a stylistic change but the equality does need to be cleaned up, you're right. It's not good that the hashCode and equals methods don't match for these MemoryRegion classes. I'm fixing that up now.

The RegionIdentifier is just the name of the region at the moment. For a StackRegion or HeapRegion it just ends up being 'stack_1' or 'malloc_5' or something like that - just a unique identifier without any real meaning, but for a DataRegion it's the name of whatever function or variable the region points to that is later used to resolve indirect calls, which is not really the most robust way to do that.

Right now the analysis only tracks the stack offsets directly in relation to offsets from R31 - just tracking when there's a load(R31 + x) or store(R31 + x) (although it doesn't check the operator either), and each procedure has its own separate set of stack regions. There are obvious issues with this approach - it's only the offset relative to the current stack pointer, not relative to whatever it was at the start of the procedure or start of main, and it isn't checking the operator either.

Fixing these latter two issues will require a broader overhaul though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a bigger change needed later I probably should have made a clearer issue about this, but its part of #72 and #73. We discussed with Yousif and Nick a while ago and decided we should track SP in abstract through procedure calls in a way that equality is defined for stack pointers originating in different procedures. Its worth noting though that Nick found that boogie verifies a lot faster if SP is given a concrete initial value in main().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, of course we will need a way to handle & identify the same stack location being accessed from different procedures (either via a pointer as a parameter, or parameters being passed on the stack - are there any other ways it could happen?)

Make an issue for giving a concrete SP value, that's good to know. I don't know how practical that will be though, wouldn't it be necessary to set a concrete SP value for every procedure (which is obviously not really similarly practical, necessarily) to truly get the benefit? It's worth making a note of and investigating further once we have the MRA/VSA working properly, especially for programs with multiple procedures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any other ways it could happen?

As far as VSA is concerned a stack pointer could be stored in any region, e.g. global or heap memory.

wouldn't it be necessary to set a concrete SP value for every procedure (which is obviously not really similarly practical, necessarily) to truly get the benefit

Yeah, we can do this if a procedure doesn't have a pointer to its caller's stack and doesn't place constraints on its value, or we can keep track for the offset from main's stack and pass it through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess those cases are theoretically possible but would probably be the result of rather nasty code.

Keeping track of the offset from main's initial SP does not seem like a practical approach for programs that call the same procedure many times from different locations, or even just in cases where we have to overapproximate the number of indirect call targets.

I think the key thing will be to identify which parts of a procedure's stack can be accessed from a different procedure.

l-kent and others added 2 commits November 2, 2023 10:05
# Conflicts:
#	src/main/scala/ir/Statement.scala
#	src/main/scala/translating/BAPToIR.scala
Copy link
Contributor

@ailrst ailrst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the dot output back, the other comments aren't super urgent but would be nice to fix, but this is fine to merge.

val procedure = c.parent.data
indirectCall.condition match {
// it doesn't seem like calls can actually have conditions in the ARM64 instruction set
case Some(_) => throw Exception("indirect call has a condition")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do a path-sensitive analysis to resolve indirect calls we might want to use this.
I would also prefer to avoid throwing exceptions, or at least not Exception, and not handling it somewhere. Even if its only going to be thrown in case of a bad future refactoring, it makes it a pain to test if the tool just falls over.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is happening at the point at which we resolve indirect calls, so we wouldn't be using IndirectCall.condition anyway if we were using some sort of path-sensitive analysis.

This was an exception because it does indicate a fundamental issue, but I'll just remove calls having conditions from the IR completely now that I've confirmed it's unnecessary.

Copy link
Contributor

@ailrst ailrst Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is happening at the point at which we resolve indirect calls

Right of course yeah, then we can remove the indirect call condition.

src/main/scala/util/RunUtils.scala Outdated Show resolved Hide resolved

Logger.info("[!] Resolving CFG")
val (newIR, modified) = resolveCFG(cfg, vsaResult.asInstanceOf[Map[CfgNode, Map[Variable, Set[Value]]]], IRProgram)
val (newIR, modified): (Program, Boolean) = resolveCFG(cfg, vsaResult, IRProgram)
/*
if (modified) {
Logger.info(s"[!] Analysing again (iter $iterations)")
return analyse(newIR, externalFunctions, globals, globalOffsets)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to remove this? We iterate to also analyse the code now made reachable by resolving the indirect calls, in general it should only run twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it when testing because it was overwriting the logged analysis outputs, but I'll add a better solution there.

l-kent added 3 commits November 2, 2023 12:03
# Conflicts:
#	.gitignore
#	src/main/scala/util/RunUtils.scala
#	src/test/scala/MemoryRegionAnalysisMiscTest.scala
l-kent added 2 commits November 2, 2023 15:46
…rect-calls-nondet

# Conflicts:
#	src/main/scala/util/RunUtils.scala
…files are created for each iteration of the analyses
@l-kent
Copy link
Contributor Author

l-kent commented Nov 2, 2023

I've fixed the MemoryRegion equals/hashCode issue, added flags for sending the analysis results to files (and a flag for the dot output), made it so separate results files are created for each analysis iteration, and removed conditions from DirectCall and IndirectCall since BAP really shouldn't produce them given the ARM64 instruction set.

@l-kent l-kent requested a review from ailrst November 3, 2023 00:01

config.analysisDotPath match {
case Some(s) => writeToFile(cfg.toDot(Output.labeler(constPropResult, constPropSolver.stateAfterNode), Output.dotIder), s"${s}_constprop$iteration.dot")
case None =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick but it might be cleaner to do a .map()

Copy link
Contributor

@ailrst ailrst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the cleanup stuff is really good to have, just some nit picks.

@@ -62,23 +62,19 @@ class BAPToIR(var program: BAPProgram, mainAddress: Int) {
private def translate(s: BAPStatement) = s match {
case b: BAPMemAssign => MemoryAssign(b.lhs.toIR, b.rhs.toIR, Some(b.line))
case b: BAPLocalAssign => LocalAssign(b.lhs.toIR, b.rhs.toIR, Some(b.line))
case _ => throw new Exception("unsupported statement: " + s)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a Pattern Match Exhaustivity Warning here now due to not handling BapAssign.

override def equals(obj: Any): Boolean = obj match {
case r: HeapRegion => regionIdentifier.equals(r.regionIdentifier)
case _ => false
case h: HeapRegion => h.start == start && h.regionIdentifier == regionIdentifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the analysis makes start the size of the allocation so we don't want to use it in the region equality (different sizes should imply different identifiers)? But we might as well make these case classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were case classes before but I changed them to be classes because it was pointless to make them case classes when the hashCode and equals methods were being overwritten, and they had the mutable extent parameter. I'm just going to remove HeapRegion.start for now since it's misleading and incorrect (nothing meaningful is even done with the HeapRegions at present anyway) but fixing #72 will require a broader overhaul anyway.

@l-kent l-kent merged commit e63dcc1 into main Nov 6, 2023
1 check passed
@l-kent l-kent deleted the indirect-calls-nondet branch November 7, 2023 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants