Non-deterministic GoTos for indirect call resolution #132

l-kent · 2023-11-01T04:27:03Z

Implements Non-Deterministic Branching in IR #64
Improves how indirect calls with multiple possible call targets are resolved by using the new non-deterministic GoTos, resulting in more accurate control flow in Boogie
Adds a method for pretty-printing analysis results in a usable manner
Includes some of the changes from the yousif-memory-region-analysis branch, up to but not including the current incomplete work on adding a lifted lattice
Cleans up analysis code slightly
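As a rough illustration of the resolution step described above (not the PR's actual IR), the sketch below assumes a simplified, hypothetical set of types (`Block`, `Jump`, `NonDetGoTo`, `IndirectCallResolution.resolve`). When the analysis finds several candidate targets, the indirect call becomes a non-deterministic goto to one new block per target, each ending in a direct call. This maps onto Boogie's multi-label `goto`.

```scala
// Hypothetical, simplified IR: these names are illustrative stand-ins,
// not the project's real classes.
sealed trait Jump
final case class GoTo(target: String) extends Jump
final case class NonDetGoTo(targets: List[String]) extends Jump
final case class DirectCall(procedure: String, returnTo: String) extends Jump
final case class IndirectCall(register: String, returnTo: String) extends Jump

final case class Block(label: String, jump: Jump)

object IndirectCallResolution {

  /** Rewrites a block ending in an indirect call, given the candidate targets
    * found by an analysis. Returns the rewritten block followed by any new blocks.
    */
  def resolve(block: Block, targets: List[String]): List[Block] =
    block.jump match {
      case IndirectCall(_, returnTo) =>
        targets match {
          // Nothing known about the target: leave the call unresolved.
          case Nil => List(block)
          // Exactly one target: a plain direct call is enough.
          case single :: Nil => List(block.copy(jump = DirectCall(single, returnTo)))
          // Several targets: branch non-deterministically to one call block each.
          case many =>
            val callBlocks = many.map(t => Block(s"${block.label}_$t", DirectCall(t, returnTo)))
            block.copy(jump = NonDetGoTo(callBlocks.map(_.label))) :: callBlocks
        }
      case _ => List(block)
    }
}
```

For example, resolving block `b0` with targets `f` and `g` yields `b0` ending in `NonDetGoTo(List("b0_f", "b0_g"))` plus the two call blocks, both returning to the original return label.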

ailrst · 2023-11-01T06:56:14Z

src/main/scala/analysis/Analysis.scala

  override def equals(obj: Any): Boolean = obj match {
-    case StackRegion(ri, st, _) => st == start
+    case s: StackRegion => s.start == start


Should we also consider the region identifier? I'm not clear on what the identifier is supposed to represent. Comparing only start is fine, e.g., in an intra-procedural analysis if we make start the offset from SP and only compare regions in the same stack frame. In WYSINWYX they calculate all the stack regions as offsets from the SP in main(), so they're comparable through their offset.

This was just a stylistic change but the equality does need to be cleaned up, you're right. It's not good that the hashCode and equals methods don't match for these MemoryRegion classes. I'm fixing that up now.
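A minimal sketch of what matching equals and hashCode could look like, assuming a hypothetical simplified `StackRegion` (the constructor shape and the `extent` field here are illustrative, not the project's real definition). The point is only that both methods are derived from the same fields, so equal regions always land in the same hash bucket.

```scala
// Hypothetical stand-in for a region class; field names follow the discussion,
// but this is not the project's real definition.
class StackRegion(val regionIdentifier: String, val start: BigInt) {
  // Mutable state is deliberately excluded from equality and hashing.
  var extent: Option[BigInt] = None

  override def equals(obj: Any): Boolean = obj match {
    case s: StackRegion => s.start == start
    case _              => false
  }

  // Derived from exactly the fields that equals compares.
  override def hashCode(): Int = start.hashCode
}
```

With this definition, two regions with the same start but different identifiers compare equal and collapse to a single element in a `Set`, which is the behaviour the revised equals implies.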

The RegionIdentifier is just the name of the region at the moment. For a StackRegion or HeapRegion it just ends up being 'stack_1' or 'malloc_5' or something like that - just a unique identifier without any real meaning, but for a DataRegion it's the name of whatever function or variable the region points to that is later used to resolve indirect calls, which is not really the most robust way to do that.

Right now the analysis only tracks stack offsets directly relative to R31: it just tracks when there's a load(R31 + x) or store(R31 + x), and each procedure has its own separate set of stack regions. There are obvious issues with this approach: the offset is only relative to the current stack pointer, not relative to whatever it was at the start of the procedure or the start of main, and the analysis doesn't check the operator either.

Fixing these latter two issues will require a broader overhaul though.
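To make the operator issue concrete, here is a small sketch that assumes a hypothetical expression type (`Expr`, `BinaryExpr` with a string operator) rather than the project's real IR. It only extracts an offset when the operator is explicitly `+` or `-`, and it does not attempt the harder problem of relating the offset to the SP value at procedure entry.

```scala
// Hypothetical expression types for illustration only.
sealed trait Expr
final case class Register(name: String) extends Expr
final case class Literal(value: BigInt) extends Expr
final case class BinaryExpr(op: String, lhs: Expr, rhs: Expr) extends Expr

object StackOffsets {
  val stackPointer: Register = Register("R31")

  /** Signed offset from the current stack pointer, if the address has the form
    * R31, R31 + c or R31 - c; None for anything else.
    */
  def offsetFromSP(address: Expr): Option[BigInt] = address match {
    case `stackPointer`                              => Some(BigInt(0))
    case BinaryExpr("+", `stackPointer`, Literal(c)) => Some(c)
    case BinaryExpr("-", `stackPointer`, Literal(c)) => Some(-c)
    case _                                           => None
  }
}
```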

Yeah, this is a bigger change needed later. I probably should have made a clearer issue about this, but it's part of #72 and #73. We discussed it with Yousif and Nick a while ago and decided we should track SP in the abstract through procedure calls, in a way that equality is defined for stack pointers originating in different procedures. It's worth noting though that Nick found that Boogie verifies a lot faster if SP is given a concrete initial value in main().

Yeah, of course we will need a way to handle & identify the same stack location being accessed from different procedures (either via a pointer as a parameter, or parameters being passed on the stack - are there any other ways it could happen?)

Make an issue for giving a concrete SP value, that's good to know. I don't know how practical that will be though, wouldn't it be necessary to set a concrete SP value for every procedure (which is obviously not really similarly practical, necessarily) to truly get the benefit? It's worth making a note of and investigating further once we have the MRA/VSA working properly, especially for programs with multiple procedures.

> are there any other ways it could happen?

As far as VSA is concerned a stack pointer could be stored in any region, e.g. global or heap memory.

> wouldn't it be necessary to set a concrete SP value for every procedure (which is obviously not really similarly practical, necessarily) to truly get the benefit

Yeah, we can do this if a procedure doesn't have a pointer to its caller's stack and doesn't place constraints on its value, or we can keep track of the offset from main's stack and pass it through.

Yeah I guess those cases are theoretically possible but would probably be the result of rather nasty code.

Keeping track of the offset from main's initial SP does not seem like a practical approach for programs that call the same procedure many times from different locations, or even just in cases where we have to overapproximate the number of indirect call targets.

I think the key thing will be to identify which parts of a procedure's stack can be accessed from a different procedure.

ailrst

I put the dot output back. The other comments aren't super urgent but would be nice to fix; this is fine to merge.

ailrst · 2023-11-02T00:12:36Z

src/main/scala/util/RunUtils.scala

+              val procedure = c.parent.data
+              indirectCall.condition match {
+                // it doesn't seem like calls can actually have conditions in the ARM64 instruction set
+                case Some(_) => throw Exception("indirect call has a condition")


If we do a path-sensitive analysis to resolve indirect calls we might want to use this.
I would also prefer to avoid throwing exceptions, or at least avoid throwing a bare Exception that isn't handled anywhere. Even if it's only going to be thrown after a bad future refactoring, it makes testing a pain if the tool just falls over.

This is happening at the point at which we resolve indirect calls, so we wouldn't be using IndirectCall.condition anyway if we were using some sort of path-sensitive analysis.

This was an exception because it does indicate a fundamental issue, but I'll just remove calls having conditions from the IR completely now that I've confirmed it's unnecessary.

> This is happening at the point at which we resolve indirect calls

Right of course yeah, then we can remove the indirect call condition.

ailrst · 2023-11-02T00:52:22Z

src/main/scala/util/RunUtils.scala


    Logger.info("[!] Resolving CFG")
-    val (newIR, modified) = resolveCFG(cfg, vsaResult.asInstanceOf[Map[CfgNode, Map[Variable, Set[Value]]]], IRProgram)
+    val (newIR, modified): (Program, Boolean) = resolveCFG(cfg, vsaResult, IRProgram)
+    /*
    if (modified) {
      Logger.info(s"[!] Analysing again (iter $iterations)")
      return analyse(newIR, externalFunctions, globals, globalOffsets)
    }


Is there a reason to remove this? We iterate so that we also analyse the code made reachable by resolving the indirect calls; in general it should only run twice.

I removed it when testing because it was overwriting the logged analysis outputs, but I'll add a better solution there.
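One way to keep the re-analysis loop without overwriting earlier output is sketched below. It assumes hypothetical helpers (`analyseToFixpoint`, `resultsFile`) and a generic `step` function standing in for the real analysis; the file-naming scheme is an illustration, not the one the PR necessarily uses.

```scala
object IterativeAnalysis {

  /** Re-runs `step` until it reports no modification or `maxIterations` is reached.
    * `step` receives the program and the iteration number, and returns the new
    * program together with a flag saying whether anything changed.
    * Returns the final program and the number of iterations that were run.
    */
  @annotation.tailrec
  def analyseToFixpoint[P](program: P, maxIterations: Int, iteration: Int = 1)(
      step: (P, Int) => (P, Boolean)
  ): (P, Int) = {
    val (next, modified) = step(program, iteration)
    if (modified && iteration < maxIterations)
      analyseToFixpoint(next, maxIterations, iteration + 1)(step)
    else (next, iteration)
  }

  /** One results file per iteration, so later runs do not overwrite earlier ones. */
  def resultsFile(prefix: String, analysis: String, iteration: Int): String =
    s"${prefix}_$analysis$iteration.txt"
}
```

Passing the iteration number into `step` lets the analysis pick a distinct output path on each pass, and the `maxIterations` bound guards against a step that never stops reporting modifications.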

l-kent · 2023-11-02T23:25:22Z

I've fixed the MemoryRegion equals/hashCode issue, added flags for sending the analysis results to files (and a flag for the dot output), made it so separate results files are created for each analysis iteration, and removed conditions from DirectCall and IndirectCall since BAP really shouldn't produce them given the ARM64 instruction set.

ailrst · 2023-11-03T06:56:41Z

src/main/scala/util/RunUtils.scala

+
+    config.analysisDotPath match {
+      case Some(s) => writeToFile(cfg.toDot(Output.labeler(constPropResult, constPropSolver.stateAfterNode), Output.dotIder), s"${s}_constprop$iteration.dot")
+      case None =>


Nitpick but it might be cleaner to do a .map()
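A sketch of the shorter form, with the caveat that for a side-effecting write `Option.foreach` is arguably a closer fit than `.map()`, since no result is needed. The `writeToFile` below is a hypothetical in-memory stand-in so the example is self-contained; it is not the project's helper.

```scala
import scala.collection.mutable.ListBuffer

object DotOutput {
  // Records (path, contents) pairs instead of touching the filesystem.
  val written: ListBuffer[(String, String)] = ListBuffer.empty

  // Hypothetical stand-in for the project's writeToFile helper.
  def writeToFile(contents: String, path: String): Unit = {
    written += ((path, contents))
    ()
  }

  // Writes the dot output only when a path was configured; does nothing on None.
  def writeConstProp(dotPath: Option[String], dot: String, iteration: Int): Unit =
    dotPath.foreach(s => writeToFile(dot, s"${s}_constprop$iteration.dot"))
}
```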

ailrst

All the cleanup stuff is really good to have, just some nitpicks.

ailrst · 2023-11-03T07:09:29Z

src/main/scala/translating/BAPToIR.scala

@@ -62,23 +62,19 @@ class BAPToIR(var program: BAPProgram, mainAddress: Int) {
  private def translate(s: BAPStatement) = s match {
    case b: BAPMemAssign   => MemoryAssign(b.lhs.toIR, b.rhs.toIR, Some(b.line))
    case b: BAPLocalAssign => LocalAssign(b.lhs.toIR, b.rhs.toIR, Some(b.line))
-    case _                 => throw new Exception("unsupported statement: " + s)
  }


I get a Pattern Match Exhaustivity Warning here now due to not handling BapAssign.
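One possible way to keep the match free of a catch-all case while satisfying the exhaustivity checker is to seal the hierarchy, sketched here under assumptions: the types are simplified stand-ins with string fields, and the intermediate trait (called `BAPAssign` here) is hypothetical, since the real class hierarchy is not shown in this thread. Because every sealed parent has only the two listed leaf classes, the compiler can prove the two cases cover everything.

```scala
// Simplified, hypothetical stand-ins for the BAP statement types.
sealed trait BAPStatement
sealed trait BAPAssign extends BAPStatement
final case class BAPMemAssign(lhs: String, rhs: String) extends BAPAssign
final case class BAPLocalAssign(lhs: String, rhs: String) extends BAPAssign

// Simplified, hypothetical stand-ins for the IR statement types.
sealed trait Statement
final case class MemoryAssign(lhs: String, rhs: String) extends Statement
final case class LocalAssign(lhs: String, rhs: String) extends Statement

object Translate {
  // No wildcard case: adding a new BAPStatement subclass later makes the
  // compiler flag this match instead of failing at runtime.
  def translate(s: BAPStatement): Statement = s match {
    case b: BAPMemAssign   => MemoryAssign(b.lhs, b.rhs)
    case b: BAPLocalAssign => LocalAssign(b.lhs, b.rhs)
  }
}
```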

ailrst · 2023-11-03T07:19:54Z

src/main/scala/analysis/Analysis.scala

  override def equals(obj: Any): Boolean = obj match {
-    case r: HeapRegion => regionIdentifier.equals(r.regionIdentifier)
-    case _             => false
+    case h: HeapRegion => h.start == start && h.regionIdentifier == regionIdentifier


I think the analysis makes start the size of the allocation, so we don't want to use it in the region equality (different sizes should imply different identifiers). But couldn't we just make these case classes?

They were case classes before, but I changed them to plain classes because it was pointless to make them case classes when the hashCode and equals methods were being overridden and they had the mutable extent parameter. I'm just going to remove HeapRegion.start for now since it's misleading and incorrect (nothing meaningful is even done with the HeapRegions at present anyway), but fixing #72 will require a broader overhaul anyway.

yousifpatti and others added 17 commits September 26, 2023 16:55

Added reassign and stack ptr examples

9cceafb

Adding parent function to CFG statements

21948c4

Added Stack and Data renaming

fedb667

Identifying constant regions

3208cdd

Merge branch 'analysis-devel' into yousif-memory-region-analysis

b1f3d15

Fixes to merge comments

1aad929

Merge branch 'main' into yousif-memory-region-analysis

f48395a

Fixed merge issues

c9f37e3

add non-deterministic goto to IR, use it when indirect calls have multiple targets

9610dea

Merge branch 'main' into indirect-calls-nondet

3a6a93f

printer for analysis results that is ok and much more readable than the .dot one

42b2973

clean up MRA/VSA a bit

185ed71

minor VSA improvement

d9c5e39

syntax cleanup

d21425b

more type cleanup

595c284

improve analysis printer

d1de241

add *.txt and *.csv to gitignore

936596a

l-kent requested a review from ailrst November 1, 2023 04:34

ailrst reviewed Nov 1, 2023

l-kent and others added 2 commits November 2, 2023 10:05

Merge branch 'main' into indirect-calls-nondet

4e0ced0

# Conflicts: # src/main/scala/ir/Statement.scala # src/main/scala/translating/BAPToIR.scala

add back dot output and wrap bubble labels

0f15ae5

ailrst approved these changes Nov 2, 2023

l-kent added 3 commits November 2, 2023 12:03

fix memory region equals/hashcode

e1a5022

remove conditions from calls since they should not appear in the ARM64 instruction set

cc08880

Merge branch 'main' into indirect-calls-nondet

2a5fd77

# Conflicts: # .gitignore # src/main/scala/util/RunUtils.scala # src/test/scala/MemoryRegionAnalysisMiscTest.scala

l-kent mentioned this pull request Nov 2, 2023

Make IL control-flow representation match boogie more closely #134

Closed

l-kent added 2 commits November 2, 2023 15:46

Merge remote-tracking branch 'origin/indirect-calls-nondet' into indirect-calls-nondet

9be3c66

# Conflicts: # src/main/scala/util/RunUtils.scala

add flags for printing analysis results, make it so separate results files are created for each iteration of the analyses

99b9bd8

l-kent requested a review from ailrst November 3, 2023 00:01

ailrst reviewed Nov 3, 2023

ailrst approved these changes Nov 3, 2023

fix noted issues

fb919aa

l-kent merged commit e63dcc1 into main Nov 6, 2023
1 check passed

l-kent mentioned this pull request Nov 6, 2023

Non-Deterministic Branching in IR #64

Closed

l-kent deleted the indirect-calls-nondet branch November 7, 2023 00:56
