Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call as statement #230

Merged
merged 8 commits into from
Sep 9, 2024
Merged

Call as statement #230

merged 8 commits into from
Sep 9, 2024

Conversation

ailrst
Copy link
Contributor

@ailrst ailrst commented Aug 8, 2024

This moves direct and indirect calls from block terminators into the statement list. It also adds Unreachable (which has the semantics of assume false) and Return Jumps, so a block terminator is now just Unreachable | Return | GoTo.

We currently still have the invariant of

  • 1 call per block
  • calls can only appear as the last statement in the block

The IndirectCall(R30) -> return and return -> goto returnblock transforms are applied to this IR

Dataflow analyses seem mostly unaffected since the framework handles control flow, possibly need to look at the IDE solver more closely. IndirectCall resolution works, but the Live vars analysis using the IDE solver is broken for a couple of the test cases for unclear reasons. LiveVars tests works at least now, since nothing uses the forwards IDE analysis it might have superficial bugs but is probably fine.


Motivation

  • the existing block terminator is very weird and makes modifying block jumps very complicated to ensure it remains well-formed, and to handle the case of when it is a call and potentially has a fallthrough.
  • Calls are now structurally ensured to have a successor command (rather than it being possibly None). This successor is Unreachable in the case of a non-returning call.
  • This change removes the obligation for call statements to know about their fall-through edge (hence meaning they describe exactly one control-flow edge in each direction). Control always passes to the next statement, and the next statement describes where control flows after the call.
  • It separates the inter-procedural control-flow from the block-level intraprocedural control flow; A call edges exist across call statements and goto edges exist between blocks (See first point), so you don't have to handle both in a context where you only care about one.
  • Facilitates a higher-level IR in the future and later transformation stages in the tool:
    • Allow later transforms to place multiple calls in a single block, making it more readable and expressive for higher level call structures
    • Calls which take local arguments and assign their return values to locals

@ailrst ailrst force-pushed the call-statement branch 2 times, most recently from b875191 to 37ee553 Compare August 9, 2024 07:37
@ailrst ailrst force-pushed the call-statement branch 2 times, most recently from 1001446 to fd56c9a Compare August 19, 2024 01:57
@ailrst ailrst marked this pull request as ready for review August 19, 2024 02:15
@l-kent
Copy link
Contributor

l-kent commented Aug 21, 2024

The procedure summaries (which are now in main) and the DSA (not merged in yet) both use the forward IDE solver.

@l-kent
Copy link
Contributor

l-kent commented Aug 21, 2024

Exceptions get thrown when running each of the following tests with --analyse:

  • correct/indirect_call/gcc_O2:BAP
  • correct/malloc_with_local3/gcc_O2:BAP
  • correct/syscall/gcc_O2:BAP

@ailrst
Copy link
Contributor Author

ailrst commented Aug 26, 2024

correct/syscall/gcc_O2:BAP

main is a stub of fork() which makes the externalremover remove the entry/return blocks after they were added, and as a result IDEsolver falls over getting the entry and exit of mainProcedure. Its also strange that the bap loader for this proc gets two blocks with the label lfork, which seems to break the assumptions of BAPtoIR.

ie with

--- a/src/main/scala/translating/BAPToIR.scala
+++ b/src/main/scala/translating/BAPToIR.scala
@@ -44,4 +44,5 @@ class BAPToIR(var program: BAPProgram, mainAddress: Int) {
     for (s <- program.subroutines) {
       val procedure = nameToProcedure(s.name)
+      println(s"blocks ${procedure.name}:\n${s.blocks.map(b => s"block ${b.toString}").mkString(" ")}")
       for (b <- s.blocks) {
         val block = labelToBlock(b.label)
blocks fork
block lfork Some(1600)
block lfork Some(1552)
R16 := 65536bv64;
R17 := mem[BAPBinOp(PLUS,R16,4024bv64)];
R16 := BAPBinOp(PLUS,R16,4024bv64);

from adt

  Sub(
   Tid(1_499, "@fork"),
   Attrs([
    Attr("c.proto", "signed (*)(void)"),
    Attr("returns-twice", "()"),
    Attr("address", "0x640"),
    Attr("stub", "()")]),
   "fork",
   Args([
    Arg(
     Tid(1_543, "%00000607"),
     Attrs([
      Attr("c.layout", "[signed : 32]"),
      Attr("c.data", "Top:u32"),
      Attr("c.type", "signed")]),
     Var("fork_result", Imm(32)),
     LOW(32, Var("R0", Imm(64))),
     Out())]),
   Blks([
    Blk(
     Tid(424, "@fork"),
     Attrs([
      Attr("address", "0x640")]),
     Phis([]),
     Defs([]),
     Jmps([
      Goto(
       Tid(427, "%000001ab"),
       Attrs([
        Attr("address", "0x640"),
        Attr("insn", "b #-0x30")]),
       Int(1, 1),
       Direct(Tid(425, "@fork")))])),

I'm not really familiar with the frontend and not sure what is happening here

@ailrst
Copy link
Contributor Author

ailrst commented Aug 26, 2024

The procedure summaries (which are now in main) and the DSA (not merged in yet) both use the forward IDE solver.

I still have to merge in main and check the forward solver works, good to know

@l-kent
Copy link
Contributor

l-kent commented Aug 26, 2024

For correct/syscall/gcc_O2:BAP, that seems to be because BAP produces multiple blocks with the same name in that test case (it's a case that BAP doesn't really lift particularly well for various reasons), and we've just been lucky and never had that cause problems before somehow. I'll make an issue for it and tweak the BAP loader to account for that.

@l-kent
Copy link
Contributor

l-kent commented Aug 26, 2024

Your fix makes it so that the stub procedures which had bodies removed now have bodies created for them, which I don't think we really want?

An example of what is produced (test/correct/function1/clang):

implementation printf()
{
  printf_basil_entry:
    assume {:captureState "printf_basil_entry"} true;
    goto printf_basil_return;
  printf_basil_return:
    assume {:captureState "printf_basil_return"} true;
    return;
}

@ailrst
Copy link
Contributor Author

ailrst commented Aug 26, 2024

Yeah, I was a bit frustrated by this on Friday. I agree this is not ideal as it conveys a procedure that does nothing rather than a procedure with missing/external behaviour. Calling IDESolver on an external function could be reasonably considered to be violating its contract, so I'm ok with an exception in this case, maybe we just pick a different procedure as its starting point or disable it in this case?

  • for this reason the behaviour of addReturnNodes(p, true) is wrong and I think the IDESolver has been properly fixed so isCall ignores stubs so that it doesn't need entry and exits for all procedures anymore, which was my initial mistake.

@l-kent
Copy link
Contributor

l-kent commented Aug 26, 2024

I don't think we need an exception as long as we make it so that IDESolver can just ignore procedure stubs properly?

@ailrst
Copy link
Contributor Author

ailrst commented Aug 29, 2024

That's true in general, but when the main procedure is a stub the IDESolver constructor wants to initialise it as the entrypoint, I guess we could pick another procedure arbitrarily as the entry point in that case.

@ailrst
Copy link
Contributor Author

ailrst commented Aug 30, 2024

I've fixed the external external functions issue so externalremover works propertly, and made it so the IDESolver analyses are disabled (printing a warning) when the main procedure is external.

@l-kent
Copy link
Contributor

l-kent commented Aug 30, 2024

That's true in general, but when the main procedure is a stub the IDESolver constructor wants to initialise it as the entrypoint, I guess we could pick another procedure arbitrarily as the entry point in that case.

If the main procedure is a stub then there isn't really anything for the IDESolver to do - picking another procedure arbitrarily isn't a good solution. A better solution would be to just to not do any analysis at all.

@l-kent
Copy link
Contributor

l-kent commented Aug 30, 2024

Oh you came to the same conclusion already, I'm glad

@l-kent
Copy link
Contributor

l-kent commented Aug 30, 2024

There's an error for correct/syscall/clang_O2:BAP with the analysis on - it doesn't generate a return block for main, so then when the BackwardIDESolver is initialised for the InterLiveVarsAnalysis, IRWalk.lastInProc returns None and NoSuchElementException gets thrown.

@ailrst
Copy link
Contributor Author

ailrst commented Aug 30, 2024

That's because main is nonreturning in that case, should probably also disable the IDE solver in this case.

@l-kent
Copy link
Contributor

l-kent commented Aug 30, 2024

It looked like some other cases where main is non-returning don't have the same issue though, just that one. It is a weird case that BAP doesn't lift very well.

@ailrst ailrst merged commit c9ad83d into main Sep 9, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants