Gtirb to ir #161

Megatomato · 2024-01-31T06:08:08Z

Provides a new frontend for the BASIL tool, taking disassembly data from ddisasm and GTIRB instead of BAP.

To use it, pass a .gts file as the argument to the tool instead of the .adt file, and it produces a .bpl file as usual. It aims to replicate the old interface as closely as it can.

You can switch between the two interfaces by commenting out the GTIRB LOGIC code and everything beneath it in the LoadBap function in RunUtils, and uncommenting all the code underneath BAP LOGIC. Also be sure to change the type of loadBap from Program to BAPProgram, as the new interface directly outputs an IR program in one translation pass.

Broadly, it is about as good as the old interface, with the notable exception that GTIRB lacks function parameters and returns, so those are hardcoded as of now. Alistair has also lifted every example in the tests folder to a .gts.

ailrst · 2024-02-08T00:10:58Z

aarch64-linux-gnu-gcc (Debian 13.2.0-2) 13.2.0

l-kent · 2024-02-08T00:19:56Z

I have 11.3.0, which explains the difference.

Megatomato · 2024-02-08T00:20:20Z

Rather than creating a completely new UUID, it would be best to just give it the same label as its originator block, just with $_n (for the nth additional block created from the original block) after it, so we can maintain the correspondence.

This was my idea initially as well, and I wish we could, but then it messes up the decoding/encoding from Base64. The main reason these UUIDs are so annoying is that the jumps are filtered using GTIRB's Edge class, which takes two ByteStrings (one source and one target). Thus, when new blocks are created, edges need to be added, and these edges require the decoded Base64 string.
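To illustrate the tension described above, here is a minimal sketch using only `java.util.Base64`. It assumes labels are URL-safe Base64 strings of the UUID bytes; `splitLabel` and `originalUuid` are hypothetical helpers, not functions from this PR. It shows that a label with a `$_n` suffix can no longer be decoded directly, but that the original bytes are recoverable if the suffix is stripped first.

```scala
import java.util.Base64
import scala.util.Try

object SplitLabelSketch {
  // Hypothetical helper: label for the nth block split off from an original block.
  def splitLabel(originalLabel: String, n: Int): String = originalLabel + "$_" + n

  // Hypothetical helper: recover the original block's UUID bytes by
  // dropping everything from the first '$' before decoding.
  def originalUuid(label: String): Array[Byte] =
    Base64.getUrlDecoder.decode(label.takeWhile(_ != '$'))

  def main(args: Array[String]): Unit = {
    val uuid = Array.tabulate[Byte](16)(i => (i * 7).toByte)
    val label = Base64.getUrlEncoder.encodeToString(uuid)
    val derived = splitLabel(label, 1)
    // Decoding the suffixed label directly fails: '$' is not in the Base64 alphabet.
    println(Try(Base64.getUrlDecoder.decode(derived)).isFailure)
    // Stripping the suffix first recovers the original bytes.
    println(originalUuid(derived).sameElements(uuid))
  }
}
```

Under these assumptions, the edge bookkeeping could keep working on the decoded bytes of the originator block while the printed label carries the suffix.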

l-kent · 2024-02-08T00:29:48Z

That seems like a problem that is possible to solve though, I'll work on that at some point.

l-kent · 2024-02-09T03:13:47Z

src/main/scala/translating/GTIRBToIR.scala

+      name = symbols.find(functionNames(uuid) == _.uuid).get.name
+
+      val entryBlocks: mutable.Set[ByteString] = functionEntries(uuid)
+
+      val result = entryBlocks
+        .flatMap(e => edgeMap.getOrElse(Base64.getUrlEncoder.encodeToString(e.toByteArray), None).iterator)
+        .flatMap(elem => mods.flatMap(_.proxies).find(_.uuid == elem.targetUuid))
+        .flatMap(proxy => symMap.getOrElse(proxy.uuid, None))
+        .map(p => p.name)
+
+      if (result.nonEmpty) {
+        return result.head //. head seems weird here but i guess it works
+      }


What does this section do? How do the proxy blocks come into getting the name of a function associated with a UUID?

For weird linker functions introduced by the compiler, gtirb usually attaches the function name to a proxyblock instead of a block. In this case, I need to get the block that the proxyblock is connected to, in order to find the right name for that function. This doesn't seem to be a problem for any functions the user has written themselves.

Ah, ok. Those PLT bridging functions don't actually have names as such. When they jump to an external function that does have a name, BAP just uses that name for the bridging function too. Since DDisasm doesn't do that, I'll make it so what we're doing here is more explicit.
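The lookup discussed in this thread can be sketched in a more explicit form. This is a simplified model, not the code in GTIRBToIR.scala: UUIDs are plain strings, and `Edge`, `nameViaProxy` and all parameter names are assumptions made for illustration. The idea is to follow the outgoing edges of a function's entry blocks and, if one targets a proxy block that has a symbol, borrow that symbol's name.

```scala
object ProxyNameSketch {
  // Simplified stand-in for a GTIRB CFG edge; field names are assumptions.
  final case class Edge(sourceUuid: String, targetUuid: String)

  // Resolve a name for a function whose entry block jumps to a named proxy block.
  // Returns None when no such proxy exists (e.g. for user-written functions).
  def nameViaProxy(
      entryBlocks: Set[String],
      outgoingEdges: Map[String, Seq[Edge]],
      proxyUuids: Set[String],
      symbolNames: Map[String, String] // referent UUID -> symbol name
  ): Option[String] = {
    val names = for {
      entry <- entryBlocks.toSeq
      edge <- outgoingEdges.getOrElse(entry, Seq.empty)
      if proxyUuids.contains(edge.targetUuid)
      name <- symbolNames.get(edge.targetUuid)
    } yield name
    names.headOption
  }
}
```

Returning an `Option` makes the "no proxy found" case explicit, instead of relying on `result.head` inside a `nonEmpty` check.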

ailrst · 2024-02-15T01:39:31Z

I have a pr to add more information to the gtirb output UQ-PAC/gtirb-semantics#6

It now stores the label next to the block (if one exists), and the disassembly and address next to the instructions.

This makes the semantics located at block["instructions"][i]["semantics"].
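As a rough picture of that layout, the following sketch models a block entry with case classes. Only the keys "instructions" and "semantics" come from the comment above; the other field names and the `Seq[String]` type for semantics are assumptions, and the real output is JSON rather than Scala values.

```scala
object SemanticsLayoutSketch {
  // Field names other than "instructions" and "semantics" are assumptions.
  final case class Instruction(address: Long, disassembly: String, semantics: Seq[String])
  final case class BlockEntry(label: Option[String], instructions: Seq[Instruction])

  // Equivalent of block["instructions"][i]["semantics"],
  // returning None when i is out of range.
  def semanticsAt(block: BlockEntry, i: Int): Option[Seq[String]] =
    block.instructions.lift(i).map(_.semantics)
}
```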

Megatomato · 2024-02-16T01:17:48Z

@l-kent
Is there anything else we need to add/ review on this pr?
I'm asking since it's my last day, so if you want any changes with the base gtirb functionality, today would probably be better.

Thank you.

l-kent · 2024-02-16T01:32:55Z

I don't need anything else from you, thanks.

l-kent · 2024-02-16T01:47:05Z

I'm still working on cleaning it up to get it to parse all of CNTLM, and solving the issue with correspondence between the UUIDs for split blocks, but that's almost done now.

l-kent · 2024-02-21T02:29:57Z

I'm ready to merge this. I've cleaned up some of the big cntlm files from this branch's history (just the ones committed in this branch).

@ailrst if you want to have a quick look before I merge feel free

ailrst · 2024-02-21T04:37:50Z

src/main/scala/translating/GTIRBToIR.scala

+  private val symbols = mods.flatMap(_.symbols).map(s => s.uuid -> s).toMap
+  private val uuidToSymbols = createSymbolMap()
+  private val uuidToProcedure: mutable.Map[ByteString, Procedure] = mutable.Map()
+  private val entranceUUIDtoProcedure: mutable.Map[ByteString, Procedure] = mutable.Map()


What is the difference between uuidToProcedure and entranceUUIDtoProcedure? Can we comment these mutable maps?

GTIRB functions have a UUID that represents them which is used in various mappings, but this is separate from the UUID for the function's entrance block, which we also commonly encounter and need to know which procedure it refers to.
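The distinction can be sketched with two lookup tables that point at the same procedures. This is an illustration only: UUIDs are plain strings here rather than ByteStrings, and `ProcedureTables` and `resolve` are hypothetical names.

```scala
object ProcedureLookupSketch {
  final case class Procedure(name: String)

  final case class ProcedureTables(
      byFunctionUuid: Map[String, Procedure], // key: UUID of the GTIRB function itself
      byEntranceUuid: Map[String, Procedure]  // key: UUID of the function's entrance block
  ) {
    // Accept either kind of UUID and return the procedure it refers to, if any.
    def resolve(uuid: String): Option[Procedure] =
      byFunctionUuid.get(uuid).orElse(byEntranceUuid.get(uuid))
  }
}
```

A call edge typically names the entrance block, while auxiliary data about functions is keyed by the function UUID, so both tables end up being consulted.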

ailrst · 2024-02-21T04:39:04Z

src/main/scala/translating/GTIRBToIR.scala

+  private val uuidToBlock: mutable.Map[ByteString, Block] = mutable.Map()
+  private val proxies = mods.flatMap(_.proxies.map(p => p.uuid -> p)).toMap
+  private val blockOutgoingEdges = createCFGMap()
+  private val externalProcedures = mutable.Map[String, Procedure]()


There is a lot of mutable state here, and it is not clear when in the translation process it becomes valid or populated. It would be good at some point to make this code clearer about where these values come from and what they are used for (i.e. have them returned by functions and passed via parameters rather than being shared class state).
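One possible shape for that suggestion is sketched below. It is not a proposal for the actual class layout: `Tables`, `buildTables` and `callsExternal` are hypothetical names, and UUIDs are plain strings. The point is that the lookup tables are built once by a function, returned as an immutable value, and handed to later stages as a parameter.

```scala
object TranslationStateSketch {
  final case class Edge(source: String, target: String)

  // Immutable bundle of lookup tables, valid as soon as it is constructed.
  final case class Tables(outgoingEdges: Map[String, Seq[Edge]], proxyUuids: Set[String])

  // Built once, up front; nothing downstream mutates it.
  def buildTables(edges: Seq[Edge], proxies: Seq[String]): Tables =
    Tables(edges.groupBy(_.source), proxies.toSet)

  // A later stage receives the tables as a parameter instead of reading class fields.
  def callsExternal(tables: Tables, blockUuid: String): Boolean =
    tables.outgoingEdges
      .getOrElse(blockUuid, Seq.empty)
      .exists(edge => tables.proxyUuids.contains(edge.target))
}
```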

ailrst · 2024-02-21T04:53:43Z

src/main/scala/translating/GTIRBToIR.scala

+    "$" + procedure.name + "$__" + blockCount + "__$" + byteStringToString(label).replace("=", "").replace("-", "~").replace("/", "\'")
+  }
+
+  private def cleanUpIfPCAssign(block: Block, procedure: Procedure): Unit = {


Can we comment what this does? It seems to both resolve the TempIfs and indirect calls?

It handles the block splitting required to clean up the TempIfs and any stray _PC assignments that were not already removed. Those assignments exist because of indirect calls that DDisasm failed to identify due to a bug in it (so far these are only blr instructions that DDisasm does not handle properly).
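The splitting for a stray _PC assignment can be pictured with a small model. This is a sketch under heavy assumptions: the statement, jump and block types below are invented for illustration and are far simpler than the IR, only the first _PC assignment is handled, and TempIfs are not modelled at all. The block is cut at the assignment, the first half ends in an indirect call, and the remainder becomes a new block that serves as the return target.

```scala
object BlockSplitSketch {
  sealed trait Statement
  final case class Assign(text: String) extends Statement
  // Stand-in for a stray "_PC := target" left by an unidentified blr.
  final case class PCAssign(target: String) extends Statement

  sealed trait Jump
  final case class IndirectCall(target: String, returnTo: String) extends Jump
  final case class GoTo(label: String) extends Jump

  final case class Block(label: String, statements: List[Statement], jump: Jump)

  // Split at the first PCAssign; returns the block unchanged if there is none.
  def splitAtPCAssign(block: Block, afterLabel: String): List[Block] = {
    val (before, rest) = block.statements.span {
      case _: PCAssign => false
      case _           => true
    }
    rest match {
      case PCAssign(target) :: after =>
        // The rest of the original block is assumed to be the return target.
        val first = Block(block.label, before, IndirectCall(target, afterLabel))
        val afterBlock = Block(afterLabel, after, block.jump)
        List(first, afterBlock)
      case _ => List(block)
    }
  }
}
```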

ailrst · 2024-02-21T04:53:46Z

src/main/scala/translating/GTIRBToIR.scala

+    var currentStatement = currentBlock.statements.head()
+    var breakLoop = false
+    val queue = mutable.Queue[Block]()
+    while (!breakLoop) {


Can use loop.breakable

That isn't any more convenient here.
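For comparison, here are the two loop styles side by side on a toy problem. The suggestion presumably refers to `scala.util.control.Breaks.breakable` from the standard library (that reading is an assumption); the functions below are illustrative and unrelated to the block-splitting loop itself.

```scala
import scala.util.control.Breaks.{break, breakable}

object LoopStyleSketch {
  // Flag-controlled loop, in the style of the reviewed code.
  def firstNegativeWithFlag(xs: Vector[Int]): Option[Int] = {
    var i = 0
    var found: Option[Int] = None
    var breakLoop = false
    while (!breakLoop) {
      if (i >= xs.length) {
        breakLoop = true
      } else if (xs(i) < 0) {
        found = Some(xs(i))
        breakLoop = true
      } else {
        i += 1
      }
    }
    found
  }

  // Same search using Breaks.breakable.
  def firstNegativeWithBreakable(xs: Vector[Int]): Option[Int] = {
    var found: Option[Int] = None
    breakable {
      for (x <- xs) {
        if (x < 0) {
          found = Some(x)
          break()
        }
      }
    }
    found
  }
}
```

When the loop has several exit conditions that also update other state, as in the reviewed code, the flag version is not obviously worse, which matches the reply above.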

ailrst · 2024-02-21T04:54:55Z

src/main/scala/translating/GTIRBToIR.scala

+      newBlocks.append(afterBlock)
+      afterBlock.replaceJump(currentBlock.jump)
+      // TODO currently assume return target is afterBlock, probably best to check properly though once calculating address of instructions is done
+      afterBlock


Should we create an issue for this? We have instruction addresses now, I believe.

Currently the addresses of instructions are derived and just used as part of the statement labels. The return target being the rest of the block is a safe assumption if it is a BLR instruction (so far the only relevant case), but I don't know if there's anything else DDisasm fails to handle like this.

ailrst · 2024-02-21T05:12:21Z

src/main/scala/translating/SemanticsLoader.scala

+          throw Exception(s"inconsistent size parameters in Mem.set.0: ${ctx.getText}")
+        }
+        if (mysteryArg != 0) {
+          Logger.debug(s"mystery 3rd arg of Mem.set.0 has value $mysteryArg: ${ctx.getText}")


The 3rd arg is AccessType; maybe worth being less strict, since we could see vector and atomic? I guess it hasn't shown up in practice, but maybe worth commenting.

enumeration AccType {AccType_NORMAL, AccType_VEC,        // Normal loads and stores
                     AccType_STREAM, AccType_VECSTREAM,  // Streaming loads and stores
                     AccType_ATOMIC, AccType_ATOMICRW,   // Atomic loads and stores
                     AccType_ORDERED, AccType_ORDEREDRW, // Load-Acquire and Store-Release
                     AccType_ORDEREDATOMIC,              // Load-Acquire and Store-Release with atomic access
                     AccType_ORDEREDATOMICRW,
                     AccType_LIMITEDORDERED,             // Load-LOAcquire and Store-LORelease
                     AccType_UNPRIV,                     // Load and store unprivileged
                     AccType_IFETCH,                     // Instruction fetch
                     AccType_PTW,                        // Page table walk
                     AccType_NONFAULT,                   // Non-faulting loads
                     AccType_CNOTFIRST,                  // Contiguous FF load, not first element
                     AccType_NV2REGISTER,                // MRS/MSR instruction used at EL1 and which is converted
                                                         // to a memory access that uses the EL2 translation regime
                     // Other operations
                     AccType_DC,                         // Data cache maintenance
                     AccType_DC_UNPRIV,                  // Data cache maintenance instruction used at EL0
                     AccType_IC,                         // Instruction cache maintenance
                     AccType_DCZVA,                      // DC ZVA instructions
                     AccType_AT};                        // Address translation

That seems like a reason to throw an exception if we encounter another value, since we have not encountered any others yet and they may require a different translation. Where are the semantics of Mem.set and Mem.read defined? They aren't something that directly exists in the original ASL

It's the implementation of the Mem[] array access operator https://github.com/UQ-PAC/aslp/blob/partial_eval/mra_tools/arch/arch.asl#L11795, and the actual name comes from the internal implementation of the interpreter https://github.com/UQ-PAC/aslp/blob/partial_eval/libASL/tcheck.ml#L1663

Thanks for that. Looking at that I feel comfortable enough with the approach of just logging it for further investigation if we encounter a non-0 value.

Yeah I agree that should be enough
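The agreed behaviour can be summarised in a small standalone sketch: fail hard on inconsistent sizes, but only report a non-zero third argument. The function and parameter names here are hypothetical, and the sketch returns the debug message instead of calling the project's Logger so that it stays self-contained.

```scala
object MemSetCheckSketch {
  // Throws on inconsistent sizes; for a non-zero access-type argument it
  // returns a message to log for later investigation rather than failing.
  def checkMemSet(sizeArg: Int, sizeParam: Int, accType: Int, text: String): Option[String] = {
    if (sizeArg != sizeParam) {
      throw new IllegalArgumentException(s"inconsistent size parameters in Mem.set.0: $text")
    }
    if (accType != 0) Some(s"mystery 3rd arg of Mem.set.0 has value $accType: $text")
    else None
  }
}
```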

Megatomato added 30 commits July 6, 2023 14:51

Ported from gtirb directory, takes .gts file and spits out parse tree

d4a9b24

Created decoder for functionEntry in auxdata

bd3b4fa

Created visitor, about 95% finished, some bugs with memory.

57baa51

Merge remote-tracking branch 'origin' into walter-gtirb-to-ir

4eaa489

Updated gtirb to IR to properly reflect jumps

1d196b5

Fixes Memory bugs, fiddles with proxyblocks

4fec863

Renamed grammar so not confusing, attempted to fix decoder but failed :(

c334112

Merge branch 'main' into walter-gtirb-to-ir

9e0fa5f

Merge

8e3f49e

code now works with changea made in main

7631f95

Parser creates IR, bugs in bitvectors

29dafa1

Now prints an IR

26352da

new semantics because gtirb is weird

2cd1ee5

Changed .gts, added new decoder and function names

71fe88f

Centralised Decoders and Arguments so not bad

4fc32ac

Added addresses and changed jump logic

c62cfb4

fixed function names w/ proxies

2249838

Updated Jump logic(again)

91fb9b5

Merge remote-tracking branch 'origin' into walter-gtirb-to-ir

a82e3bc

Fixed jump logic for real and some mem issue

6184518

updated .adt so compiler issues no longer a thing

931dd33

Fixed block order + removed "_PC" calls

d5fde22

Fixed SignExtend, ZeroExtend and Extract

336a638

Remodeled Jumps based on Gtirb labeling

42e620b

Now adds extra blocks and correctly parses conditional statements

67110ae

Attempted to fix mainproc and fixed ashr ZeroExtent

e4da775

Created example that produces multi-else if statement

059d60d

Can now parse If-chain statements

ba3beac

now does longifs, with some small errors

f3c1eed

Merge remote-tracking branch 'origin' into walter-gtirb-to-ir

51915a4

Megatomato and others added 2 commits February 8, 2024 10:41

Fix for get_jmp_reg and made Stmt_Tcall more robust

90fa7ee

revert changes to examples

141449b

l-kent reviewed Feb 9, 2024

re-add useful removed examples

7313f1b

l-kent added 7 commits February 19, 2024 17:01

big rework so now cntlm is handled

4e1a796

fix modifies propagation

4136686

add exception messages

f84c7f1

add labels to statements from gtirb

5477e6d

edit make files to allow so json not automatically created, gts can b…

5068242

…e created and removed separately

update makefiles

102a75a

update gitignore

0e5489a

l-kent force-pushed the walter-gtirb-to-ir branch from 2dddffe to 0e5489a Compare February 21, 2024 01:23

l-kent added 4 commits February 21, 2024 11:51

update gts files

47a46d6

remove gts & ast.json files from examples

929af28

Merge branch 'main' into walter-gtirb-to-ir

ed65b60

update readme a little

bd1d199

update readme a little more

19a1047

ailrst reviewed Feb 21, 2024

improve comments

2416045

l-kent merged commit c5d396a into main Feb 22, 2024
1 check passed

l-kent deleted the walter-gtirb-to-ir branch February 23, 2024 02:09
