-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gtirb to ir #161
Gtirb to ir #161
Conversation
aarch64-linux-gnu-gcc (Debian 13.2.0-2) 13.2.0 |
I have 11.3.0, which explains the difference. |
This was my idea initally as well, and I wish we could, but then it messes up the Decoding/Encoding from Base64. The main reason why these UUIDs are so annoying is because the jumps are filtered using GTIRBs Edge class, which takes two ByteStrings (one source and one target). Thus, when new blocks are created, edges need to be added, and these edges require the decoded base64 string. |
That seems like a problem that is possible to solve though, I'll work on that at some point. |
name = symbols.find(functionNames(uuid) == _.uuid).get.name | ||
|
||
val entryBlocks: mutable.Set[ByteString] = functionEntries(uuid) | ||
|
||
val result = entryBlocks | ||
.flatMap(e => edgeMap.getOrElse(Base64.getUrlEncoder.encodeToString(e.toByteArray), None).iterator) | ||
.flatMap(elem => mods.flatMap(_.proxies).find(_.uuid == elem.targetUuid)) | ||
.flatMap(proxy => symMap.getOrElse(proxy.uuid, None)) | ||
.map(p => p.name) | ||
|
||
if (result.nonEmpty) { | ||
return result.head //. head seems weird here but i guess it works | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this section do? How do the proxy blocks come into getting the name of a function associated with a UUID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For weird linker functions introduced by the compiler, gtirb usually attaches the function name to a proxyblock instead of a block. In this case, I need to get the block that the proxybloxk is connected to, to find the right name of that function. This doesn't seem to be a problem for any functions the user has written themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. Those PLT bridging functions don't actually have names as such. When they jump to an external function that does have a name, BAP just uses that name for the bridging function too. Since DDisasm doesn't do that, I'll make it so what we're doing here is more explicit.
I have a pr to add more information to the gtirb output UQ-PAC/gtirb-semantics#6 It now stores the label next to the block (if exists), and the disassembly and address next to the instructions. This makes the semantics located at |
@l-kent Thank you. |
I don't need anything else from you, thanks. |
I'm still working on cleaning it up to get it to parse all of CNTLM, and solving the issue with correspondence between the UUIDs for split blocks, but that's almost done now. |
…e created and removed separately
2dddffe
to
0e5489a
Compare
I'm ready to merge this. I've cleaned up some of the big cntlm files from this branch's history (just the ones committed in this branch). @ailrst if you want to have a quick look before I merge feel free |
private val symbols = mods.flatMap(_.symbols).map(s => s.uuid -> s).toMap | ||
private val uuidToSymbols = createSymbolMap() | ||
private val uuidToProcedure: mutable.Map[ByteString, Procedure] = mutable.Map() | ||
private val entranceUUIDtoProcedure: mutable.Map[ByteString, Procedure] = mutable.Map() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between uuidToProcedure and entranceUUIDtoProcedure? Can we comment these mutable maps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GTIRB functions have a UUID that represents them which is used in various mappings, but this is separate from the UUID for the function's entrance block, which we also commonly encounter and need to know which procedure it refers to.
private val uuidToBlock: mutable.Map[ByteString, Block] = mutable.Map() | ||
private val proxies = mods.flatMap(_.proxies.map(p => p.uuid -> p)).toMap | ||
private val blockOutgoingEdges = createCFGMap() | ||
private val externalProcedures = mutable.Map[String, Procedure]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of mutable state here and it is not clear when in the translation process it becomes valid / populated, it would be good to make this code clearer at some point about where these values come from and what they are used for (i.e. they are returned by functions and passed via parameters rather than being shared class state).
"$" + procedure.name + "$__" + blockCount + "__$" + byteStringToString(label).replace("=", "").replace("-", "~").replace("/", "\'") | ||
} | ||
|
||
private def cleanUpIfPCAssign(block: Block, procedure: Procedure): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we comment what this does? It seems to both resolve the TempIfs and indirect calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it handles the block splitting required to clean up the TempIfs and any stray _PC assignments that were not already removed, which exist due to indirect calls that were not identified by DDisasm due to a bug in it (so far these are only blr instructions that DDisasm does not handle properly)
var currentStatement = currentBlock.statements.head() | ||
var breakLoop = false | ||
val queue = mutable.Queue[Block]() | ||
while (!breakLoop) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use loop.breakable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't any more convenient here
newBlocks.append(afterBlock) | ||
afterBlock.replaceJump(currentBlock.jump) | ||
// TODO currently assume return target is afterBlock, probably best to check properly though once calculating address of instructions is done | ||
afterBlock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we create an issue for this? We have address instructions now I believe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the addresses of instructions are derived and just used as part of the statement labels. The return target being the rest of the block is a safe assumption if it is a BLR instruction (so far the only relevant case), but I don't know if there's anything else DDisasm fails to handle like this.
throw Exception(s"inconsistent size parameters in Mem.set.0: ${ctx.getText}") | ||
} | ||
if (mysteryArg != 0) { | ||
Logger.debug(s"mystery 3rd arg of Mem.set.0 has value $mysteryArg: ${ctx.getText}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 3rd arg is AccessType, maybe worth being less strict since we could see vector and atomic? It guess it hasn't shown up in practice but maybe worth commenting.
enumeration AccType {AccType_NORMAL, AccType_VEC, // Normal loads and stores
AccType_STREAM, AccType_VECSTREAM, // Streaming loads and stores
AccType_ATOMIC, AccType_ATOMICRW, // Atomic loads and stores
AccType_ORDERED, AccType_ORDEREDRW, // Load-Acquire and Store-Release
AccType_ORDEREDATOMIC, // Load-Acquire and Store-Release with atomic access
AccType_ORDEREDATOMICRW,
AccType_LIMITEDORDERED, // Load-LOAcquire and Store-LORelease
AccType_UNPRIV, // Load and store unprivileged
AccType_IFETCH, // Instruction fetch
AccType_PTW, // Page table walk
AccType_NONFAULT, // Non-faulting loads
AccType_CNOTFIRST, // Contiguous FF load, not first element
AccType_NV2REGISTER, // MRS/MSR instruction used at EL1 and which is converted
// to a memory access that uses the EL2 translation regime
// Other operations
AccType_DC, // Data cache maintenance
AccType_DC_UNPRIV, // Data cache maintenance instruction used at EL0
AccType_IC, // Instruction cache maintenance
AccType_DCZVA, // DC ZVA instructions
AccType_AT}; // Address translation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a reason to throw an exception if we encounter another value, since we have not encountered any others yet and they may require a different translation. Where are the semantics of Mem.set and Mem.read defined? They aren't something that directly exists in the original ASL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its the implementation of the Mem[]
array access operator https://github.com/UQ-PAC/aslp/blob/partial_eval/mra_tools/arch/arch.asl#L11795, and the actual name comes from the internal implementation of the interpreter https://github.com/UQ-PAC/aslp/blob/partial_eval/libASL/tcheck.ml#L1663
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that. Looking at that I feel comfortable enough with the approach of just logging it for further investigation if we encounter a non-0 value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree that should be enough
Provides a new frontend for the BASIL tool, taking dissasembly data from ddisasm and GTIRB instead of BAP.
To use, just use a .gts file as the argument to the tool, instead of the .adt, and it spits out a .bpl file like normal. It seeks to replicate the old interface as much as it can.
You can switch between the two interfaces by commenting the GTIRB LOGIC code and everything beneath it in the LoadBap function in RunUtils, and uncommenting all the code underneath BAP LOGIC. Also, be sure to change the type of loadBap from Program to BAPProgram, as the new interface directly outputs an IR program in one translation parse.
Broadly, it is about as good as the old interface, with the notable exception that GTIRB lacks function parameters and returns, so those are hardcoded as of now. Alistair has also lifted every example in the tests folder to a .gts.