-
Notifications
You must be signed in to change notification settings - Fork 0
Home
Welcome to the bil-to-boogie-translator wiki!
The parser has two stages, the parsing phase and the walking phase.
In the parsing phase primarily depends on the rules laid out in the BilAdt.g4 grammar file. The text is converted into lexical tokens, which are then parsed to form a parse tree.
Antlr does this by converting the grammar into java source, which is capable of taking a string input and creating the aforementioned parse tree.
The Walking/Listening phase primarily depends on the AdtStatementLoader.scala file. The Antlr walker essentially traverses the parse tree generated in the previous phase, and performs actions when entering and exiting each token.
The primary goal of this stage is to produce a "State" data structure from our parse tree, which is our representation of the program. We output information from this stage such as the list of statements, and we also keep a dictionary of expressions which we use in the statements.
[TODO]
[TODO]
- minor fixes to old-state version:
- turning off constant propagation
- adding examples.
- update readme on how to run new examples
- New Intermediate Representation (IR) Design:
- review intermediate representation (State data structure)
- update tests
- investigate ARM stack semantics
- cleanup generation of memory accesses
- generating rely-guarantee & verification conditions only when needed (memory accesses, branches etc)
- combine reg31 and stack pointer
- review constant propagation and folding
- use information from new parser
- split inference from replacing?
- start with simple boolean security levels
- fix rely conditions and implement guarantees
- alternate representation to bit-vectors
- use ints instead where possible
- or simple booleans for single bits
- improved memory model
- points-to analysis (partition memory)
- investigate LLVM data structure analysis algorithm (https://llvm.org/pubs/2007-06-10-PLDI-DSA.pdf)
- separate stack from heap (more reliably)
- 2022-06-01: decided to record design decisions.
- infer loop invariants
- Explicit bit vector lengths / types.
- Explicit control flow graph information
- More intuitive casting syntax
- It should be a complete representation of what the lifter knows - i.e. there should never be missing information (probably)
- It is extremely verbose and somewhat difficult to parse compared to the BIL
- Unsure whether it's officially supported or maintained
The grammar of the parser is inspired by the existing parser found at BinaryAnalysisPlatform/bap-python, in particular, the nodes found in adt.py, br.py and bil.py. The grammar attempts to reflect the inheritance hierarchy - the alternatives of a token are its subclasses.
For example, exp
and term
are subclasses of ADT
, and thus in the grammar we have adt : exp | term | ...
.
- Negative numbers are encoded somewhat strangely. It takes the 2's complement representation of the number, and interprets the binary string as if it were an unsigned integer.
- For example, the number -2 in 64 bits is 1111111111111111111111111111111111111111111111111111111111111110, which is interpreted in the ADT as Int(18446744073709551614, 64).