diff --git a/README.md b/README.md index 70283f2b2..9a666cc9a 100644 --- a/README.md +++ b/README.md @@ -1,51 +1,108 @@ -# Universal Symbolic Virtual Machine +# What is USVM? -**Universal Symbolic Virtual Machine**, or **USVM**, is an ultimately powerful _language-agnostic_ core for implementing -custom symbolic execution based products. +USVM is a powerful symbolic execution core designed for analysis of programs in multiple programming languages. Symbolic execution is known to be a very precise but resource demanding technique. USVM makes it faster, more stable, and versatile. -## How we got here +USVM main features include -USVM is the result of years of experience designing symbolic execution engines for various programming languages. Many -engines have a lot in common — so why don't we extract and polish it? Many of them still feature different advantages -— so why don't we unify them to get the most out of symbolic execution? +* extensible and highly optimized symbolic memory model +* optimized constraint management to reduce SMT solver workload +* improved symbolic models for containers (`Map`, `Set`, `List`, etc.) +* targeted symbolic execution +* solving type constraints with no SMT solver involved +* bit-precise reasoning +* forward and backward symbolic exploration -USVM abstracts language primitives into the generic ones, so, with a simple DSL provided, you are free to implement an -interpreter for your language. USVM integrates the particular symbolic execution enhancements into one compact form -— an -efficient and configurable -core with a unified API. +# How can I use it? -## Key features +With USVM, you can achieve completely automated +* [static code analysis](#taint-analysis-with-usvm), +* [unit test cases generation](#usvm-for-unit-test-generation), +* [targeted fuzzing and more symbolic execution based solutions](#using-usvm-to-confirm-sarif-reports). -What does it mean to be an _efficient_ core? +Right now, we have ready-to-be-used implementation for [Java](https://github.com/UnitTestBot/usvm/tree/main/usvm-jvm) and experimental implementation for [Python](https://github.com/UnitTestBot/usvm/tree/tochilinak/python/usvm-python). -The USVM core redefines language primitives, so they become symbolic and language-independent, and provides us -with a range of benefits: -* optimized constraint management to reduce SMT solver workload; -* extensible symbolic memory model; -* forward and backward symbolic exploration; -* improved symbolic models for containers (`map`, `set`, `list`, etc.); -* solving type constraints with no SMT solver involved; -* bit-precise reasoning. +# Taint analysis with USVM -## Language-specific implementations +USVM supports interprocedural condition-sensitive taint analysis of JVM bytecode. For instance, it is able to automatically find the following SQL injection: +![True positive sample](./docs/assets/images/injection.png) -We are now developing a user-friendly USVM API, so that you can easily adapt the core to analyzing programs -in the required language. +By default, USVM is able to find other problems: +* null reference exceptions, +* out-of-bounds access for collections, +* integer overflows, +* division by zero. -For now, we have already implemented symbolic interpreters for _Java_ and _Python_ (the latter is an experimental but -working example). And the interpreters for _Go_ and _JavaScript_ are under development, so stay tuned. +You can also extend its analysis rules by [writing custom checkers](#writing-custom-checkers). -These ready-to-use symbolic interpreters are for the managed languages mostly, but you have everything you need to -analyze programs in whatever language — whether it is _C++_ or an exotic (or even custom) one. -## Applicable scope +You can run USVM in your repo CI by configuring the [ByteFlow](https://github.com/UnitTestBot/byteflow) runner. -Symbolic execution is known to be a powerful but slow and demanding technique. USVM makes it faster, more stable, and -versatile. +## About condition-sensitive analysis -You can build custom symbolic execution engines with the USVM core inside to show better performance in -* test generation, -* static analysis, -* verification -* targeted fuzzing,
and more symbolic execution based solutions. +If we modify the above program a little, things change drastically: +![False positive sample](./docs/assets/images/injection_fp.png) + +All interprodecural dataflow analysers we've tried report the similar warning for this program. However, this is false alarm: untrusted data is read only in development mode (that is, when `production` field is false), but the real database query happens only in production mode. + +The reason why the existing analysers are wrong is the lack of condition-sensitive analysis: they simply do not understand that untrusted data is emitted only under conditions that prevent program from getting into `checkUserIsAdminProd` method. + +The major reason for this is that condition-sensitive analysis is complex and expensive. USVM makes condition-sensitive analysis robust and scalable. In particular, USVM does not report warning in this program. + +You can run the this example online in our [demo repository](https://github.com/unitTestBot/byteflow/security/code-scanning). + +# Writing custom checkers + +USVM allows to customize its behaviour by writing custom analysis rules. To achieve this, USVM can share its internal analysis states into the attached *interpreter observers*. So the first step to write a custom checker is to implement [`JcInterpreterObserver`](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/machine/JcInterpreterObserver.kt) interface. This observer can be attached to symbolic machine [in its constructor](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/machine/JcMachine.kt#L35C17-L35C36). + +Now, before every instruction gets symbolically executed, the symbolic engine will notify the observer about the next instruction. For instance, if the engine has reached the `throw` instruction, the attached observer will recieve the corresponding event: +```kotlin +fun onThrowStatement(simpleValueResolver: JcSimpleValueResolver, stmt: JcThrowInst, stepScope: JcStepScope) +``` + +Here, `stmt` represents the instruction which was proven to be reachable. If, for example, your analysis looks for reachability of `HorrificException`, you can look into type of `stmt.throwable`. + +The internal state of the analysis is stored into `stepScope`. In fact, your checker gets the representation of the whole program state in the branch that reaches `stmt`. You can query the arbitrary data stored into state or even modify it (allocate new information or modify the existing) using `stepScope.calcOnState`. For example, to allocate new memory fragment in program state, you can write +```kotlin +stepScope.calcOnState { memory.allocateConcreteRef() } +``` + +You can compute the validity of arbitrary logical predicates on state. Warning: this can cause queries to SMT solver, which can be time and memory demanding. To query the validity, you can write +```kotlin +stepScope.calcOnState { + clone().assert(your condition)?.let { + ... handling case when condition is satisfiable ... + } +} +``` + +To form and report warnings in SARIF format, use [built-in reporters](https://github.com/UnitTestBot/jacodb/blob/e61c2fa41533a2f0a39fad0beb220c3350987345/jacodb-analysis/src/main/kotlin/org/jacodb/analysis/sarif/DataClasses.kt#L137). + +You can browse the [existing checkers](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt) for more examples and details. + +# Using USVM to confirm SARIF reports + +In lots of cases, the exising static code analysers [report false alarms](#about-condition-sensitive-analysis). USVM has ability to confirm or reject the reported warnings. + +To run USVM in trace reproduction mode, configure one of the [analyses](https://github.com/UnitTestBot/usvm/blob/saloed/usvm-demo/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt) and [pass a set of traces](https://github.com/UnitTestBot/usvm/blob/a1e931e5e51f463ed4a33009cee1ffa01cd375bd/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt#L29) into [JcMachine](https://github.com/UnitTestBot/usvm/blob/main/usvm-jvm/src/main/kotlin/org/usvm/machine/JcMachine.kt). + +Also, this process can be customized by a [rich set of options](https://github.com/UnitTestBot/usvm/blob/main/usvm-util/src/main/kotlin/org/usvm/UMachineOptions.kt). + +# USVM for unit test generation + +USVM has ability to discover all possible behaviours of a program. This is a key feature used in white-box test generation engines. In future, USVM will be the default code analysis engine in [UnitTestBot for Java](https://github.com/UnitTestBot/utbotjava). + + +# Other languages support in USVM + +[USVM.Core](https://github.com/UnitTestBot/usvm/tree/main/usvm-core) is a framework which provides highly optimized primitives for symbolic execution: +* construction and manipulation with symbolic expressions (based on [KSMT](https://github.com/UnitTestBot/ksmt) platform); +* advanced modeling of [memory operations](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/memory); +* efficient [constraint managenent](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/constraints) and [constraint solving](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/solver); +* rich set of [search strategies](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/ps) in giant branching spaces; +* special support of [standard collections](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/collection); +* special support of [type system](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/types) constraints; +* collection and reporting of [statistics](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/statistics). + +Thus, USVM.Core implements common primitives used in programming languages. This makes much easier instantiating USVM for new programming language: in fact, to support a programming language, you only need to write its interpreter in terms of operations provided by USVM.Core. + +If you want to support a new language, please take a look at [sample language support](https://github.com/UnitTestBot/usvm/tree/main/usvm-sample-language) in USVM. \ No newline at end of file diff --git a/docs/assets/images/injection.png b/docs/assets/images/injection.png new file mode 100644 index 000000000..c8595db09 Binary files /dev/null and b/docs/assets/images/injection.png differ diff --git a/docs/assets/images/injection_fp.png b/docs/assets/images/injection_fp.png new file mode 100644 index 000000000..d28bee55b Binary files /dev/null and b/docs/assets/images/injection_fp.png differ