Merge pull request #58 from ftsrg/bitvectors

Basic bitvector support
ftsrg · Jul 29, 2020 · 75355cb · 75355cb
2 parents ccd1970 + ff73098
commit 75355cb
Show file tree

Hide file tree

Showing 65 changed files with 4,833 additions and 152 deletions.
diff --git a/build.gradle.kts b/build.gradle.kts
@@ -10,7 +10,7 @@ buildscript {
 
 allprojects {
     group = "hu.bme.mit.inf.theta"
-    version = "1.3.0"
+    version = "1.4.0"
 
     apply(from = rootDir.resolve("gradle/shared-with-buildSrc/mirrors.gradle.kts"))
 }

diff --git a/subprojects/cfa/README.md b/subprojects/cfa/README.md
@@ -39,16 +39,19 @@ Variables of the CFA can have the following types.
 - `int`: Mathematical, unbounded SMT integers.
 - `rat`: Rational numbers (implemented as SMT reals).
 - `[K] -> V`: SMT arrays (associative maps) from a given key type `K` to a value type `V`.
+- `bv[L]`, `bitvec[L]`, `ubv[L]`, `ubitvec[L]`, `sbv[L]`, `sbitvec[L]`: Signed or unsigned bitvector of given length `L`. _This is an experimental feature with currently limited algorithmic support. See the [details](doc/bitvectors.md) for more information._
 
 Expressions of the CFA include the following.
 - Identifiers (variables).
 - Literals, e.g., `true`, `false` (Bool), `0`, `123` (integer), `4 % 5` (rational).
   - Array literals can be given by listing the key-value pairs and the (mandatory) default element, e.g., `[0 <- 182, 1 <- 41, default <- 75]`. If there are no elements, the key type has to be given before the default element, e.g., `[<int>default <- 75]`.
+  - Bitvector literals can be given by stating the length, information about the signedness, and the exact value of the bitvector in binary, decimal or hexadecimal form. (E.g. `4'd5` is a 4-bit-long unsigned bitvector with the decimal value 5.) _This is an experimental feature with currently limited algorithmic support. See the [details](doc/bitvectors.md) for more information._
 - Comparison, e.g., `=`, `/=`, `<`, `>`, `<=`, `>=`.
 - Boolean operators, e.g., `and`, `or`, `xor`, `not`, `imply`, `iff`.
 - Arithmetic, e.g., `+`, `-`, `/`, `*`, `mod`, `rem`.
 - Conditional: `if . then . else .`
 - Array read (`a[i]`) and write (`a[i <- v]`).
+- Bitvector specific operators, e.g., `&`, `|`, `^`, `<<`, `>>`, `~`. _This is an experimental feature with currently limited algorithmic support. See the [details](doc/bitvectors.md) for more information._
 
 ### Textual representation (DSL)
 

diff --git a/subprojects/cfa/doc/bitvectors.md b/subprojects/cfa/doc/bitvectors.md
@@ -0,0 +1,121 @@
+# Bitvector support in Theta
+
+As of now, Theta has basic bitvector support for the CFA formalism.
+
+## Overview
+
+In Theta, every bitvector has a length, and is either signed or unsigned. It follows, that the range of every bitvector has a size of 2<sup>n</sup>. There are different operations that are available for bitvectors. It is important to note, that only operations between bitvectors with the same size and signedness are valid.
+
+Bitvectors have n bits. If the bitvector is unsigned then the values of the bits come from the binary representation of the underlying number. However, if the bitvector is signed then the values of the bits come from the two's complement representation of the underlying number.
+
+## Declaring bitvector variables
+
+To declare a bitvector variable, one has to specify the size and the signedness.
+
+```
+var x1 : bv[4] // Unsigned 4-bit-long bitvector
+var x2 : bitvec[5] // Unsigned 5-bit-long bitvector
+
+var u1 : ubv[4] // Unsigned 4-bit-long bitvector
+var u2 : ubitvec[6] // Unsigned 6-bit-long bitvector
+
+var s1 : sbv[4] // Signed 4-bit-long bitvector
+var s2 : sbitvec[7] // Signed 7-bit-long bitvector
+
+```
+
+## Bitvector literals
+
+There is a new literal, the bitvector literal that can be used to create bitvectors. Each literal defines the size and signedness of the literal. Moreover, eah literal can be entered using three different formats:
+
+- The bitwise precise binary form
+- The bitwise precise hexadecimal form
+- And the non-bitwise-precise, although user-friendly decimal form
+
+The two bitwise precise forms specify the value for all bits directly. These are useful for unsigned bitvectors, where there is no two's complement behavior (e.g. bitfields).
+
+```
+4'b0010 // Unsiged 4-bit-long bitvector literal (binary form)
+8'xAF // Unsiged 4-bit-long bitvector literal (hexadecimal form)
+
+4'bu0010 // Unsiged 4-bit-long bitvector literal (binary form)
+8'xuAF // Unsiged 4-bit-long bitvector literal (hexadecimal form)
+
+4'bs0010 // Signed 4-bit-long bitvector literal (binary form, not recommended)
+8'xsAF // Signed 4-bit-long bitvector literal (hexadecimal form, not recommended)
+```
+
+The non-bitwise-precise decimal form can be used to create bitvectors that are based on numbers. Thsi form is recommended for signed bitvectors, or unsigned bitvectors that are not bitfields.
+
+```
+4'd10 // Unsigned 4-bit-long bitvector literal (decimal form)
+
+4'du10 // Unsigned 4-bit-long bitvector literal (decimal form)
+
+4'ds5 // Signed 4-bit-long bitvector literal (decimal form, positive value)
+4'ds-5 // Signed 4-bit-long bitvector literal (decimal form, negative value)
+```
+
+## Operations on bitvectors
+
+The following operations are defined on bitvectors. As a general rule, every binary operation requires the bitvector on the left hand side and the bitvector on the right hand side to have the same size and signedness.
+
+The operators and their precedence are based on the [operators in C langauge](https://en.cppreference.com/w/c/language/operator_precedence).
+
+### Basic arithmetic operations
+
+These operations perform basic arithmetic operations on bitvectors. These operations require that the bitvector on the left hand side and the bitvector on the right hand side have the same size and signedness. These operations overflow!
+
+- **Addition:** Adds two bitvectors; `a + b`
+- **Subtraction:** Subtracts a from b; `a - b`
+- **Multiplication:** Multiplies two bitvectors; `a * b`
+- **Integer division:** Divides two bitvectors, and takes the integer part of the result; `a / b`
+- **Modulo:** Calculates (a mod b); `a mod b`
+- **Remainder:** Calculates the remainder of a / b; `a rem b`
+- **Negate:** Negates the value of a (only valid for signed bitvectors); `-a`
+
+### Bitvector specific operations
+
+These operations are specific to bitvectors only. These operations require that the bitvector on the left hand side and the bitvector on the right hand side have the same size and signedness. For the exact semantics check out the respective operators in C. These operations overflow!
+
+- **Bitwise and**: Ands two bitvectors; `a & b`
+- **Bitwise or**: Ors two bitvectors; `a | b`
+- **Bitwise xor**: XORs two bitvectors; `a ^ b`
+- **Bitwise shift left**: Shifts a to left with b; `a << b`
+- **Bitwise shift right**: Shifts a to right with b; `a >> b`
+- **Bitwise not:** Negates all the bits in bitvectors; `~a`
+
+### Relational operations
+
+These operations encode relational operations between bitvectors. These operations require that the bitvector on the left hand side and the bitvector on the right hand side have the same size and signedness.
+
+- **Equality**: Checks if a equals to b; `a = b`
+- **Non-equality**: Checks if a does not equal to b; `a /= b`
+- **Greater than or equals to**: Checks if a is greater than or equals to b; `a >= b`
+- **Greater than**: Checks if a is greater than b; `a > b`
+- **Less than or equals to**: Checks if a is less than or equals to b; `a <= b`
+- **Less than**: Checks if a is less than b; `a < b`
+
+### Conversion "operations"
+
+Bitvectors can be converted to integers, and vice-versa. However, since integers can have an arbitrily huge value, should that value be impossible to be represented in the bitvectors range, an exception will be thrown. So procede with caution!
+
+There is no explicit conversion operator, however, the assignment statement converts between the two types.
+
+```
+var bv1 : bv[4]
+var i1 : int
+
+// ...
+
+L0 -> L1 {
+    bv1 := i1
+    i1 := bv1
+}
+```
+
+## Algorithmic support for verification with bitvectors
+
+As of now, bitvectors are only partially supported due to the fact, that the underlying SMT solver, Z3 does not support interpolation when bitvectors are involved.
+
+As a result, CEGAR with predicate abstraction is not supported at all. However, CEGAR with explicit value abstraction, and with the UNSAT core refinement strategy is working preoperty, as it does not rely on the interpolation capabilities of the SMT solver.
diff --git a/subprojects/cfa/src/main/antlr/CfaDsl.g4 b/subprojects/cfa/src/main/antlr/CfaDsl.g4
@@ -80,6 +80,7 @@ type:	boolType
 	|	ratType
 	|	funcType
 	|	arrayType
+	|	bvType
 	;
 
 typeList
@@ -106,6 +107,19 @@ arrayType
 	:	LBRACK indexType=type RBRACK RARROW elemType=type
 	;
 
+bvType
+    :   ubvType
+    |   sbvType
+    ;
+
+ubvType
+    :   UBVTYPE LBRACK size=INT RBRACK
+    ;
+
+sbvType
+    :   SBVTYPE LBRACK size=INT RBRACK
+    ;
+
 BOOLTYPE
 	:	'bool'
 	;
@@ -118,6 +132,18 @@ RATTYPE
 	:	'rat'
 	;
 
+UBVTYPE
+    :   'bv'
+    |   'bitvec'
+    |   'ubv'
+    |   'ubitvec'
+    ;
+
+SBVTYPE
+    :   'sbv'
+    |   'sbitvec'
+    ;
+
 // E X P R E S S I O N S
 
 expr:	funcLitExpr
@@ -181,9 +207,25 @@ equalityExpr
 	;
 
 relationExpr
-	:	leftOp=additiveExpr (oper=(LT | LEQ | GT | GEQ) rightOp=additiveExpr)?
+	:	leftOp=bitwiseOrExpr (oper=(LT | LEQ | GT | GEQ) rightOp=bitwiseOrExpr)?
 	;
 
+bitwiseOrExpr
+    :   leftOp=bitwiseXorExpr (oper=BITWISE_OR rightOp=bitwiseXorExpr)?
+    ;
+
+bitwiseXorExpr
+    :   leftOp=bitwiseAndExpr (oper=BITWISE_XOR rightOp=bitwiseAndExpr)?
+    ;
+
+bitwiseAndExpr
+    :   leftOp=bitwiseShiftExpr (oper=BITWISE_AND rightOp=bitwiseShiftExpr)?
+    ;
+
+bitwiseShiftExpr
+    :   leftOp=additiveExpr (oper=(BITWISE_SHIFT_LEFT | BITWISE_SHIFT_RIGHT) rightOp=additiveExpr)?
+    ;
+
 additiveExpr
 	:	ops+=multiplicativeExpr (opers+=(PLUS | MINUS) ops+=multiplicativeExpr)*
 	;
@@ -193,10 +235,15 @@ multiplicativeExpr
 	;
 
 negExpr
-	:	accessorExpr
+	:	bitwiseNotExpr
 	|	MINUS op=negExpr
 	;
 
+bitwiseNotExpr
+    :   accessorExpr
+    |   BITWISE_NOT op=bitwiseNotExpr
+    ;
+
 accessorExpr
 	:	op=primaryExpr (accesses+=access)*
 	;
@@ -230,6 +277,7 @@ primaryExpr
 	|	intLitExpr
 	|	ratLitExpr
 	|	arrLitExpr
+	|	bvLitExpr
 	|	idExpr
 	|	parenExpr
 	;
@@ -255,6 +303,10 @@ arrLitExpr
     |   LBRACK LT indexType=type GT DEFAULT LARROW elseExpr=expr RBRACK
     ;
 
+bvLitExpr
+    :   bv=BV
+    ;
+
 idExpr
 	:	id=ID
 	;
@@ -342,6 +394,30 @@ PERCENT
 	:	'%'
 	;
 
+BITWISE_OR
+    :   '|'
+    ;
+
+BITWISE_XOR
+    :   '^'
+    ;
+
+BITWISE_AND
+    :   '&'
+    ;
+
+BITWISE_SHIFT_LEFT
+    :   LT LT
+    ;
+
+BITWISE_SHIFT_RIGHT
+    :   GT GT
+    ;
+
+BITWISE_NOT
+    :   '~'
+    ;
+
 TRUE:	'true'
 	;
 
@@ -401,6 +477,11 @@ RETURN
 
 // B A S I C   T O K E N S
 
+BV  :   NAT '\'b' ('s'|'u')? [0-1]+
+    |   NAT '\'d' ('s'|'u')? INT
+    |   NAT '\'x' ('s'|'u')? [0-9a-fA-F]+
+    ;
+
 INT	:	SIGN? NAT
 	;