We describe the functionality of each Python source file and each folder in the following:
- `analysis` folder contains the definition of abstracted values and the abstract interpretations for operations in computation graphs.
  - `abstract_interpretation.py` contains the class definitions of the abstracted values that we use for our tensor abstraction and the interval abstraction with affine relations.
  - `inference.py` contains the abstract interpretations for operations in computation graphs.

  For more information please see Analysis.
- `parse` folder contains the parsing of the Protocol Buffer format into the computation graph, the static dataflow analysis, the parsing of values, and the user-specified weights/inputs ranges.
  - `parse_graph.py` contains the parsing of the Protocol Buffer format into the computation graph and the static dataflow analysis.
  - `parse_format_text.py` contains the parsing of constant values, variables, and placeholders.
  - `specified_ranges.py` contains the reusable weights/inputs ranges specified by users.

  For more information please see Parse.
- `analysis_main.py` is the entry point of DEBAR. It takes one argument: the target Protocol Buffer file containing the computation graph under verification. It also takes one optional argument denoting whether to specify the range of the weights and the range of the inputs. Please see the Running DEBAR Section in README for the usage of DEBAR. The workflow of `analysis_main.py`:
  - First, it calls `parse_graph.py` to obtain the computation graph.
  - Second, it scans the list of unsafe operations and calls the dataflow analysis in `parse_graph.py` to get the range of the input to each unsafe operation.
  - Third, it checks whether the range of the input to the unsafe operation intersects with its danger zone.
    - If not, then the unsafe operation is verified to be safe.
    - Otherwise, go to the next step.
  - Fourth, if the range of the input to the unsafe operation cannot be proved safe, we further split the ranges of some nodes using constant predicates such as 0. If all the splits of any node prove that the range of the input to the unsafe operation does not intersect with its danger zone, then the operation is safe. The motivation and the details of predicate splitting can be found in the Predicate Splitting Section.
    - If safe, then the unsafe operation is verified to be safe.
    - Otherwise, DEBAR generates a warning for the unsafe operation.
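The four steps above can be sketched as a small driver loop. All names here (`UNSAFE_OPS`, `verify`, and the callback signatures) are illustrative assumptions, not DEBAR's actual API:

```python
# Hypothetical sketch of the analysis_main.py workflow; the names and
# callback signatures are illustrative, not DEBAR's actual API.
UNSAFE_OPS = {"Exp", "Log", "RealDiv", "Sqrt"}  # example unsafe operations

def verify(nodes, input_range, is_safe, try_splits):
    """Classify each unsafe operation as verified-safe or warned."""
    warnings = []
    for node in nodes:
        if node["op"] not in UNSAFE_OPS:
            continue
        rng = input_range(node)           # step 2: dataflow analysis result
        if is_safe(node["op"], rng):      # step 3: danger-zone check
            continue
        if try_splits(node):              # step 4: predicate splitting
            continue
        warnings.append(node["name"])     # cannot prove safety: warn
    return warnings

# Exp overflows in float32 once its input exceeds ~88.72 = ln(3.4e38).
nodes = [{"name": "exp_1", "op": "Exp"}, {"name": "add_1", "op": "Add"}]
ranges = {"exp_1": (-50.0, 100.0)}
is_safe = lambda op, rng: not (op == "Exp" and rng[1] > 88.72)
warned = verify(nodes, lambda n: ranges[n["name"]], is_safe, lambda n: False)
assert warned == ["exp_1"]
```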
- `main.py` is the entry point for reproducing the evaluation results reported in the paper. It takes one argument: the path to the downloaded datasets. Please see the Reproduce Evaluation in our Paper Section in README for how to reproduce the evaluation results in our paper.
- `solver.py`
  - `Solver` is a legacy class that was used in the z3-solver aided dataflow analysis. We are considering removing it because it is never used.
  - `Range` is a data structure for the interval abstraction. `left` and `right` denote the lower and upper bounds of the abstracted value.
  - `check_range_const(range_const)` checks whether a `Range` object `range_const` has constant lower and upper bounds. It is also a legacy function: because we no longer use z3-solver, it should always return true, and we are considering removing it as well.
  - `meet(range, range_const)` checks whether the interval of `range` intersects with the interval of `range_const`. In other words, `meet` returns true iff `range` $\cap$ `range_const` $\neq \varnothing$. `meet` is called in `analysis_main.py` to check whether the interval bound of an unsafe operation computed by static analysis intersects with its danger zone.
  - `Array` is the data structure supporting tensor partitioning. It mainly contains:
    - `index_slices`: a list storing the partitioning positions of each dimension. If the tensor has dimension $d$, then `index_slices` contains $d$ tuples, each of which is the set of partitioning positions of one dimension. If the $i$-th dimension is partitioned into $[0,p^{(i)}_0), [p^{(i)}_0,p^{(i)}_1), \dots, [p^{(i)}_{m-1},p^{(i)}_m)$, where $p^{(i)}_m$ is equal to the size of the $i$-th dimension, then the $i$-th tuple of `index_slices` will be $(p^{(i)}_0,p^{(i)}_1,\ldots,p^{(i)}_m)$.

      For example, a two-dimensional tensor (matrix) A has the shape $3\times 4$ and is partitioned into 6 partitions $[0,1)\times [0,2)$, $[0,1)\times [2,3)$, $[0,1)\times [3,4)$, $[1,3)\times [0,2)$, $[1,3)\times [2,3)$, $[1,3)\times [3,4)$. Then `index_slices` will be `[(1, 3), (2, 3, 4)]`.
    - `block_to_symbol`: a map from each partition to a `Linear` object, which maintains the linear affine relation. Each partition is defined by the Cartesian product of the $d$ tuples in `index_slices`, called partitioning positions.

      Taking the above example, the keys of the map `block_to_symbol` will be a set of 6 $2$-tuples: $(1,2),(1,3),(1,4),(3,2),(3,3),(3,4)$, each of which denotes the ending points of one partition in all dimensions.
    - `join_index_slices(a, b)` aligns two sets of partitioning positions `a` and `b`. Both `a` and `b` are in the form of `Array.index_slices`.

      For example, suppose `a = [(1, 3), (2, 3, 4)]` and `b = [(3,), (1, 4)]`; then the aligned partitioning positions `c` will be `[(1, 3), (1, 2, 3, 4)]`. It can be seen that `c` has a finer granularity of partitioning than `a` and `b` in this case.
    - `get_corresponding_keys(self, index_slices)` gets the corresponding `Linear` objects according to `index_slices`. Notice that `index_slices` may have a finer granularity of partitioning than `self.index_slices`, so the `Linear` object (as well as the variables stored in the `Linear` object) may need to be further partitioned.

      For example, suppose `self.index_slices = [(1, 3), (2, 3, 4)]` and `index_slices = [(1, 3), (1, 2, 3, 4)]`; then the partition $[1,3)\times[0,1)$ corresponds to the partition $[0,2)\times[0,1)$ of `self.block_to_symbol[(3,2)]`.
  - `Linear` is the data structure supporting the linear affine relation. For example, consider the following affine relation: $$ 3x-relu(x)+4y-z+5=0. $$ Each `Linear` object has a main variable, because each `Linear` object is stored in the `block_to_symbol` field of an `Array` object, and the `Array` object is the abstracted value of the main variable. For example, if $z$ is the main variable in the above affine relation, then what is stored in the `Linear` object is the following affine expression: $$ 3x-relu(x)+4y+5, $$ whose semantics is $z=3x-relu(x)+4y+5$, where $z$ is omitted since it can be inferred from the fact that $z$ is the main variable.

    In order to store such an affine expression, `Linear` uses:
    - `value`: a map from variables to their factors. Taking the above example, `value[x] = 3`, `value[relu(x)] = -1`, `value[y] = 4`, and `value[CONST] = 5`. Each variable is a partition of a tensor which is the output of one operation. A variable is defined as a pair `(name, indexes)`, where `name` is the name of the operation and `indexes` is the partitioning positions.
    - `map_to_index`: a map maintaining an index mapping. The purpose of maintaining this index mapping is that additional dimensions may be added after operations like `pack`, dimensions may be deleted (these dimensions are all equal to 1 and do not change the size of the partition) after operations like `unpack`, and dimensions may be permuted after operations like `transpose`.

      Consider the following code: `z = transpose(x)`, where `z` is a $3\times 4$ matrix and `x` is a $4 \times 3$ matrix. Suppose there is only one (whole) partition of `z`; then the tensor partition `Array` of `z` contains `index_slices=[(3,), (4,)]`, and `block_to_symbol` maps the key `(3, 4)` to the linear affine expression `x`, where the partitioning positions of `x` are `[(4,), (3,)]`. Notice that the 0th dimension of `z` is the 1st dimension of `x`, and the 1st dimension of `z` is the 0th dimension of `x`.

      The semantics of `map_to_index` is that `z[t]` corresponds to `x[map_to_index[t]]`.
    - `__add__(self, other)`, `__sub__(self, other)`: add/subtract two affine expressions and return a new `Linear` object.
    - `neg(self)`: calculates the negation of the affine expression.
    - `choose(self, start_ind)`: further partitions the variables inside the `Linear` object and returns a new partitioned `Linear` object. Recall that in `Array.get_corresponding_keys`, we may further partition the `Linear` object as well as the variables stored in the `Linear` object.
    - `add_pack_ind(self, pack_ind)`: adds an axis at the `pack_ind`-th dimension and returns a new packed `Linear` object.
    - `remove_unpack_axis(self, axis)`: removes an axis at the `axis`-th dimension and returns a new unpacked `Linear` object.
    - `transpose(self, perm)`: transposes the `map_to_index` according to the permutation `perm` and returns a new transposed `Linear` object.
    - `relu(self)`: calculates the $relu$ of the affine expression and returns a new `Linear` object. It only supports calculating the relu of a singleton affine expression that contains only one variable or one constant value, e.g., `x`, `-x`, `relu(x)`, `-relu(x)`, and a constant value `c`. The following axioms are used to calculate $relu$: $$ relu(x)=relu(x) \\ relu(-x)=-x+relu(x) \\ relu(relu(x))=relu(x) \\ relu(-relu(x))=0 \\ relu(c)=\max(c,0). $$
  - `meet_relation_variable(rv, range_const)` is never used; we are also considering removing it.
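A minimal sketch of the interval abstraction behind `Range` and the intersection check performed by `meet` (simplified: the real classes carry more machinery than two floats):

```python
# Simplified illustration of Range and meet; not the actual solver.py code.
from dataclasses import dataclass

@dataclass
class Range:
    left: float   # lower bound of the abstracted value
    right: float  # upper bound of the abstracted value

def meet(rng, danger_zone):
    """True iff the two intervals intersect, i.e. the unsafe operation's
    input may fall inside its danger zone."""
    return rng.left <= danger_zone.right and danger_zone.left <= rng.right

# The input of Exp stays below the float32 overflow threshold (~88.72),
# so the intervals do not intersect and the operation is verified safe.
assert not meet(Range(-50.0, 0.0), Range(88.73, float("inf")))
assert meet(Range(-50.0, 100.0), Range(88.73, float("inf")))
```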
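The alignment performed by `join_index_slices` amounts to a per-dimension union of partitioning positions. A plausible reimplementation (an illustrative sketch, not the actual source) reproduces the example above:

```python
# Illustrative reimplementation of join_index_slices: per dimension, take
# the union of the partitioning positions of both inputs.
def join_index_slices(a, b):
    return [tuple(sorted(set(pa) | set(pb))) for pa, pb in zip(a, b)]

a = [(1, 3), (2, 3, 4)]
b = [(3,), (1, 4)]
# The aligned positions refine both inputs.
assert join_index_slices(a, b) == [(1, 3), (1, 2, 3, 4)]
```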
- `utils.py`
  - `OVERFLOW_LIMIT`, `UNDERFLOW_LIMIT`, `OVERFLOW_D`, and `UNDERFLOW_D` specify the overflow and underflow limits in `tf.float32`.
  - `resolve_type(y)` converts `y` from `numpy` data types to Python primitive data types.
  - `shape_from_proto(shape)` parses the tensor shape `shape` from the protocol buffer format into a Python list.
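The float32 limits and the numpy-to-Python conversion can be sketched with numpy; the exact constant values and the implementation in `utils.py` may differ:

```python
import numpy as np

# Overflow/underflow thresholds of tf.float32 (same width as np.float32);
# these are assumed values, derived from np.finfo rather than utils.py.
OVERFLOW_LIMIT = np.finfo(np.float32).max    # about 3.4028235e+38
UNDERFLOW_LIMIT = np.finfo(np.float32).tiny  # about 1.1754944e-38

def resolve_type(y):
    """Convert numpy scalar types to Python primitives, as resolve_type is
    described above; non-numpy values pass through unchanged."""
    return y.item() if isinstance(y, np.generic) else y

assert isinstance(resolve_type(np.float32(1.5)), float)
assert isinstance(resolve_type(np.int64(7)), int)
```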
The workflow of `analysis_main.py` has been described previously. We further describe the motivation and the details of predicate splitting.
Consider the following expression: $$ y = e^{-relu(x)} + e^{x-relu(x)} $$
The range of $x$ is $[-50,40]$.
The range of $y$ computed directly over this interval is imprecise: the interval abstraction loses the correlation between $x$ and $relu(x)$, so the range of $x-relu(x)$ widens to $[-90,40]$ and the computed range of $y$ reaches about $[0, 1+e^{40}]$, which is much wider than the true range.
When we find out a variable (here $x$) that appears multiple times in the expression, such precision loss can occur, so we try to split the range of the variable using constant predicates such as 0.
In this case, if we split the range of $x$ into two ranges and analyze each split separately:
- $x=[-50,0]$: we can calculate that the range of $-relu(x)$ is $[0,0]$ and the range of $x-relu(x)$ is $[-50,0]$. Then the range of $e^{-relu(x)}$ is $[1,1]$ and the range of $e^{x-relu(x)}$ is $[0,1]$, leading the range of $y$ to be $[1,2]$.
- $x=[0,40]$: we can calculate that the range of $-relu(x)$ is $[-40,0]$ and the range of $x-relu(x)$ is $[0,0]$. Then the range of $e^{-relu(x)}$ is $[0,1]$ and the range of $e^{x-relu(x)}$ is $[1,1]$, leading the range of $y$ to be $[1,2]$.
After merging the two ranges of $y$ from the two splits, we can conclude that the range of $y$ is $[1,2]$, which is much more precise.
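The two splits can be checked mechanically with interval arithmetic: on each split $relu$ is exactly linear, so both exponents are computed without loss. The helper below is a worked re-derivation of the numbers above, not DEBAR code:

```python
import math

def y_range_on_split(x_lo, x_hi):
    """Range of y = e^{-relu(x)} + e^{x-relu(x)} on a split of x that does
    not straddle 0, so relu(x) is linear on the whole split."""
    if x_hi <= 0:                 # relu(x) = 0 on this split
        a = (0.0, 0.0)            # range of -relu(x)
        b = (x_lo, x_hi)          # range of x - relu(x)
    else:                         # x_lo >= 0, so relu(x) = x on this split
        a = (-x_hi, -x_lo)
        b = (0.0, 0.0)
    return (math.exp(a[0]) + math.exp(b[0]),
            math.exp(a[1]) + math.exp(b[1]))

neg = y_range_on_split(-50, 0)   # close to (1.0, 2.0)
pos = y_range_on_split(0, 40)    # close to (1.0, 2.0)
merged = (min(neg[0], pos[0]), max(neg[1], pos[1]))
assert abs(merged[0] - 1.0) < 1e-6 and merged[1] == 2.0
```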
For each unsafe operation under verification, `analysis_main.py` first calls `graph.forward_analysis` in `parse_graph.py` to get the dataflow analysis results of the computation graph. The analysis results are stored in `graph.node_output`, and the return values of `graph.forward_analysis` contain the ranges of the nodes that need to be split, `range_to_split`.
The function `is_valid_by_split` checks whether the unsafe operation's input is valid. It first checks whether the input ranges of the unsafe operation are valid without predicate splitting; if not, it tries to split each node in `range_to_split` and reevaluates the dataflow analysis in an incremental manner by calling `graph.reevaluate`. If the merged result of any split node is valid, then the input ranges of the unsafe operation are proved to be valid.
Theoretically, the more splits we try, the more precise the results we will get. In the implementation, we only try to split each range into two splits, and the constant predicate is always 0.
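The split-and-merge check can be sketched as follows; the function and callback signatures are illustrative assumptions, and the real `is_valid_by_split` operates on DEBAR's graph structures rather than plain callables:

```python
# Illustrative sketch of the is_valid_by_split logic; names and signatures
# are assumptions, not DEBAR's actual implementation.
def is_valid_by_split(check_without_split, reevaluate, range_to_split):
    """check_without_split(): validity with no splitting.
    reevaluate(node, half): validity when node's range is restricted to
    one half of the split at the constant predicate 0."""
    if check_without_split():
        return True
    for node in range_to_split:
        # Valid only if *every* half of the split proves validity,
        # since the merged result must cover the whole original range.
        if all(reevaluate(node, half) for half in ("<= 0", ">= 0")):
            return True
    return False

# Splitting node "x" proves validity even though the unsplit check fails.
assert is_valid_by_split(lambda: False, lambda n, h: n == "x", ["w", "x"])
assert not is_valid_by_split(lambda: False, lambda n, h: False, ["w"])
```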