Formal verification allows users to verify properties of program execution - for instance, they can verify properties of a function's output, or that a program does not crash. This is useful for a variety of purposes - from formally verifying the behaviour of cryptography libraries to ensuring that applications used in sensitive medical settings do not crash.
NLPSpecifier is an application that allows you to produce formal specifications from software documentation. These formal specifications can then be used by Prusti to verify the behaviour of your source code. In particular, NLPSpecifier can parse documentation embedded within Rust documentation pages (produced through `cargo doc`, or downloaded from `rustup`) and Rust source code.
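To make the end goal concrete, the following is a minimal sketch of the kind of Prusti specification this pipeline targets. It is an illustrative assumption, not output produced by NLPSpecifier; the function `max_i32` and its doc sentence are invented for the example.

```rust
// Hypothetical example: a doc sentence such as
// "Returns the larger of `a` and `b`."
// can be expressed as machine-checkable Prusti postconditions.
use prusti_contracts::*;

// `result` refers to the function's return value in Prusti specifications.
#[ensures(result >= a && result >= b)]
#[ensures(result == a || result == b)]
fn max_i32(a: i32, b: i32) -> i32 {
    if a > b { a } else { b }
}
```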
Python 3.6+ and Rust should already be installed for the parsing steps. To perform verification of output specifications, Prusti should also be installed.
Dependencies for individual components of the system are specified below, or are otherwise installed by running the provided `setup.sh` scripts.
Note that all `.sh` files and commands provided are specific to Linux. Also note that, to avoid contaminating your system Python installation, it is best to use `venv`.
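For example, a virtual environment can be created and activated as follows (the directory name `.venv` is an arbitrary choice):

```console
foo@bar:~$ python3 -m venv .venv
foo@bar:~$ source .venv/bin/activate
```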
The main functionality of this application is available through `src/doc_parser/`. To build this executable for your system, use:

```sh
cd ./src/doc_parser/ && cargo build --release ; cd ..
```
Once installation is complete, this project can be used through `doc_parser`. To run the program, use `./doc_parser` or `cargo run --release` at the appropriate locations. Run the following command to see a list of all possible commands:
```console
foo@bar:~$ ./doc_parser -h
doc_parser

USAGE:
    doc_parser <SUBCOMMAND>

FLAGS:
    -h, --help       Print help information
    -V, --version    Print version information

SUBCOMMANDS:
    end-to-end    Demonstrates entire pipeline from start to end on provided file, writing
                  output to terminal
    help          Print this message or the help of the given subcommand(s)
    render        Visualization of various components in the system's pipeline
    specify       Creates specifications for a variety of sources
```
To see more specific help, do the following:
```console
foo@bar:~$ ./doc_parser end-to-end --help
doc_parser-end-to-end
Demonstrates entire pipeline from start to end on provided file, writing output to terminal

USAGE:
    doc_parser end-to-end [OPTIONS]

FLAGS:
    -h, --help       Print help information
    -V, --version    Print version information

OPTIONS:
    -p, --path <PATH>    Source file to specify [default: ../../data/test3.rs]
```
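For example, to run the full pipeline on the bundled test file (the same file used when `--path` is omitted):

```console
foo@bar:~$ ./doc_parser end-to-end --path ../../data/test3.rs
```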
The NLP parsing code tokenizes text, assigns part-of-speech tags to tokens, and detects named entities using spaCy. To set up the dependencies, run `src/setup.sh`:

```sh
cd ./src/ && sudo chmod +x ./setup.sh && ./setup.sh && cd ..
```
If you also need to use SRL and NER capabilities, run `jml_nlp/setup.sh`:

```sh
cd ./jml_nlp/ && sudo chmod +x ./setup.sh && ./setup.sh && cd ..
```
Then, to launch the base services, which include a web interface and a REST API used in `doc_parser` for NLP:

```sh
sudo docker-compose --env-file nlp.env up -d --build
```
Add `--profile jml_nlp` to also launch the SRL and NER services:

```sh
sudo docker-compose --env-file nlp.env --profile jml_nlp up -d --build
```
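To confirm that the containers came up, you can list the running containers (the exact output depends on your configuration):

```console
foo@bar:~$ sudo docker ps
```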
Alternatively, it is possible to run the base services directly on your computer, without using Docker.
The Python implementation is available at https://github.com/philippeitis/nlp_specifier/tree/62d4d51a30c173f65daaf631b7acca0ffbf572a3.
This link provides a useful reference for the POS tags generated by spaCy: http://erwinkomen.ruhosting.nl/eng/2014_Longdale-Labels.htm
To use NER and SRL analysis for documentation, Docker and Docker Compose must be installed. Additionally, downloading the relevant models requires installing Git and Git LFS. All other dependencies for this are set up using `jml_nlp/setup.sh`:

```sh
cd ./jml_nlp/ && sudo chmod +x ./setup.sh && ./setup.sh && cd ..
```
After running this script, the SRL service will be available at 127.0.0.8:701, and the NER service will be available at 127.0.0.8:702. `src/nlp/ner.py` provides functions for annotating text using these services. The `Tokenizer` class in `src/nlp/tokenizer.py` transforms these annotations into a format that can be rendered by spaCy's displaCy tool.
The NER and SRL models are sourced from *Combining formal and machine learning techniques for the generation of JML specifications*.
- Build server interface for NLP
- Tree formatting
- Convert codegen to Rust `build.rs` (to ensure that we don't introduce inconsistencies when parsing grammar)
- Build synonym sets for this problem domain
- Train model which correctly tags VB cases (currently tagged as "NN")
- Build pseudo-compiler to iteratively resolve specification
- Detect fn purity
- Build tool to clean up resulting specifications
- Detect vacuously true specifications
- Quality of specification?
- Desugar for loops?
- Straight up search using current methods, and plug in symbols
- No slotting at all (but methods / struct fields should be attached as `.x`, while lone functions are wrapping)
- Replaces first N matches of a pattern with another string
- `::ffff:a.b.c.d` becomes `a.b.c.d`
- The elements are passed in opposite order from their order in the slice, so if `same_bucket(a, b)` returns `true`, `a` is moved at the end of the slice
- Returns the capacity this `OsString` can hold without reallocating
- Note that the capacity of `self` does not change (see the sketch after this list)
- Returns `true` if the associated `Once` was poisoned prior to the invocation of the closure passed to `Once::call_once_force()`
- `replace_with` does not need to be the same length as `range`
- This method is equivalent to `CString::new` except that no runtime assertion is made that `v` contains no 0 bytes, and it requires an actual byte vector, not anything that can be converted to one with `Into`
- See `send` for notes about guarantees of whether the receiver has received the data or not if this function is successful
- Find matches which do not include the sentence (this approach does not handle interior skipwords)
- Preprocess sentence before tree - detect and remove common fillers (already done in part with SPACE)
- Process fragments, have grammar for interior fragments (eg. "otherwise x is true")
- Allow chaining (requires Context)
- Add error messages when parsing (eg. "You typed '`self.capacity` is unchanged', missing 'must' or 'will' to specify that this is a pre/post condition")
- Allow using "but"
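As a concrete illustration of the sentence-to-specification mapping these examples call for, here is a hedged sketch of how "Note that the capacity of `self` does not change" could become a Prusti postcondition. The `Buffer` type and its `clear` method are invented for illustration; this is not NLPSpecifier output.

```rust
use prusti_contracts::*;

// Illustrative type with an explicit capacity field.
struct Buffer {
    len: usize,
    capacity: usize,
}

impl Buffer {
    // "Note that the capacity of `self` does not change" maps naturally to a
    // postcondition comparing the post-state against the pre-state via `old`.
    #[ensures(self.capacity == old(self.capacity))]
    #[ensures(self.len == 0)]
    fn clear(&mut self) {
        self.len = 0;
    }
}
```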