How to run this GPU-DIZK repo:
#install dependencies with `setup.sh`; read the script before running it.
sh setup.sh
mvn compile
#before running `jni_compile.sh`, set variables such as the installed GMP library path, the NVCC path, the Java path, and the GPU compute architecture code; NVCC compilation can take a long time.
sh jni_compile.sh
#test proof generation correctness
mvn test -Dtest=zk_proof_systems.zkSNARK.SerialzkSNARKTest
#benchmark serial zkSNARK performance (FixMSM, VarMSM, FFT, end to end)
sh serialzkSNARKProfiler.sh
#benchmark distributed zkSNARK performance; set up your Spark cluster first.
sh distributedzkSNARKProfiler.sh
DIZK (pronounced /'dizək/) is a Java library for distributed zero knowledge proof systems. The library implements distributed polynomial evaluation/interpolation, computation of Lagrange polynomials, and multi-scalar multiplication. Using these scalable arithmetic subroutines, the library provides a distributed zkSNARK proof system that enables verifiable computations of up to billions of logical gates, far exceeding the scale of previous state-of-the-art solutions.
The library is developed by SCIPR Lab and contributors (see AUTHORS file) and is released under the MIT License (see LICENSE file).
The library is developed as part of a paper called "DIZK: A Distributed Zero Knowledge Proof System".
WARNING: This is an academic proof-of-concept prototype. This implementation is not ready for production use. It does not yet contain all the features, careful code review, tests, and integration that are needed for a deployment!
The directory structure is as follows:
- src: Java directory for source code and unit tests
  - main/java: Java source code, containing the following modules:
    - algebra: fields, groups, elliptic curves, FFT, multi-scalar multiplication
    - bace: batch arithmetic circuit evaluation
    - common: standard arithmetic and Spark computation utilities
    - configuration: configuration settings for the Spark cluster
    - profiler: profiling infrastructure for zero-knowledge proof systems
    - reductions: reductions between languages (used internally)
    - relations: interfaces for expressing statements (relations between instances and witnesses) as various NP-complete languages
    - zk_proof_systems: serial and distributed implementations of zero-knowledge proof systems
  - test/java: Java unit tests for the provided modules and infrastructure
This library implements a distributed zero knowledge proof system that makes it possible to scalably prove (and verify) the integrity of computations in zero knowledge.
A prover who knows the witness for an NP statement (i.e., a satisfying input/assignment) can produce a short proof attesting to the truth of the NP statement. This proof can then be verified by anyone, and offers the following properties.
- Zero knowledge - the verifier learns nothing from the proof besides the truth of the statement.
- Succinctness - the proof is small in size and cheap to verify.
- Non-interactivity - the proof does not require back-and-forth interaction between the prover and the verifier.
- Soundness - the proof is computationally sound (such a proof is called an argument).
- Proof of knowledge - the proof attests not just that the NP statement is true, but also that the prover knows why.
These properties comprise a zkSNARK, which stands for Zero-Knowledge Succinct Non-interactive ARgument of Knowledge. For formal definitions and theoretical discussions about these, see [BCCT12] [BCIOP13] and the references therein.
DIZK provides Java-based implementations using Apache Spark [Apa17] for:
- Proof systems
  - A serial and distributed preprocessing zkSNARK for R1CS (Rank-1 Constraint Systems), an NP-complete language that resembles arithmetic circuit satisfiability. The zkSNARK is the protocol in [Gro16].
  - A distributed Merlin-Arthur proof system for evaluating arithmetic circuits on batches of inputs; see [Wil16].
- Scalable arithmetic
  - A serial and distributed radix-2 fast Fourier transform (FFT); see [Sze11].
  - A serial and distributed multi-scalar multiplication (MSM); see [BGMW93] [Pip76] [Pip80].
  - A serial and distributed Lagrange interpolation (Lag); see [BT04].
- Applications using the above zkSNARK for
  - Authenticity of photos on three transformations (crop, rotation, blur); see [NT16].
  - Integrity of machine learning models with support for linear regression and covariance matrices; see [Bis06] [Can69] [LRF97] [vW97].
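To illustrate one of the arithmetic subroutines listed above, here is a minimal serial sketch of multi-scalar multiplication using a bucket (Pippenger-style) method. It is not DIZK's implementation or API. The class `MsmSketch`, its method names, the window width, and the toy group (integers modulo an assumed modulus `Q` under addition, standing in for an elliptic curve group) are all assumptions made for illustration. The sketch computes the sum of `scalars[i] * bases[i]` using only group additions and compares the result against a naive reference.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Illustrative sketch only; class and method names are hypothetical, not DIZK's API.
final class MsmSketch {
    // Assumed toy group: integers modulo Q under addition (stand-in for a curve group).
    static final BigInteger Q = BigInteger.valueOf(1000003L);

    // The group operation.
    static BigInteger add(BigInteger x, BigInteger y) {
        return x.add(y).mod(Q);
    }

    // Helper to build small test vectors.
    static BigInteger[] toBig(long... values) {
        BigInteger[] out = new BigInteger[values.length];
        for (int i = 0; i < values.length; i++) out[i] = BigInteger.valueOf(values[i]);
        return out;
    }

    // Naive reference: sum_i scalars[i] * bases[i] in the toy group.
    static BigInteger naive(BigInteger[] scalars, BigInteger[] bases) {
        BigInteger acc = BigInteger.ZERO;
        for (int i = 0; i < scalars.length; i++) {
            acc = add(acc, scalars[i].multiply(bases[i]));
        }
        return acc;
    }

    // Bucket method with c-bit windows; scalars must be non-negative and < 2^scalarBits.
    static BigInteger bucketMsm(BigInteger[] scalars, BigInteger[] bases, int c, int scalarBits) {
        int windows = (scalarBits + c - 1) / c;
        int mask = (1 << c) - 1;
        BigInteger result = BigInteger.ZERO;
        for (int w = windows - 1; w >= 0; w--) {
            // Shift the accumulated result left by c bits using c doublings.
            for (int d = 0; d < c; d++) result = add(result, result);
            // buckets[k] collects every base whose current window digit equals k.
            BigInteger[] buckets = new BigInteger[1 << c];
            Arrays.fill(buckets, BigInteger.ZERO);
            for (int i = 0; i < scalars.length; i++) {
                int digit = scalars[i].shiftRight(w * c).intValue() & mask;
                if (digit != 0) buckets[digit] = add(buckets[digit], bases[i]);
            }
            // Compute sum_k k * buckets[k] with running sums (additions only).
            BigInteger running = BigInteger.ZERO;
            BigInteger windowSum = BigInteger.ZERO;
            for (int k = mask; k >= 1; k--) {
                running = add(running, buckets[k]);
                windowSum = add(windowSum, running);
            }
            result = add(result, windowSum);
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger[] scalars = toBig(5, 12, 255, 7);
        BigInteger[] bases = toBig(2, 3, 5, 999999);
        System.out.println(naive(scalars, bases));
        System.out.println(bucketMsm(scalars, bases, 4, 8));
    }
}
```

Both calls in `main` should print the same value. In a real setting the bases would be elliptic curve points and the window width would be tuned to the number of terms.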
The library has the following dependencies:
- Java SE 8+
- Apache Maven
- Fetched from `pom.xml` via Maven:
- Fetched via Git submodules:
This library uses Apache Spark, an open-source cluster-computing framework that natively supports Java, Scala, and Python. Among these, we found Java to fit our goals because we could leverage its rich features for object-oriented programming and we could control execution in a (relatively) fine-grained way.
While other libraries for zero knowledge proof systems are written in low-level languages (e.g., libsnark is written in C++ and bellman in Rust), harnessing the speed of such languages in our setting is not straightforward. For example, we evaluated the possibility of interfacing with C (using native binding approaches like JNI and JNA), and concluded that the cost of memory management and process interfacing resulted in slower performance than pure Java execution.
Start by cloning this repository and entering the repository working directory:
git clone https://github.com/scipr-lab/dizk.git
cd dizk
Next, fetch the dependency modules:
git submodule init && git submodule update
Finally, compile the source code:
mvn compile
cd your_dizk_project_directory
docker build -t dizk-container .
docker run -it dizk-container bash
This library comes with unit tests for each of the provided modules. Run the tests with:
mvn test
Using Amazon EC2, the profiler benchmarks the performance of serial and distributed zero-knowledge proof systems, as well as their underlying primitives.
The profiler uses `spark-ec2` to manage the cluster compute environment and a set of provided scripts for launch, profiling, and shutdown.
To manage the cluster compute environment, DIZK uses `spark-ec2@branch-2.0`. `spark-ec2` is a tool to launch, maintain, and terminate Apache Spark clusters on Amazon EC2.
To set up `spark-ec2`, run the following commands:
git clone https://github.com/amplab/spark-ec2.git
cd spark-ec2
git checkout branch-2.0
pwd
Note the directory printed by `pwd`: the scripts in the next step need the location of `spark-ec2` as an environment variable.
To begin, set the environment variables required to initialize the profiler in init.sh. The profiling infrastructure will require access to an AWS account access key and secret key, which can be created with the instructions provided by AWS.
export AWS_ACCESS_KEY_ID={Insert your AWS account access key}
export AWS_SECRET_ACCESS_KEY={Insert your AWS account secret key}
export AWS_KEYPAIR_NAME="{Insert your AWS keypair name, e.g. spark-ec2-oregon}"
export AWS_KEYPAIR_PATH="{Insert the path to your AWS keypair .pem file, e.g. /Users/johndoe/Downloads/spark-ec2-oregon.pem}"
export AWS_REGION_ID={Insert your AWS cluster region choice, e.g. us-west-2}
export AWS_CLUSTER_NAME={Insert your AWS cluster name, e.g. spark-ec2}
export SPOT_PRICE={Insert your spot price for summoning an EC2 instance, e.g. 0.1}
export SLAVES_COUNT={Insert the number of EC2 instances to summon for the cluster, e.g. 2}
export INSTANCE_TYPE={Insert the instance type you would like to summon, e.g. r3.large}
export DIZK_REPO_PATH="{Insert the path to your local DIZK repository, e.g. /Users/johndoe/dizk}"
export SPARK_EC2_PATH="{Insert the path to your local spark-ec2 repository, e.g. /Users/johndoe/dizk/depends/spark-ec2}"
Next, start the profiler by running:
./launch.sh
The launch script uses `spark-ec2` and the environment variables to set up the initial cluster environment.
This process takes around 20-30 minutes depending on the choice of cluster configuration.
After the launch is complete, upload the DIZK JAR file to the master node and SSH into the cluster with the following command:
./upload_and_login.sh
Once you have successfully logged in to the cluster, navigate to the uploaded `scripts` folder and set up the initial cluster environment.
cd ../scripts
./setup_environment.sh
This creates a logging directory for Spark events and installs requisite dependencies, such as Java 8.
Lastly, with the cluster environment fully set up, set the desired parameters for benchmarking in profile.sh and run the following command to begin profiling:
./profile.sh
We evaluate the distributed implementation of the zkSNARK setup and prover. Below we use instance size to denote the number of constraints in an R1CS instance.
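To make "instance size" concrete, the following sketch checks a toy R1CS instance with three constraints, so its instance size is 3. It is an illustration only and does not use DIZK's classes: the class `R1csSketch`, the small modulus `P`, and the example statement x^3 + x + 5 = out are all assumptions. Each constraint j requires that the product of the inner products of rows `A[j]` and `B[j]` with the assignment z equals the inner product of row `C[j]` with z, where z = (1, x, out, v1, v2).

```java
import java.math.BigInteger;

// Illustrative sketch only; names are hypothetical and unrelated to DIZK's API.
final class R1csSketch {
    // Assumed toy field modulus (2^31 - 1); real zkSNARK fields are far larger.
    static final BigInteger P = BigInteger.valueOf(2147483647L);

    // Constraint rows over z = (1, x, out, v1, v2):
    //   1) x * x            = v1
    //   2) v1 * x           = v2
    //   3) (5 + x + v2) * 1 = out
    static final long[][] A = {{0, 1, 0, 0, 0}, {0, 0, 0, 1, 0}, {5, 1, 0, 0, 1}};
    static final long[][] B = {{0, 1, 0, 0, 0}, {0, 1, 0, 0, 0}, {1, 0, 0, 0, 0}};
    static final long[][] C = {{0, 0, 0, 1, 0}, {0, 0, 0, 0, 1}, {0, 0, 1, 0, 0}};

    // Inner product of one constraint row with the assignment, reduced mod P.
    static BigInteger dot(long[] row, long[] z) {
        BigInteger acc = BigInteger.ZERO;
        for (int i = 0; i < row.length; i++) {
            acc = acc.add(BigInteger.valueOf(row[i]).multiply(BigInteger.valueOf(z[i])));
        }
        return acc.mod(P);
    }

    // True iff every constraint j satisfies dot(a[j], z) * dot(b[j], z) = dot(c[j], z) mod P.
    static boolean isSatisfied(long[][] a, long[][] b, long[][] c, long[] z) {
        for (int j = 0; j < a.length; j++) {
            BigInteger lhs = dot(a[j], z).multiply(dot(b[j], z)).mod(P);
            if (!lhs.equals(dot(c[j], z))) {
                return false;
            }
        }
        return true;
    }

    // Builds z = (1, x, out, x^2, x^3) for a small non-negative x and a claimed output.
    static long[] assignment(long x, long out) {
        return new long[] {1, x, out, x * x, x * x * x};
    }

    public static void main(String[] args) {
        System.out.println("instance size: " + A.length);
        System.out.println(isSatisfied(A, B, C, assignment(3, 35))); // 3^3 + 3 + 5 = 35
        System.out.println(isSatisfied(A, B, C, assignment(3, 36))); // wrong claimed output
    }
}
```

The benchmarked instances have the same shape, only with many more constraints and variables.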
We measure the largest instance size (as a power of 2) that is supported by:
- the serial implementation of PGHR’s protocol in libsnark
- the serial implementation of Groth’s protocol in libsnark
- the distributed implementation of Groth's protocol in DIZK
We see that using more executors allows us to support larger instance sizes, in particular supporting billions of constraints with sufficiently many executors. Instances of this size are much larger than what was previously possible via serial techniques.
We benchmark the running time of the setup and the prover on an increasing number of constraints and with an increasing number of executors. Note that we do not need to evaluate the zkSNARK verifier as it is a simple and fast algorithm that can be run even on a smartphone.
Our benchmarks of the setup and the prover show us that:
- For a given number of executors, running times increase nearly linearly with the instance size, as expected, demonstrating scalability over a wide range of instance sizes.
- For a given instance size, running times decrease nearly linearly with the number of executors, as expected, demonstrating parallelization over a wide range of executor counts.
[Apa17] Apache Spark, Apache Spark, 2017
[Bis06] Pattern recognition and machine learning, Christopher M. Bishop, Book, 2006
[BCCT12] From extractable collision resistance to succinct non-interactive arguments of knowledge, and back again, Nir Bitansky, Ran Canetti, Alessandro Chiesa, Eran Tromer, Innovations in Theoretical Computer Science (ITCS), 2012
[BCIOP13] Succinct non-interactive arguments via linear interactive proofs, Nir Bitansky, Alessandro Chiesa, Yuval Ishai, Rafail Ostrovsky, Omer Paneth, Theory of Cryptography Conference (TCC), 2013
[BGMW93] Fast exponentiation with precomputation, Ernest F. Brickell, Daniel M. Gordon, Kevin S. McCurley, and David B. Wilson, International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), 1992
[BT04] Barycentric Lagrange interpolation, Jean-Paul Berrut and Lloyd N. Trefethen, SIAM Review, 2004
[Can69] A cellular computer to implement the Kalman filter algorithm, Lynn E. Cannon, Doctoral Dissertation, 1969
[Gro16] On the size of pairing-based non-interactive arguments, Jens Groth, International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), 2016
[LRF97] Generalized Cannon’s algorithm for parallel matrix multiplication, Hyuk-Jae Lee, James P. Robertson, and Jose A. B. Fortes, International Conference on Supercomputing, 1997
[NT16] Photoproof: Cryptographic image authentication for any set of permissible transformations, Assa Naveh and Eran Tromer, IEEE Symposium on Security and Privacy, 2016
[Pip76] On the evaluation of powers and related problems, Nicholas Pippenger, Symposium on Foundations of Computer Science (FOCS), 1976
[Pip80] On the evaluation of powers and monomials, Nicholas Pippenger, SIAM Journal on Computing, 1980
[Sze11] Schönhage-Strassen algorithm with MapReduce for multiplying terabit integers, Tsz-Wo Sze, International Workshop on Symbolic-Numeric Computation, 2011
[vW97] SUMMA: scalable universal matrix multiplication algorithm, Robert A. van de Geijn and Jerrell Watts, Technical Report, 1997
[Wil16] Strong ETH breaks with Merlin and Arthur: short non-interactive proofs of batch evaluation, Ryan Williams, Conference on Computational Complexity, 2016
This work was supported by Intel/NSF CPS-Security grants, the UC Berkeley Center for Long-Term Cybersecurity, and gifts to the RISELab from Amazon, Ant Financial, CapitalOne, Ericsson, GE, Google, Huawei, IBM, Intel, Microsoft, and VMware. The authors thank Amazon for donating compute credits to RISELab, which were extensively used in this project.