MCPNet is a gene regulatory network (GRN) reconstruction tool that identify long range indirect interactions based on a novel metric called MCP Score. MCP score uses maximum-capacity-path, a graph theoretical measure, to quantify the relative strengths of direct and indirect gene-gene interactions. MCPNet is implemented in C++ and is parallelized for multi-core and MPI multi-node environments. It is designed to reconstruct networks in unsupervised and semi-supervised manners.
At the moment the software is software is released as source code, so it is necessary to compile on your own system.
Below are the prerequisites. The example code are for ubuntu and debian distributions
- A modern c++ compiler with c++11 support. Supports Gnu g++, clang++ and Intel icpc.
sudo apt install build-essential
- MPI, for example openmpi or mvapich2
sudo apt install openmpi
- cmake, and optionally cmake GUI
sudo apt install cmake cmake-curses-gui
- HDF5
sudo apt install hdf5-helpers hdf5-tools libhdf5-dev
MCPNet can be downloaded directly from
https://github.com/AluruLab/MCPNet/releases/tag/paper2022
or the latest version can be accessed via git. Replace the curly braced directory with your actual directory.
git clone https://github.com/AluruLab/MCPNet.git {MCPNet_source_dir}
Next the submodules need to be initialied and updated.
cd {MCPNet_source_dir}
git submodule update --init --recursive
To build MCPNet, run the following from a linux command prompt to set up the build directory and configure the build. Replace the curly braced directory with your actual directory.
mkdir {MCPNet_build_dir}
cd {MCPNet_build_dir}
cmake {MCPNet_source_dir}
CMake allows customizing the compile options, including turning on or off support for OpenMP, MPI, HDF5, and SIMD instructions. We recommend that these be left as default, ON. The easiest way to change these settings is to use the CMake gui, ccmake.
ccmake {MCPNet_source_dir}
Alternatively, command line parameters may be used
cmake {MCPNet_source_dir} -DUSE_MPI=OFF -DUSE_SIMD=OFF
Once configured, then run
make
You can choose to install into the system bin directory with make install
but it is not necessary.
The compilation will generate a set of binaries in the {MCPNet_build_dir}/bin
directory. The current release includes the following binaries
-
mcpnet
: this is the primary executable that runs the MCPNet pipeline, including MI computation, MCP score calculation, and AUPR and AUROC metrics if groundtruth is supplied. If a list of transcription factors are supplied, then only TF to gene interactions are evaluated. -
eval_mcpnet
: this executable is used for evaluating the performance of the ensemble method when only partial truth is provided. This is suitable when full groundtruth is available, hence synthetic expression data. The groundtruth is divided into training and testing sets and the AUPR is evaluated for as many iterations as the user specifies. The distribution of the AUPR is captured and can be plotted. This binary has essentially the same parameters as mcpnet.
-
pearson
: computes the pearson correlation for a given gene expression profile input -
mi
: computes the mutual information for a given gene expression profile input -
mcp
: computes the MCP scores, either for fixed length paths (2, 3, 4) from MI matrix, or from two previously computed maximum path capacity matrices. -
dpi
: filters the MI matrix based on Data Processing Inequality as implemented in ARACNe and TINGe -
transform
: performs Stouffer and CLR transforms on input matrix -
au_pr_roc
: computes the AUC for PR and ROC curves based on a ground truth matrix.
-
convert
: utility to convert between file formats. Supported formats include HDF5, EXP, and CSV. -
combine
: perform element-wise basic math operations on two or more matrices of the same dimension. -
diagonal
: sets the diagonal elements of a matrix to the user specified value -
select
: select a subset of columns and rows from a matrix. -
threshold
: filters a matrix element-wise based on user specified thresholds.
Each binary has its own help commandline parameter that lists all available parameters. For example
mcpnet -h
or
mcpnet --help
Below are common parameters:
-
-i, --input
: input matrix file -
-o, --output
: output matrix file -
-t, --thread
: number of threads to use, per compute node -
-m, --method
: algorithm to use. This is specific for each binary, if present. Please use the-h
switch to see the exact algorithms supported.
The software is distributed with some example datasets in HDF5 format. Note that input and output format are specified via file extension, exp
, csv
, or h5
. Below are some example tests.
{MCPNet_build_dir}/bin/convert -i {MCPNet_source_dir}/data/r10c10.exp -o r10c10.csv
{MCPNet_build_dir}/bin/convert -i {MCPNet_source_dir}/data/r10c10.exp -o r10c10.h5
{MCPNet_build_dir}/bin/mi -i {MCPNet_source_dir}/data/r10c10.h5 -o r10c10.mi.h5 -m 1
{MCPNet_build_dir}/bin/pearson -i {MCPNet_source_dir}/data/r10c10.h5 -o r10c10.pearson.csv
{MCPNet_build_dir}/bin/mi -i {MCPNet_source_dir}/data/gnw2000.h5 -o gnw2000.mi.h5 -t 4 -m 2
{MCPNet_build_dir}/bin/mcp -i gnw2000.mi.h5 -o gnw2000.mcp4.h5 -m 3 -t 4
{MCPNet_build_dir}/bin/transform -i r10c10.mi.h5 -o r10c10.mi.stouffer.h5 -m 2 -t 4
{MCPNet_build_dir}/bin/au_pr_roc -i gnw2000.mcp4.h5 -x {MCPNet_source_dir}/data/gnw2000_truenet.csv
The mcpnet executable can accept multiple method parameters and will produce output for each one. The -o
or --output
parameter therefore takes a prefix rather than a filename.
{MCPNet_build_dir}/bin/mcpnet -i {MCPNet_source_dir}/data/gnw2000.h5 -o gnw2000 -m 1 2 3 -t 4 --mi-method 1 -x {MCPNet_source_dir}/data/gnw2000_truenet.csv