A deep-learning-based multi-omics bulk sequencing data integration suite with a focus on (pre-)clinical endpoint prediction. The package includes multiple types of deep learning architectures, such as simple fully connected networks and supervised variational autoencoders, offers different options for data layer fusion, and automates feature selection and hyperparameter optimisation. The tools are continuously benchmarked on publicly available datasets, mostly related to the study of cancer. Applications of the methods we develop include drug response modeling in cancer patients or preclinical models (such as cell lines and patient-derived xenografts), cancer subtype prediction, and any other clinically relevant outcome prediction that can be formulated as a regression or classification problem.
Detailed documentation of the classes and functions in this repository can be found here.
For the latest benchmark results see: https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis-benchmark-datasets/dashboard.html
The code for the benchmarking pipeline is at: https://github.com/BIMSBbioinfo/flexynesis-benchmarks
# install
git clone https://github.com/BIMSBbioinfo/flexynesis.git
cd flexynesis
conda create --name flexynesis --file spec-file.txt
conda activate flexynesis
pip install -e .
# test the installation
curl -L -o dataset1.tgz https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis-benchmark-datasets/dataset1.tgz
tar -xzvf dataset1.tgz
flexynesis --data_path dataset1 --model_class DirectPred \
--target_variables Erlotinib --fusion_type early --hpo_iter 1 \
--features_min 50 --features_top_percentile 5 --log_transform False \
--data_types gex,cnv --outdir . --prefix erlotinib_direct \
--early_stop_patience 3 --use_loss_weighting False \
--evaluate_baseline_performance False
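The test command trains on two data layers (--data_types gex,cnv) with --fusion_type early. In general terms, early fusion means concatenating each sample's features from all data layers into a single input vector before it reaches the model, rather than fusing layer-specific representations later. The sketch below illustrates only that generic idea on plain Python lists; the function early_fusion is hypothetical and is not flexynesis's own implementation.

```python
from typing import Dict, List


def early_fusion(
    layers: Dict[str, Dict[str, List[float]]], samples: List[str]
) -> Dict[str, List[float]]:
    """Concatenate per-sample feature vectors from several data layers.

    Hypothetical helper for illustration only (not part of flexynesis).
    `layers` maps a layer name (e.g. "gex", "cnv") to {sample_id: features}.
    Samples missing from any layer are skipped; layers are concatenated
    in the dict's insertion order.
    """
    fused: Dict[str, List[float]] = {}
    for sample in samples:
        if not all(sample in layer for layer in layers.values()):
            continue  # early fusion needs every layer for a sample
        vector: List[float] = []
        for layer in layers.values():
            vector.extend(layer[sample])
        fused[sample] = vector
    return fused
```

For example, with gex = {"s1": [0.1, 0.2], "s2": [0.3, 0.4]} and cnv = {"s1": [1.0], "s2": [-1.0], "s3": [0.0]}, early_fusion({"gex": gex, "cnv": cnv}, ["s1", "s2", "s3"]) keeps s1 and s2 with three features each and skips s3, which has no gex values.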
InputFolder/
|-- train
|   |-- omics1.csv
|   |-- omics2.csv
|   |-- ...
|   |-- clin.csv
|-- test
    |-- omics1.csv
    |-- omics2.csv
    |-- ...
    |-- clin.csv
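Before running flexynesis on your own data, it can help to check that the input folder follows the layout above. This is a minimal sketch using only the standard library; check_input_folder is a hypothetical helper (not part of flexynesis), and it assumes that every .csv file other than clin.csv in a split folder is an omics table.

```python
from pathlib import Path
from typing import Dict, List


def check_input_folder(root: str) -> Dict[str, List[str]]:
    """Return the omics file names found in train/ and test/.

    Hypothetical helper for illustration only (not part of flexynesis).
    Raises ValueError if a split folder or its clin.csv is missing.
    """
    found: Dict[str, List[str]] = {}
    for split in ("train", "test"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise ValueError(f"missing folder: {split_dir}")
        if not (split_dir / "clin.csv").is_file():
            raise ValueError(f"missing clin.csv in {split_dir}")
        # Assumption: every other .csv file in the split is an omics table.
        found[split] = sorted(
            p.name for p in split_dir.glob("*.csv") if p.name != "clin.csv"
        )
    return found
```

Comparing found["train"] with found["test"] is a quick way to spot an omics layer that exists in only one split.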
clin.csv contains the sample metadata. The first column contains unique sample identifiers.
The other columns contain sample-associated clinical variables.
NA values are allowed in the clinical variables.
v1,v2
s1,a,b
s2,c,d
s3,e,f
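In the example above, the header row lists only the clinical variables (v1, v2), while each data row starts with the sample identifier. The following standard-library sketch parses content in that shape; read_clin is a hypothetical helper (not the flexynesis reader), and treating empty cells like NA is an assumption.

```python
import csv
import io
from typing import Dict, Optional


def read_clin(text: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Parse clin.csv content into {sample_id: {variable: value}}.

    Hypothetical helper for illustration only (not part of flexynesis).
    "NA" and (by assumption) empty cells become None.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    body = [row for row in rows[1:] if row]  # skip blank lines
    # Accept a header that also has a (possibly empty) cell for the ID column.
    if body and len(header) == len(body[0]):
        header = header[1:]
    clin: Dict[str, Dict[str, Optional[str]]] = {}
    for row in body:
        sample, values = row[0], row[1:]
        if sample in clin:
            raise ValueError(f"duplicate sample id: {sample}")
        clin[sample] = {
            var: (None if val in ("", "NA") else val)
            for var, val in zip(header, values)
        }
    return clin
```

For the example table, read_clin("v1,v2\ns1,a,b\ns2,c,d\ns3,e,f\n")["s1"] gives {"v1": "a", "v2": "b"}.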
The first column of each feature table must contain unique feature identifiers (e.g. gene names).
The column names must be sample identifiers that overlap with those in clin.csv.
They don't have to be completely identical or in the same order. Samples in clin.csv that are not represented
in the omics table will be dropped.
s1,s2,s3
g1,0,1,2
g2,3,3,5
g3,2,3,4
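The rules above say that feature identifiers must be unique and that clin.csv samples missing from an omics table are dropped. A minimal sketch of that bookkeeping, assuming the tables have already been read into plain lists of IDs; align_samples and duplicate_features are hypothetical helpers, not part of the flexynesis API.

```python
from collections import Counter
from typing import List, Tuple


def duplicate_features(feature_ids: List[str]) -> List[str]:
    """Feature IDs that occur more than once (they must be unique)."""
    return [fid for fid, n in Counter(feature_ids).items() if n > 1]


def align_samples(
    clin_ids: List[str], omics_columns: List[str]
) -> Tuple[List[str], List[str]]:
    """Split clin.csv sample IDs into (kept, dropped).

    kept preserves clin.csv order and contains only samples that also appear
    as columns of the omics table; dropped lists those that do not.
    """
    in_omics = set(omics_columns)
    kept = [s for s in clin_ids if s in in_omics]
    dropped = [s for s in clin_ids if s not in in_omics]
    return kept, dropped
```

Because membership is tested with a set, the omics columns may come in any order, matching the statement that sample IDs need not be in the same order.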
The corresponding omics files in the train/test splits must contain overlapping feature names (they don't
have to be identical or in the same order).
The clin.csv files in train/test must contain matching clinical variables.
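A sketch of the two train/test consistency checks described above, assuming feature names and clinical variable names have already been read into lists. shared_features and clinical_mismatch are hypothetical helpers (not part of flexynesis), and "matching" is interpreted here as the same set of variable names.

```python
from typing import List, Set


def shared_features(train_features: List[str], test_features: List[str]) -> List[str]:
    """Features present in both splits, in the order they appear in train."""
    test_set = set(test_features)
    return [f for f in train_features if f in test_set]


def clinical_mismatch(train_vars: List[str], test_vars: List[str]) -> Set[str]:
    """Clinical variables present in only one of the two clin.csv files.

    An empty set means the variable names match (order is ignored).
    """
    return set(train_vars) ^ set(test_vars)
```

An empty result from shared_features would mean the train and test omics tables have no features in common for that layer.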
You can also create a reproducible development environment or build a reproducible package of Flexynesis with GNU Guix. You will need at least the Guix channels listed in channels.scm. It also helps to authorize the Inria substitute server so that you can fetch binaries for CUDA-enabled packages. See this page for instructions on how to configure fetching binary substitutes from the build servers.
You can build a Guix package from the current committed state of your git checkout, using the Guix state pinned in channels.scm, like this:
guix time-machine -C channels.scm -- \
build --no-grafts -f guix.scm
To enter an environment containing just Flexynesis:
guix time-machine -C channels.scm -- \
shell --no-grafts -f guix.scm
To enter a development environment to hack on Flexynesis:
guix time-machine -C channels.scm -- \
shell --no-grafts -Df guix.scm
To build a Docker image containing this package together with a matching Python installation, run:
guix time-machine -C channels.scm -- \
pack -C none \
-e '(load "guix.scm")' \
-f docker \
-S /bin=bin -S /lib=lib -S /share=share \
glibc-locales coreutils bash python
To use flexynesis interactively in Jupyter notebooks, register a kernel that makes flexynesis and its dependencies available in the Jupyter session.
Assuming you have already created the environment and installed the package:
conda activate flexynesis
python -m ipykernel install --user --name "flexynesis" --display-name "flexynesis"
To export the current conda environment to spec-file.txt:
conda list --explicit > spec-file.txt
Run unit tests
pytest -vvv tests/unit
This runs all the unit tests in the tests/unit directory.
If you would like to contribute to the project, please open an issue or a pull request on the GitHub repository.
When working on a feature on a new branch, don't forget to write a branch description:
git branch --edit-description
To view a branch's description:
git config branch.<branch name>.description
To generate the HTML API documentation in the docs directory with pdoc:
pdoc --html --output-dir docs --force flexynesis