Skip to content

A deep-learning based multi-modal data integration suite that aims to achieve synesis in a flexible manner

License

Notifications You must be signed in to change notification settings

BIMSBbioinfo/flexynesis

Repository files navigation

logo

Tests benchmarks

flexynesis

A deep-learning based multi-omics bulk sequencing data integration suite with a focus on (pre-)clinical endpoint prediction. The package includes multiple types of deep learning architectures such as simple fully connected networks, supervised variational autoencoders; different options of data layer fusion, and automates feature selection and hyperparameter optimisation. The tools are continuosly benchmarked on publicly available datasets mostly related to the study of cancer. Some of the applications of the methods we develop are drug response modeling in cancer patients or preclinical models (such as cell lines and patient-derived xenografts), cancer subtype prediction, or any other clinically relevant outcome prediction that can be formulated as a regression or classification problem.

workflow

Documentation

A detailed documentation of classes and functions in this repository can be found here.

Benchmarks

For the latest benchmark results see: https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis-benchmark-datasets/dashboard.html

The code for the benchmarking pipeline is at: https://github.com/BIMSBbioinfo/flexynesis-benchmarks

Quick Start

# install 
git clone https://github.com/BIMSBbioinfo/flexynesis.git
cd flexynesis
conda create --name flexynesis --file spec-file.txt
conda activate flexynesis
pip install -e .

# test the installation
curl -L -o dataset1.tgz https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis-benchmark-datasets/dataset1.tgz
tar -xzvf dataset1.tgz

flexynesis --data_path dataset1 --model_class DirectPred --target_variables Erlotinib --fusion_type early --hpo_iter 1 --features_min 50 --features_top_percentile 5 --log_transform False --data_types gex,cnv --outdir . --prefix erlotinib_direct --early_stop_patience 3 --use_loss_weighting False --evaluate_baseline_performance False

Input Dataset Structure

InputFolder/
| --  train 
|    |-- omics1.csv 
|    |-- omics2.csv
|    |--  ... 
|    |-- clin.csv

| --  test 
|    |-- omics1.csv 
|    |-- omics2.csv
|    |--  ... 
|    |-- clin.csv

File contents

clin.csv

clin.csv contains the sample metadata. The first column contains unique sample identifiers. The other columns contain sample-associated clinical variables. NA values are allowed in the clinical variables.

v1,v2
s1,a,b
s2,c,d
s3,e,f

omics.csv

The first column of the feature tables must be unique feature identifiers (e.g. gene names). The column names must be sample identifiers that should overlap with those in the clin.csv. They don't have to be completely identical or in the same order. Samples from the clin.csv that are not represented in the omics table will be dropped.

s1,s2,s3
g1,0,1,2
g2,3,3,5
g3,2,3,4

Concordance between train/test splits

The corresponding omics files in train/test splits must contain overlapping feature names (they don't have to be identical or in the same order). The clin.csv files in train/test must contain matching clinical variables.

Guix

You can also create a reproducible development environment or build a reproducible package of Flexynesis with GNU Guix. You will need at least the Guix channels listed in channels.scm. It also helps to have authorized the Inria substitute server to get binaries for CUDA-enabled packages. See this page for instructions on how to configure fetching binary substitutes from the build servers.

You can build a Guix package from the current committed state of your git checkout and using the specified state of Guix like this:

guix time-machine -C channels.scm -- \
    build --no-grafts -f guix.scm

To enter an environment containing just Flexynesis:

guix time-machine -C channels.scm -- \
    shell --no-grafts -f guix.scm

To enter a development environment to hack on Flexynesis:

guix time-machine -C channels.scm -- \
    shell --no-grafts -Df guix.scm

Do this to build a Docker image containing this package together with a matching Python installation:

guix time-machine -C channels.scm -- \
  pack -C none \
  -e '(load "guix.scm")' \
  -f docker \
  -S /bin=bin -S /lib=lib -S /share=share \
  glibc-locales coreutils bash python

Defining Kernel for Jupyter Notebook

For interactively using flexynesis on Jupyter notebooks, one can define the kernel to make flexynesis and its dependencies available on the jupyter session.

Assuming you have already defined an environment and installed the package:

conda activate flexynesis
python -m ipykernel install --user --name "flexynesis" --display-name "flexynesis"

To export existing spec-file.txt:

conda list --explicit > spec-file.txt

Testing

Run unit tests

pytest -vvv tests/unit

This will run all the unit tests in the tests directory.

Contributing

If you would like to contribute to the project, please open an issue or a pull request on the GitHub repository.

Branches

When working on a feature on a new branch, don't forget to write a branch description:

git branch --edit-description

You can view branch descriptions:

git config branch.<branch name>.description 

Documentation

pdoc --html --output-dir docs --force flexynesis 

About

A deep-learning based multi-modal data integration suite that aims to achieve synesis in a flexible manner

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published