All publication material relevant for the manuscript describing the flexynesis software package
Accessible from Hulk/Beast/Max: /fast/AG_Akalin/buyar/flexynesis_manuscript_work/
The ./raw folder contains the original datasets downloaded from sources such as cBioPortal, TCGA, PharmacoGx, and DepMap.
The ./prepared folder contains the data prepared as input to flexynesis.
Below is a description of the datasets used in the manuscript and how to prepare them for analysis with flexynesis
Go to /fast/AG_Akalin/buyar/flexynesis_manuscript_work/datasets:
The ./raw folder contains:
- CCLE.rds: downloaded from Zenodo.
- GDSC2.rds: downloaded from Zenodo.
- lgggbm_tcga_pub.tar.gz: downloaded from cBioPortal.
- brca_metabric.tar.gz: downloaded from cBioPortal.
- depmap: downloaded from the DepMap portal.
- nbl_target_2018_pub.tar.gz: downloaded from cBioPortal.
- GDCData: TCGA cohort datasets for 33 cancer types, downloaded using the TCGAbiolinks package (see GitHub).
- prot-trans: protein sequence embeddings obtained from the prot-trans-xl-uniref50 model on UniProt sequences.
- describeProt: protein level sequence/structure/function features from describeprot database (Download here).
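Before running any preparation step, it can help to confirm that the raw inputs listed above are actually present. The following is a minimal sketch, not part of the flexynesis repository: `check_raw_inputs` is a hypothetical helper name, and the entry names are copied from the list above. Note that the casing of `GDCData` and `describeProt` differs from the paths used in the preparation commands (`./raw/GDCdata`, `./raw/describePROT`), so adjust the names to what is on disk.

```shell
# Hypothetical helper: report which expected raw inputs are missing.
# Usage: check_raw_inputs DIR NAME...
check_raw_inputs() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $name"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero when at least one entry is missing.
    [ "$missing" -eq 0 ]
}

# Example call (entry names taken from the list above; adjust casing as needed):
# check_raw_inputs ./raw CCLE.rds GDSC2.rds lgggbm_tcga_pub.tar.gz \
#     brca_metabric.tar.gz depmap nbl_target_2018_pub.tar.gz GDCData \
#     prot-trans describeProt
```

The function prints one line per missing entry and returns a non-zero status if anything is absent, so it can gate a longer script.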
The ./prepared folder contains:
- ccle_vs_gdsc: Drug response data from cell lines from CCLE and GDSC2 datasets. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.gdsc_vs_ccle.R raw/
- lgggbm_tcga_pub_processed: Merged cohorts of LGG + GBM samples. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.LGG_GBM.R ./src/get_cbioportal_data.R
- brca_metabric_processed: Processed METABRIC dataset. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.metabric.R ./src/get_cbioportal_data.R
- single_cell_bonemarrow: CITE-Seq dataset from Seurat. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.cite_seq.R
- neuroblastoma_target_vs_depmap: Neuroblastoma patient samples (TARGET study) and cell lines (DepMap). Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.neuroblastoma_finetuning.R ./src/get_cbioportal_data.R ./raw/depmap/ ./src/utils.R
- tcga_cancertype: TCGA cancer cohorts for ~21 cancer types, with 100 samples per cohort. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.tcga_cancertype.R ./src/utils.R ./raw/GDCdata
- depmap_gene_dependency: Dataset for gene-dependency prediction in cell lines. Consists of DepMap gene expression, ProtTrans embeddings, and describePROT features. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.depmap.R ./src/utils.R ./raw/depmap/ ./raw/prot-trans/embeddings.protein_level.csv ./raw/uniprot2hgnc.RDS ./raw/describePROT/9606_value.csv
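The preparation commands above can be chained so that a failure in one step stops the rest. The sketch below is one possible wrapper, not something shipped with the project: `run_step`, `prepare_all`, and the `DRY_RUN` and `RSCRIPT` variables are hypothetical names. The script paths and arguments are copied verbatim from the commands above, and it is assumed that the wrapper is started from the same working directory those commands expect.

```shell
# Rscript binary used by the commands above; override via the environment.
RSCRIPT=${RSCRIPT:-/opt/R/4.2/bin/Rscript}

# Run one preparation step, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$RSCRIPT $*"
    else
        "$RSCRIPT" "$@"
    fi
}

# Run every preparation command in order, stopping at the first failure.
prepare_all() {
    run_step ./src/prepare_data.gdsc_vs_ccle.R raw/ &&
    run_step ./src/prepare_data.LGG_GBM.R ./src/get_cbioportal_data.R &&
    run_step ./src/prepare_data.metabric.R ./src/get_cbioportal_data.R &&
    run_step ./src/prepare_data.cite_seq.R &&
    run_step ./src/prepare_data.neuroblastoma_finetuning.R \
        ./src/get_cbioportal_data.R ./raw/depmap/ ./src/utils.R &&
    run_step ./src/prepare_data.tcga_cancertype.R ./src/utils.R ./raw/GDCdata &&
    run_step ./src/prepare_data.depmap.R ./src/utils.R ./raw/depmap/ \
        ./raw/prot-trans/embeddings.protein_level.csv \
        ./raw/uniprot2hgnc.RDS ./raw/describePROT/9606_value.csv
}

# Usage: print the commands first, then run them for real.
# DRY_RUN=1 prepare_all
# prepare_all
```

Running with `DRY_RUN=1` first gives a quick check that the paths look right before starting the long-running R jobs.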
How to reproduce figures:
Go to /fast/AG_Akalin/buyar/flexynesis_manuscript_work/analyses:
Activate the guix environment:
source ../flexynesis_manuscript/manuscript/etc/profile
Rscript ../flexynesis_manuscript/src/figures_single_task.R ../flexynesis_manuscript/src/utils.R ./output2
Rscript ../flexynesis_manuscript/src/figures_multitask.R ../flexynesis_manuscript/src/utils.R ./output2
Rscript ../flexynesis_manuscript/src/figures_tcga_unsupervised.R ../flexynesis_manuscript/src/utils.R ./unsupervised_cancertype/
Rscript ../flexynesis_manuscript/src/figures_depmap.R ../datasets/prepared/depmap_gene_dependency/ depmap_analysis/output/
Rscript ../flexynesis_manuscript/src/figures_finetuning.R ../flexynesis_manuscript/src/utils.R finetuning/
Rscript ../flexynesis_manuscript/src/figures_marker_analysis.R ../flexynesis_manuscript/src/utils.R marker_analysis/output/
Figure 8: benchmark summary
Rscript ../flexynesis_manuscript/src/figures_benchmarks.R benchmarks/output2
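When regenerating several figures in one session, it may be convenient to keep a log per script and see at a glance which ones failed. The following is a minimal sketch under assumptions: `run_logged` and the `LOGDIR` default `./figure_logs` are hypothetical names, not part of the manuscript repository. The example calls reuse two of the figure commands above unchanged.

```shell
# Directory for per-script logs (hypothetical name, not part of the project).
LOGDIR=${LOGDIR:-./figure_logs}

# Run a command, capture stdout and stderr in $LOGDIR/NAME.log,
# and print a one-line status.
run_logged() {
    name=$1
    shift
    mkdir -p "$LOGDIR"
    if "$@" > "$LOGDIR/$name.log" 2>&1; then
        echo "ok: $name"
    else
        echo "FAILED: $name (see $LOGDIR/$name.log)"
    fi
}

# Example calls, reusing two commands from above:
# run_logged single_task Rscript ../flexynesis_manuscript/src/figures_single_task.R \
#     ../flexynesis_manuscript/src/utils.R ./output2
# run_logged benchmarks Rscript ../flexynesis_manuscript/src/figures_benchmarks.R benchmarks/output2
```

Because a failed script only produces a status line, the remaining figure scripts still run, and the log file holds the R error message for later inspection.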
git clone https://github.com/BIMSBbioinfo/flexynesis.git
cd flexynesis
conda create -n flexynesis --file spec-file.txt
conda activate flexynesis
pip install -e .
guix package --manifest=guix.scm --profile=./manuscript
source ./manuscript/etc/profile
conda activate flexynesis
Flexynesis documentation is built and served on bimsbstatic.
- Navigate to /data/bimsbstatic/public/akalin/buyar/flexynesis
- Run mkdocs build => this generates a website in ./site
- The documentation is served at https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis/site/
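The documentation steps above can be sketched as a small script. This is an illustration under assumptions: `build_docs` and `site_has_index` are hypothetical helper names, and the check relies on `mkdocs build` writing to its default `site` output directory and producing an `index.html` there, which holds for a standard MkDocs project but has not been verified for this one.

```shell
# Hypothetical helper: true when a built site directory contains index.html.
site_has_index() {
    [ -f "$1/index.html" ]
}

# Hypothetical helper: build the docs in DOCS_DIR, then confirm the output.
build_docs() {
    docs_dir=$1
    ( cd "$docs_dir" && mkdocs build ) || return 1
    site_has_index "$docs_dir/site"
}

# Usage (path taken from the steps above):
# build_docs /data/bimsbstatic/public/akalin/buyar/flexynesis && echo "docs built"
```

The build runs in a subshell so the caller's working directory is left unchanged.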