Non-negative matrix factorization and deconvolution as dual simplex problem

This repository is the official starting point for exploring the Dual Simplex NMF/deconvolution method. It contains code to reproduce the figures from the paper and also provides examples of how to use the DualSimplex package.

Non-negative matrix factorization and deconvolution as dual simplex problem
Denis Kleverov, Ekaterina Aladyeva, Alexey Serdyukov, Maxim Artyomov
bioRxiv 2024.04.09.588652; doi: https://doi.org/10.1101/2024.04.09.588652

Project structure

- data: all external data used in the figures
- figures: notebooks for reproducing the figures
- out: generated SVGs and DualSimplex checkpoints will be placed here
- R: supporting code imported by the figure notebooks

Running

  1. Select a figure to reproduce.
  2. The setup.R script (executed at the beginning of each script) will install the DualSimplex package from GitHub.
  3. If you chose Figure 6 or Figure 7, download and unpack the contents of large.tar.gz into data/large.
  4. Go to the figures directory and open the corresponding notebook.
  5. Run cells in the notebook one by one. Optionally, tweak some parameters to see alternative outcomes.
  6. See resulting figures in the out directory.
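
The unpacking in step 3 can also be done from within R. The following is a minimal sketch, assuming large.tar.gz has already been downloaded into the repository root; the helper name `unpack_large_data` is hypothetical and not part of this repository, and the internal layout of the archive is not assumed here, so check that the files end up directly under data/large.

```r
# Hypothetical helper (not part of this repository): unpack an archive
# into a destination directory using base R only.
unpack_large_data <- function(archive = "large.tar.gz",
                              dest = file.path("data", "large")) {
  if (!file.exists(archive)) {
    stop("Archive not found: ", archive)
  }
  # Create the destination (and any missing parent directories).
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # utils::untar returns 0 on success.
  status <- utils::untar(archive, exdir = dest)
  if (!identical(as.integer(status), 0L)) {
    stop("Unpacking failed with status ", status)
  }
  # Return the unpacked file paths (relative to dest), invisibly.
  invisible(list.files(dest, recursive = TRUE))
}

# Usage, run from the repository root:
# unpacked <- unpack_large_data()
# head(unpacked)
```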

Figures in this repository

2. Sinkhorn procedure

Simple visualization of the Sinkhorn procedure applied to a factorizable matrix (2_sinkhorn_visualization.Rmd)
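
For readers unfamiliar with the procedure, here is a minimal sketch of textbook Sinkhorn-Knopp scaling in base R: rows and columns of a positive matrix are normalized alternately until both sets of sums stabilize. This is a generic illustration, not the DualSimplex implementation; the function name `sinkhorn_scale`, the matrix sizes, and the iteration count are assumptions made for the example.

```r
# Generic Sinkhorn-Knopp scaling (illustrative; NOT the DualSimplex code).
sinkhorn_scale <- function(V, n_iter = 100) {
  stopifnot(is.matrix(V), all(V > 0))
  for (i in seq_len(n_iter)) {
    V <- V / rowSums(V)                # each row now sums to 1
    V <- sweep(V, 2, colSums(V), "/")  # each column now sums to 1
  }
  V
}

set.seed(1)
W <- matrix(runif(6 * 3), nrow = 6)  # hypothetical 6 x 3 non-negative factor
H <- matrix(runif(3 * 6), nrow = 3)  # hypothetical 3 x 6 non-negative factor
V <- W %*% H                         # factorizable 6 x 6 matrix, rank at most 3

S <- sinkhorn_scale(V)

# Column sums are exactly 1 after the last step; row sums are close to 1.
print(range(colSums(S)))
print(range(rowSums(S)))
```

Because each step only multiplies rows or columns by positive constants, the scaled matrix keeps the rank of the original, so it remains factorizable.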

3. Main algorithm

Deconvolution of a simulated bulk RNA-seq gene expression dataset with the main approach (3c_simulated_gene_expression_main_algorithm.Rmd)

4. Minimal formulation

Deconvolution of a simulated bulk RNA-seq gene expression dataset with the alternative approach (4d_simulated_gene_expression_alternative_approach.Rmd)

5. Picture unmixing with NMF

6. Single cell data

7. Complete deconvolution of bulk RNA-seq data

S3. NMF with simulated data matrices

S4. Different number of clusters for single cell data

Comparison of the clustering solutions for different methods (s4_single_cell_hnsc_different_k.Rmd)

S5. Further analysis for TCGA HNSC bulk RNA-seq dataset

Pathway analysis, signature gene expression heatmap, multiple initializations (s5_hnsc_further.Rmd)

S6. Signature-based deconvolution with the DualSimplex approach

Authors

Contributors' names and contact info

Troubleshooting

Dependency: package 'xxx' is not available (for R version x.y.z)

Install the package directly from its source tarball link on CRAN. For example:

install.packages("https://cran.r-project.org/src/contrib/RcppML_0.3.7.tar.gz", repos = NULL)

Can't plot UMAP with plot_projected on Mac

Unfortunately, the umap library has a bug (on macOS only) that prevents adding new points to a UMAP embedding after it has been computed, which is crucial for DualSimplex. If this affects you, call plot_projected(use_dims = 2:3), or choose other dimensions, to view the simplexes without dimensionality reduction.