Seurat-compatible tools for the analysis of cancer scRNA-seq data.
Install with:
devtools::install_github("andrei-stoica26/LISTO")
The package provides the following files:
-
Preprocessing:
- preprocessing.R: Generates the counts matrix and the unspliced counts matrix.
-
Initialization:
- start.R: Creates a Seurat object from a list of files containing count matrices.
-
Quality control:
- doublets.R: Creates an aggregate doublet prediction based on multiple (default = 100) scDblFinder runs. Contains functions for providing visual summaries of the doublet predictions.
- quality_control.R: Includes functions for adding criteria for the filtering of genes and cells: Shannon diversity, Simpson diversity, novelty score, percentage of ribosomal genes per cell and percentage of mitochondrial genes per cell. Also includes functions for identifying and removing rare genes.
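As a rough illustration of the per-cell diversity criteria listed above, here is a minimal base R sketch on a toy counts matrix. The function names (`shannonDiversity`, `simpsonDiversity`, `noveltyScore`) are hypothetical and are not taken from LISTO, and the novelty score is assumed to be the commonly used ratio log10(genes detected) / log10(total counts), which may differ from the package's definition.

```r
# Toy counts matrix (made-up values): genes in rows, cells in columns.
counts <- matrix(
  c(10, 0, 5, 5,    # cell1
     0, 0, 20, 0,   # cell2: a single expressed gene
     5, 5, 5, 5),   # cell3: perfectly even expression
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Shannon diversity of one cell: -sum(p * log(p)) over expressed genes.
shannonDiversity <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Simpson diversity of one cell: 1 - sum(p^2).
simpsonDiversity <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# Assumed novelty score: log10(genes detected) / log10(total counts).
noveltyScore <- function(x) log10(sum(x > 0)) / log10(sum(x))

# One row per cell, one column per criterion.
qc <- data.frame(
  shannon = apply(counts, 2, shannonDiversity),
  simpson = apply(counts, 2, simpsonDiversity),
  novelty = apply(counts, 2, noveltyScore)
)
print(qc)
```

Cells dominated by a single gene score near zero on all three criteria, which is why such metrics can be used as filtering thresholds.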
-
Visualization:
- display.R: Includes a variety of customized visualization functions, such as grob plots, histograms, MDS plots and alluvial plots.
-
Individual SCTransform:
- individual_transform.R: Applies SCTransform to Seurat subsets corresponding to the experimental conditions present in the data.
-
Functional characterization of the clusters:
- enrichr_analysis.R: Converts Enrichr output into an enrichment result object.
- epigenetic_overlap.R: Studies cluster variation in terms of activated epigenetic pathways.
- functional_characterisation.R: Functional characterization of the clusters based on enrichment analysis. Includes a function for computing pathway overlaps between clusters, with a Benjamini-Yekutieli (BY) adjustment for multiple comparisons.
- pathway_analysis.R: Customized enrichment analysis and visualization using DOSE and enrichplot.
-
Overlap analysis:
- ccoverlap.R: Computes the p-values of overlaps between the markers of clusters and the markers of experimental conditions, using the BY adjustment for multiple comparisons.
- gene_overlap_extra.R: BY-corrected Jaccard index calculations based on ranked and unranked marker lists.
- gene_overlap_main.R: Computes BY-corrected p-values of overlaps between sets of genes, both ranked and unranked.
- gene_overlap_tools.R: Provides the tools needed for computing BY-corrected p-values of overlaps of two or three sets of genes, and defines the BY correction function. Functions employing gene sets ranked by either the p-value or the average log2 fold-change are provided.
- printing_tools.R: Facilitates the printing of gene lists and gene/cell overlap matrices.
- two_universes.R: Overlap assessments for two scRNA-seq datasets.
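The overlap calculations described above can be sketched in base R as follows. This is an illustration under stated assumptions, not LISTO's implementation: the overlap p-value is assumed to come from a one-sided hypergeometric test, the helper names (`overlapPval`, `jaccard`) and the gene sets are hypothetical, and the BY adjustment uses the standard `p.adjust` function from the stats package.

```r
# Hypothetical gene universe and marker sets (made-up identifiers).
universe <- paste0("g", 1:100)
clusterMarkers <- list(
  cluster0 = paste0("g", 1:10),
  cluster1 = paste0("g", 51:60)
)
conditionMarkers <- paste0("g", 6:15)

# One-sided hypergeometric p-value: P(overlap >= observed).
overlapPval <- function(a, b, universe) {
  a <- intersect(a, universe)
  b <- intersect(b, universe)
  shared <- length(intersect(a, b))
  phyper(shared - 1, length(a), length(universe) - length(a),
         length(b), lower.tail = FALSE)
}

# Jaccard index: size of the intersection over size of the union.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Raw p-values for each cluster against the condition markers.
rawP <- sapply(clusterMarkers, overlapPval,
               b = conditionMarkers, universe = universe)

# Benjamini-Yekutieli adjustment for multiple comparisons.
adjP <- p.adjust(rawP, method = "BY")

jaccardIdx <- sapply(clusterMarkers, jaccard, b = conditionMarkers)
print(data.frame(rawP, adjP, jaccardIdx))
```

The BY procedure controls the false discovery rate under arbitrary dependence between tests, which suits overlapping marker lists where the comparisons are not independent.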
-
Pseudobulk analysis:
- pseudobulk.R: Includes tools for assessing the bulk-level and cluster-level representation of gene signatures, and evaluating the presence of genes of interest among cluster and condition selection markers.
-
Intracluster analysis:
- intracluster.R: Studies the distribution of cells from each treatment condition in clusters, and computes BY-corrected p-values of overlaps between gene sets and condition markers within clusters.
-
Literature markers input:
- literature_markers.R: Adds gene sets selected from the literature.
- synonyms.R: Finds synonyms for genes of interest.
-
Generation of marker lists:
- marker_lists_main.R: Finds markers of treatment condition selections, of clusters, and of condition selections within clusters. Bonferroni corrections for multiple testing are implemented.
- marker_lists_tools.R: Provides the tools needed for finding markers, such as inputting condition selections.
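For readers unfamiliar with the Bonferroni correction mentioned above, a minimal sketch using `p.adjust` from the stats package is given here. The p-values, the gene names, the number of tests and the 0.05 cutoff are all made up for illustration and do not reflect LISTO's defaults.

```r
# Hypothetical raw p-values for five candidate marker genes.
rawP <- c(geneA = 1e-6, geneB = 0.004, geneC = 0.03, geneD = 0.2, geneE = 0.9)

# Assumed total number of genes tested (not only the five shown here).
nTests <- 2000

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
adjP <- p.adjust(rawP, method = "bonferroni", n = nTests)

# Keep genes that remain significant after correction.
markers <- names(adjP)[adjP < 0.05]
print(markers)
```

Passing `n` explicitly matters when the vector holds only a subset of the tested genes, because the correction should account for every test performed.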
-
Other quantitative assessments:
- cluster_statistics.R: Finds exclusive cluster markers, the average cluster expression for a gene of interest and clusters of maximum average expression for chosen genes.
- correlation_analysis.R: Computes correlations between the expression levels of genes.
-
Trajectory building:
- monocle_analysis.R: Generates a trajectory using Monocle 3.
- slingshot_analysis.R: Trajectory analysis using slingshot. Includes a function for calculating an aggregate pseudotime from multiple lineages.
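One simple way to aggregate pseudotime across lineages is to average, for each cell, the pseudotime values of the lineages it belongs to. The sketch below assumes a cells-by-lineages matrix with `NA` for cells not assigned to a lineage (the layout returned by slingshot's `slingPseudotime`); the values are made up, and the averaging rule is an assumption rather than the method used in LISTO.

```r
# Toy pseudotime matrix: cells in rows, lineages in columns.
# NA marks a cell that is not part of the lineage.
pseudotime <- matrix(
  c(0.0, 2.0, 4.0, NA,     # Lineage1
    0.0, NA,  5.0, 3.0),   # Lineage2
  ncol = 2,
  dimnames = list(paste0("cell", 1:4), c("Lineage1", "Lineage2"))
)

# Per-cell mean over the lineages the cell belongs to.
aggregatePseudotime <- rowMeans(pseudotime, na.rm = TRUE)
print(aggregatePseudotime)
```

Other aggregation rules, such as weighting by lineage assignment weights, are equally plausible; the mean is used here only because it is the simplest to read.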
-
Stemness-as-gradient analysis:
- activity_parallel.R: Parallelizes the calculation of ORIGINS activity scores.
- activity_tools.R: Includes visualization, cluster comparison, gene identification and intracluster effect size calculation tools for ORIGINS activity.
- stemness_classifiers.R: Includes functions for detecting the genes most closely linked to the variation in ORIGINS activity scores, slingshot aggregate pseudotime and Monocle 3 pseudotime, and for calculating the overlaps of these gene sets, as well as visualization tools. Also included are functions that rank groupings selected from the Seurat metadata on each of the three metrics using Wilcoxon rank sum tests.
- tradeseq_analysis.R: In-depth trajectory analysis using tradeSeq.
-
Cell-cell interaction analysis:
- cell_interactions.R: CellChat analysis. Note: CellChat has since been updated to CellChat v2. LISTO does not provide functionality for the new version of CellChat.
- sg_cell_interactions.R: SingleCellSignalR analysis.
-
Aggregate analysis of two Seurat datasets:
- aggregate.R: Resolves gene name conflicts in Seurat objects and marker lists.
-
Miscellaneous tools:
- extend_seurat.R: Adds new columns to the Seurat metadata, e.g. cell type or the expression of a gene of interest.
- fit_gam.R: Fits a generalized additive model (GAM) to the gene expression data using the gam package.
- gene_analysis.R: Includes functions that operate on the cells expressing genes of interest.
- number_and_percentage.R: Creates bar plots based on columns in the Seurat metadata.
- pair_analysis.R: Analyzes the differences in marker expression between two clusters.
- reactome_analysis.R: ReactomeGSA analysis.
-
Utils:
- all_imports.R: Imports all the required packages.
- examples.R: Commented code showcasing usage examples for selected functions in the package.
Most of the LISTO functions were used in generating the results from my PhD thesis.