Analysis for the creation of AML scAtlas.
Using large-scale integration of publicly available single-cell data, we created a single-cell transcriptomic atlas of acute myeloid leukaemia (AML). The data are hosted for easy exploration of gene expression and can be downloaded as an AnnData object.
All code is included in this repository, divided into the main analysis steps:
QC was completed for each scRNA-seq sample individually and on the combined dataset, using a standardised workflow.
- Sample QC - Example notebook shows preprocessing steps performed on one sample
- Batch Assessment - Batch effects were quantified within and between studies
- Integrated QC - Combining all data and QC steps
- Uncorrected Analysis - Dimensionality reduction on dataset without batch correction
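As a rough illustration of what per-sample QC involves, the sketch below computes three common per-cell metrics (total counts, genes detected, and mitochondrial fraction) and filters cells on them. It is not the code from the notebooks: the function names, the `MT-` prefix convention, the toy count matrix, and the thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    total_counts: int
    n_genes: int
    pct_mito: float


def cell_qc(counts, gene_names, mito_prefix="MT-"):
    """Compute basic QC metrics for one cell's raw count vector."""
    total = sum(counts)
    n_genes = sum(1 for c in counts if c > 0)
    mito = sum(c for c, g in zip(counts, gene_names) if g.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total > 0 else 0.0
    return CellQC(total, n_genes, pct_mito)


def filter_cells(matrix, gene_names, min_genes=2, max_pct_mito=20.0):
    """Return indices of cells that pass the thresholds.

    The default thresholds are hypothetical and sized for the toy data below.
    """
    keep = []
    for i, counts in enumerate(matrix):
        qc = cell_qc(counts, gene_names)
        if qc.n_genes >= min_genes and qc.pct_mito <= max_pct_mito:
            keep.append(i)
    return keep


# Toy data: rows are cells, columns are genes (illustrative values only).
genes = ["MT-CO1", "CD34", "RUNX1", "KIT"]
cells = [
    [1, 5, 3, 2],  # passes both thresholds
    [9, 1, 0, 0],  # fails: high mitochondrial fraction
    [0, 0, 4, 0],  # fails: too few genes detected
]
kept = filter_cells(cells, genes)
print(kept)
```

In practice these metrics are usually computed on an AnnData object with a single-cell toolkit rather than on nested lists; the plain-Python version is used here only to keep the logic visible.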
To handle batch effects, we benchmarked three batch correction methods that scale to single-cell atlas integration tasks: Harmony, scVI, and scANVI.
- all-data - batch correction methods implemented using all genes
- hvg2000 - batch correction methods implemented using 2000 highly variable genes. The same process was also performed using 4000, 6000, 8000 and 10000 highly variable genes as part of the benchmarking.
- Results - scripts/notebooks used to combine and visualise the benchmarking results.
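The benchmarking folders above differ in how many highly variable genes (HVGs) are passed to each method. The sketch below shows the general idea of building one gene subset per HVG setting. It ranks genes by a simple variance-to-mean ratio, which is an assumption made for illustration: the repository's scripts may use a different HVG definition (for example, the one provided by a single-cell toolkit), and `top_variable_genes` and the toy matrix are hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(matrix, gene_names, n_top):
    """Rank genes by variance-to-mean ratio and return the n_top gene names."""
    scores = []
    for j, gene in enumerate(gene_names):
        column = [row[j] for row in matrix]
        mu = mean(column)
        # Genes with zero mean get a score of 0 to avoid dividing by zero.
        dispersion = pvariance(column) / mu if mu > 0 else 0.0
        scores.append((dispersion, gene))
    scores.sort(key=lambda s: s[0], reverse=True)
    return [gene for _, gene in scores[:n_top]]


# Toy data: rows are cells, columns are genes (illustrative values only).
genes = ["A", "B", "C"]
matrix = [
    [1, 0, 5],
    [1, 10, 5],
    [1, 0, 6],
    [1, 10, 4],
]

# One gene subset per HVG setting used in the benchmarking.
# With only three toy genes, every setting returns all of them.
hvg_sizes = [2000, 4000, 6000, 8000, 10000]
subsets = {n: top_variable_genes(matrix, genes, n) for n in hvg_sizes}

print(top_variable_genes(matrix, genes, 2))
```

Each subset would then be used to restrict the expression matrix before running a batch correction method, so that the methods can be compared across HVG settings on the same footing.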
Scripts used to carry out the main analysis steps on the complete AML scAtlas, split into relevant sub-folders.
- Batch Correction
- Dimensionality reduction
- Clustering
- Annotation
- Code used to identify leukemic stem cell (LSC) populations within AML scAtlas.
- Analysis of the t(8;21) data in AML scAtlas to identify age-associated gene regulatory networks (GRNs) in t(8;21) AML, with subsequent validation using the TARGET and BeatAML cohorts and the Lambo et al. data.
- Jessica Whittle (author) - [email protected]
- Mudassar Iqbal (corresponding author) - [email protected]
- Georges Lacaud (corresponding author) - [email protected]
- Syed Murtuza Baker (corresponding author) - [email protected]