This repository provides a community-maintained summary of models and datasets. It was initially curated for (Cell Systems, 2021).
Various resources are available for evaluating single-cell perturbation models. The publication discusses five tasks, which can be supported by the following publicly available annotations:
- GDSC provides a collection of cell viability measurements for many compounds and cell lines. We provide a code snippet to conveniently load GDSC-provided z-score compound response rankings per cell line.
- Additional viability data can be obtained from DepMap's PRISM dataset.
- Therapeutics Data Commons provides access to a number of compound databases as part of its cheminformatics tasks. (In the same vein, OpenProblems provides a framework for single-cell analysis tasks, which can also support perturbation modeling tasks in a longer-term format than was previously seen in the DREAM challenges.)
- PubChem contains a comprehensive record of compounds ranging from experimental entities to non-proprietary small molecules. It is queryable via PubChemPy.
- DrugBank provides annotations for a relatively small number of small molecules in a standardized format.
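As a rough illustration of per-cell-line z-score rankings, the sketch below sorts compounds by z-score within each cell line. It is not the repository's own snippet. The column names (`CELL_LINE_NAME`, `DRUG_NAME`, `Z_SCORE`), the toy values, and the convention that a lower z-score means a stronger response are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a GDSC-style export; column names and values are hypothetical.
TOY_CSV = """CELL_LINE_NAME,DRUG_NAME,Z_SCORE
A549,DrugA,-1.8
A549,DrugB,0.4
A549,DrugC,-0.2
K562,DrugA,0.9
K562,DrugB,-2.1
"""


def rank_compounds_per_cell_line(handle):
    """Return {cell_line: [(drug, z_score), ...]} sorted by ascending z-score."""
    per_line = defaultdict(list)
    for row in csv.DictReader(handle):
        per_line[row["CELL_LINE_NAME"]].append(
            (row["DRUG_NAME"], float(row["Z_SCORE"]))
        )
    # Assumption: lower z-score = stronger response, so sort ascending.
    return {
        cell_line: sorted(responses, key=lambda pair: pair[1])
        for cell_line, responses in per_line.items()
    }


rankings = rank_compounds_per_cell_line(io.StringIO(TOY_CSV))
for cell_line, ranked in rankings.items():
    print(cell_line, [drug for drug, _ in ranked])
```

To use a real export, replace `io.StringIO(TOY_CSV)` with an open file handle and adjust the column names to match the downloaded file.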
We maintain a list of perturbation-related tools at scrna-tools. Please consider further updating and tagging tools there.
The table in the article is based on this spreadsheet, which covers a subset of perturbation models in more detail.
Below is a curated table of perturbation datasets, based on Svensson et al. (2020).
We also offer some datasets in a curated `.h5ad` format. These datasets have the following standardized fields in `.obs`:
- `perturbation_name` -- Human-readable compound names (International non-proprietary naming where possible) for small molecules and gene names for genetic perturbations.
- `perturbation_type` -- `small molecule` or `genetic`
- `perturbation_value` -- A continuous covariate quantity, such as the dosage concentration or the number of hours since treatment.
- `perturbation_unit` -- Describes `perturbation_value`, such as `'ug'` or `'hrs'`.
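The standardized `.obs` fields can be checked with a small validation helper. The following is a minimal sketch, not part of the repository: the function name `check_perturbation_obs` and the toy tables are hypothetical. It is written against a plain mapping of column names to values so that it runs without extra dependencies. A pandas DataFrame supports the same `in` and `[]` operations on its columns.

```python
REQUIRED_FIELDS = (
    "perturbation_name",
    "perturbation_type",
    "perturbation_value",
    "perturbation_unit",
)
ALLOWED_TYPES = {"small molecule", "genetic"}


def check_perturbation_obs(obs):
    """Return a list of problems in an .obs-like table (empty list means OK)."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in obs
    ]
    if "perturbation_type" in obs:
        # Only the two documented perturbation types are accepted.
        unexpected = sorted(set(obs["perturbation_type"]) - ALLOWED_TYPES)
        if unexpected:
            problems.append(f"unexpected perturbation_type values: {unexpected}")
    return problems


# Hypothetical toy tables standing in for a dataset's .obs.
good_obs = {
    "perturbation_name": ["dexamethasone", "KLF1"],
    "perturbation_type": ["small molecule", "genetic"],
    "perturbation_value": [1.0, 24.0],
    "perturbation_unit": ["ug", "hrs"],
}
bad_obs = {
    "perturbation_name": ["compound_x"],
    "perturbation_type": ["drug"],
}

good_report = check_perturbation_obs(good_obs)
bad_report = check_perturbation_obs(bad_obs)
print(good_report)
print(bad_report)
```

For a real dataset, the same function could be passed the `.obs` table of an AnnData object loaded with `anndata.read_h5ad`.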
Shorthand | Title | .h5ad availability | Treatment | # perturbations | # cell types | # doses | # timepoints | Reported cells total | Organism | Tissue | Technique | Data location | Panel size | Measurement | Cell source | Disease | Contrasts | Developmental stage | Number of reported cell types or clusters | Cell clustering | Pseudotime | RNA Velocity | PCA | tSNE | H5AD location | Isolation | BC --> Cell ID OR BC --> Cluster ID | Number individuals |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Jaitin et al. Science | Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types | | genetic targets | 8-22 | 1 | - | 1 | 4,468 | Mouse | Spleen | MARS-seq | GSE54006 | nan | RNA-seq | CD11c+ enriched splenocytes | nan | nan | nan | 9 | Yes | No | nan | No | No | nan | Sorting (FACS) | nan | nan |
Adamson et al. Cell | A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response | | genetic targets | 9-93 (sgRNA) | 1 | - | 1 | 86,000 | Human | Culture | Perturb-seq | GSE90546 | nan | RNA-seq | K562 | nan | nan | nan | nan | nan | nan | nan | nan | Yes | nan | nan | nan | nan |
Dixit et al. Cell | Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens | [raw h5ad] [processed h5ad] [processing nb] | genetic targets | 10,24 | 1 | - | 1-2 | 200,000 | Human, Mouse | Culture | Perturb-seq | GSE90063 | nan | RNA-seq | BMDCs, K562 | nan | nan | nan | nan | nan | nan | nan | nan | No | nan | Nanodroplet dilution | nan | nan |
Datlinger et al. NMeth | Pooled CRISPR screening with single-cell transcriptome readout | | genetic targets | 3-29 | 1-2 | - | 1 | 5,905 | Human, Mouse | Culture | CROP-seq | GSE92872 | nan | RNA-seq | HEK293T, 3T3, Jurkat | nan | nan | nan | nan | nan | nan | nan | nan | No | nan | nan | nan | nan |
Hill et al. NMethods | On the design of CRISPR-based single-cell molecular screens | | genetic targets | 32 | 1 | - | 1 | 5,879 | Human | Culture | CROP-seq | GSE108699 | nan | RNA-seq | MCF10a cells | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | https://github.com/shendurelab/single-cell-ko-screens#result-files | nan |
Gasperini et al. Cell | A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens | | genetic targets | 1119, 5779 | 1 | - | 1 | 207,324 | Human | Culture | CROP-seq | GSE120861 | nan | RNA-seq | K562 Cells | nan | CRISPR Screen | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Jost et al. NBT | Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs | | genetic targets | 25 | 2 | - | 1 | 19,587 | Human | Culture | Perturb-seq | GSE132080 | nan | RNA-seq | K562 cells | nan | 25 gene screen | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Schraivogel et al. NMethods | Targeted Perturb-seq enables genome-scale genetic screens in single cells | [processing nb] | genetic targets | 1778 (enhancers) | 1 | - | 1 | 231,667 | Human, Mouse | Bone marrow, Culture | TAP-seq | GSE135497 | 1,000 | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | Yes | nan | nan | nan | nan |
Alda-Catalinas et al. CSystems | A Single-Cell Transcriptomics CRISPR-Activation Screen Identifies Epigenetic Regulators of the Zygotic Genome Activation Program | | genetic targets | 230 | nan | nan | nan | 203,894 | Mouse | Culture | Chromium | nan | nan | RNA-seq | mESCs | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Ursu et al. bioRxiv | Massively parallel phenotyping of variant impact in cancer with Perturb-seq reveals a shift in the spectrum of cell states induced by somatic mutations | | genetic targets | 200 | 1 | - | 1 | 162,314 | Human | Lung | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Jin et al. Science | In vivo Perturb-Seq reveals neuronal and glial abnormalities associated with autism risk genes | | genetic targets | 35 | - | - | 1 | 46,770 | Mouse | Brain | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Frangieh et al. NGenetics | Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion | [raw h5ad] [processed h5ad] [processing nb] | genetic targets | 248 | 1 | - | 1 | 218,331 | Human | Culture | Perturb-CITE-seq | SCP1064 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Papalexi et al. NGenetics | Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens | | genetic targets | nan | nan | nan | nan | 28,295 | Human | Culture | CITE-seq & ECCITE-seq | GSE153056 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Datlinger et al. NMethods | Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing | | genetic targets | nan | nan | nan | nan | nan | Human, Mouse | nan | scifi-RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Leng et al. bioRxiv | CRISPRi screens in human astrocytes elucidate regulators of distinct inflammatory reactive states | | genetic targets | 30 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Norman et al. (2019) | nan | [raw h5ad] [processed h5ad] [curation nb] [processing nb] | genetic targets | 278 | 1 | - | 1 | nan | nan | nan | CRISPRa | nan | nan | RNA-seq | induction of gene pair targets, single gene controls | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Replogle et al. (2021) | nan | | genetic targets | >10000 | 2 | - | - | nan | nan | nan | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Shin et al. SAdvances | Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations | | small molecules | 45 | 2 | 1 | 1 | 3,091 | Mouse, Human | Culture | Drop-seq | PRJNA493658 | nan | RNA-seq | HEK293T, NIH3T3, A375, SW480, K562 | nan | 45 perturbations | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Srivatsan et al. Science | Massively multiplex chemical transcriptomics at single-cell resolution | [raw h5ad] [curation nb] [processing nb] | small molecules | 188 | 3 | 4 | 2 | 650,000 | Human | Culture | sci-Plex | GSE139944 | nan | RNA-seq | Cancer cell lines A549, K562, and MCF7 | nan | 5,000 drug conditions | nan | 3 | Yes | Yes | No | Yes | No | nan | nan | nan | nan |
Zhao et al. bioRxiv | Deconvolution of Cell Type-Specific Drug Responses in Human Tumor Tissue with Single-Cell RNA-seq | | small molecules | 2,6 | 6,1 | - | - | 48,404 | Human | Brain, Tumor | SCRB-seq (microwell) | GSE148842 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 6 |
McFarland et al. NCommunications | Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action | [curation nb] | small molecules | 1-13 | 24-99 | 1 | 1-5 | nan | Human | Culture | MIX-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Chen et al. (2020) | nan | | small molecules | 300 | 1 | 1 | 1 | nan | nan | nan | CyTOF | nan | nan | protein | breast cancer cells undergoing TGF-β-induced EMT | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |