Gregory Way 2018
This module downloads and processes several gene sets and integrates them into a heterogeneous network (hetnet) (Himmelstein et al. 2017).
This network will be projected onto compressed gene expression features to enable biological interpretation.
The module stores the data and the data processing scripts for the MSigDB and xCell gene sets.
Individual MSigDB gene sets (version 6.1) were downloaded from GSEA downloads.
We also download the full gene set: msigdb.v6.1.entrez.gmt.
See download_msigdb.ipynb for specific details.
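As a rough illustration of what reading the downloaded file involves, the sketch below parses GMT-formatted lines, where each line holds a gene set name, a description field, and then the member genes, all tab-separated. The function name `parse_gmt` and the toy gene sets are hypothetical and are not taken from download_msigdb.ipynb.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into {gene_set_name: [gene_id, ...]}.

    Hypothetical helper: each GMT line is tab-separated as
    name, description, gene, gene, ...
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Toy input with made-up set names and Entrez-style gene IDs
example_lines = [
    "TOY_SET_A\thttp://example.org/a\t1\t2\t3\n",
    "TOY_SET_B\thttp://example.org/b\t2\t4\n",
]
gene_sets = parse_gmt(example_lines)
```

For the real file, an open file handle on msigdb.v6.1.entrez.gmt could be passed in place of `example_lines`, since the function only iterates over lines.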
The gene sets are organized into 8 collections, many of which have sub-collections:
Name | Collection | License |
---|---|---|
H | Hallmark gene sets | CC-BY 4.0 |
C1 | Positional gene sets | CC-BY 4.0 |
C2 | Curated gene sets | CC-BY 4.0 (except KEGG, BioCarta, AAAS/STKE) |
C2.CPG | Chemical and genetic perturbations | CC-BY 4.0 |
C2.CP.Reactome | Reactome | CC-BY 4.0 |
C3 | Motif gene sets | CC-BY 4.0 |
C3.MIR | microRNA targets | CC-BY 4.0 |
C3.TFT | Transcription factor targets | CC-BY 4.0 |
C4 | Computational gene sets | CC-BY 4.0 |
C4.CGN | Cancer gene neighborhoods | CC-BY 4.0 |
C4.CM | Cancer modules | CC-BY 4.0 |
C5 | Gene Ontology (GO) terms | CC-BY 4.0 |
C5.BP | GO biological processes | CC-BY 4.0 |
C5.CC | GO cellular components | CC-BY 4.0 |
C5.MF | GO molecular functions | CC-BY 4.0 |
C6 | Oncogenic gene sets | CC-BY 4.0 |
C7 | Immunologic gene sets | CC-BY 4.0 |
We do not include the KEGG, BioCarta, or AAAS/STKE gene sets from C2.
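One way this exclusion could be applied is by filtering on gene set name prefixes, as in the minimal sketch below. The prefixes `KEGG_`, `BIOCARTA_`, and `ST_` are an assumption about how these sources are named in MSigDB, and `drop_restricted` is a hypothetical helper; the module's notebooks may select gene sets differently.

```python
# Assumed name prefixes for the restricted C2 sources (not confirmed by this README)
EXCLUDED_PREFIXES = ("KEGG_", "BIOCARTA_", "ST_")


def drop_restricted(gene_sets, prefixes=EXCLUDED_PREFIXES):
    """Return a copy of gene_sets without entries whose name starts with a prefix."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if not name.startswith(prefixes)  # str.startswith accepts a tuple
    }


# Toy gene sets with made-up members
c2_sets = {
    "KEGG_TOY_PATHWAY": ["1", "2"],
    "BIOCARTA_TOY_PATHWAY": ["3"],
    "ST_TOY_PATHWAY": ["4"],
    "REACTOME_TOY_PATHWAY": ["5", "6"],
}
kept = drop_restricted(c2_sets)
```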
For full license terms visit the MSigDB license page.
We download and process the 489 gene signatures from Aran et al. 2017. These 489 signatures represent 64 different human cell types, including CD8+ T cells, neutrophils, and macrophages.
See process_xCell.ipynb for specific details.
The gene sets are uniformly processed and linked together in a single hetnet. The hetnet can be projected onto a compressed gene expression matrix to rapidly assign biological significance to compressed gene expression features.
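To make the idea of projection concrete, the sketch below scores each compressed feature against each gene set by summing the feature's weights over the genes in the set. This is a simplified stand-in that uses plain dictionaries rather than the hetnet itself, and the function name, feature names, and weights are all made up for illustration.

```python
def score_features(weights, gene_sets):
    """Sum each feature's gene weights over the members of each gene set.

    weights:   {feature_name: {gene_id: weight}}
    gene_sets: {gene_set_name: [gene_id, ...]}
    returns:   {feature_name: {gene_set_name: summed_weight}}
    """
    scores = {}
    for feature, gene_weights in weights.items():
        scores[feature] = {
            name: sum(gene_weights.get(gene, 0.0) for gene in genes)
            for name, genes in gene_sets.items()
        }
    return scores


# Toy data: two gene sets and two compressed features (all values made up)
toy_gene_sets = {"TOY_SET_A": ["1", "2"], "TOY_SET_B": ["3"]}
toy_weights = {
    "feature_0": {"1": 0.5, "2": 1.5, "3": -1.0},
    "feature_1": {"1": 0.0, "2": 0.25, "3": 2.0},
}
scores = score_features(toy_weights, toy_gene_sets)
```

In this toy case, `feature_0` scores highest for `TOY_SET_A` and `feature_1` for `TOY_SET_B`. A real analysis would also need a null distribution or similar normalization to judge whether a score is notable, which this sketch omits.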