Many published papers come with large datasets, and mining that data can lead to new discoveries.
Big data in basic and translational cancer research
- Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque: Here we present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated biomedical descriptors derived from a gigantic knowledge graph, displaying more than 450 thousand biological entities and 30 million relationships between them. The Bioteque integrates, harmonizes, and formats data collected from over 150 data sources, including 12 biological entities (e.g., genes, diseases, drugs) linked by 67 types of associations (e.g., ‘drug treats disease’, ‘gene interacts with gene’).
- CPTAC python package: The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. CPTAC generates comprehensive proteomics and genomics data from clinical cohorts, typically with ~100 samples per tumor type.
- Pan-cancer proteogenomics characterization of tumor immunity
- Pan-cancer proteomic map of 949 human cell lines. Download: https://cellmodelpassports.sanger.ac.uk/
- Quantitative Proteomics of the Cancer Cell Line Encyclopedia
- Pan-cancer proteogenomics expands the landscape of therapeutic targets https://pubmed.ncbi.nlm.nih.gov/38917788/
- Atlas of the plasma proteome in health and disease in 53,026 adults
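
The Bioteque entry in the list above describes a knowledge graph of typed entities joined by typed associations. As a rough illustration of that data model only (not the Bioteque's own code, file format, or embedding method), the sketch below stores a few toy triples and groups them by association type. Every identifier (`drug_A`, `disease_X`, `gene_1`, and so on) and both helper functions are hypothetical placeholders.

```python
from collections import defaultdict

# Toy triples: ((source type, source id), association, (target type, target id)).
# All identifiers are made-up placeholders, not records from the Bioteque.
TRIPLES = [
    (("drug", "drug_A"), "treats", ("disease", "disease_X")),
    (("drug", "drug_B"), "treats", ("disease", "disease_X")),
    (("gene", "gene_1"), "interacts with", ("gene", "gene_2")),
]


def group_by_association(triples):
    """Group edges by (source type, association, target type)."""
    grouped = defaultdict(list)
    for (src_type, src), assoc, (dst_type, dst) in triples:
        grouped[(src_type, assoc, dst_type)].append((src, dst))
    return dict(grouped)


def neighbors(triples, entity):
    """Return every entity directly linked to `entity`, ignoring direction."""
    found = set()
    for (_, src), _, (_, dst) in triples:
        if src == entity:
            found.add(dst)
        elif dst == entity:
            found.add(src)
    return found


if __name__ == "__main__":
    for key, edges in group_by_association(TRIPLES).items():
        print(key, len(edges))
    print(sorted(neighbors(TRIPLES, "disease_X")))
```

Grouping by the full (source type, association, target type) key keeps ‘drug treats disease’ separate from any other association that happens to share a verb, which is the kind of typed edge the entry describes.
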
- recountmethylation (Bioconductor package) user's guide: https://bioconductor.org/packages/release/bioc/vignettes/recountmethylation/inst/doc/recountmethylation_users_guide.html
- A deep profile of gene expression across 18 human cancers: 50K cancer samples from GEO. https://github.com/suinleelab/deepprofile-study (code) and https://doi.org/10.6084/m9.figshare.25414765.v2 (data)
- Deep profiling of gene expression across 18 human cancers: Here we describe an unsupervised deep-learning framework for the generation of low-dimensional latent spaces for gene-expression data from 50,211 transcriptomes across 18 human cancers.
- Pan-cancer single cell RNA-seq uncovers recurring programs of cellular heterogeneity
- Next-generation characterization of the Cancer Cell Line Encyclopedia. The Broad Institute also hosts DepMap.
- Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity: scRNA-seq for CCLE lines!
- A metastasis map of human cancer cell lines
- depmap Bioconductor package; see also the DepMap portal: https://depmap.org/portal/download/custom/ and https://depmap.org/portal/interactive/
- The chromatin accessibility landscape of primary human cancers
- Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer
- Pan-cancer analysis of whole genomes
- Removing unwanted variation from large-scale RNA sequencing data with PRPS: useful for normalizing TCGA data.
- Evolutionary signatures of human cancers revealed via genomic analysis of over 35,000 patients
- A pan-tissue survey of mosaic chromosomal alterations in 948 individuals
- UCSC Xena: https://xena.ucsc.edu/
- 200,000 whole genomes made available for biomedical studies by U.K. effort: We present mutational signature analyses of 12,222 whole-genome–sequenced cancers collected prospectively via the UK National Health Service (NHS) for the 100,000 Genomes Project.
- Substitution mutational signatures in whole-genome–sequenced cancers in the UK population
- Genomic data in the All of Us Research Program: Here we describe the programme’s genomics data release of 245,388 clinical-grade genome sequences.
- Comprehensive characterization of 536 patient-derived xenograft models prioritizes candidates for targeted treatment: https://portal.pdxnetwork.org/
- High-resolution genetic mapping of putative causal interactions between regions of open chromatin
- Global reference mapping and dynamics of human transcription factor footprints: We map the fine-scale structure of ~1.6 million DHS.
- Index and biological spectrum of accessible DNA elements in the human genome. Download the data from https://www.meuleman.org/project/dhsindex/ (Wouter Meuleman is also the author of epilogos).
- Integrative analysis of 10,000 epigenomic maps across 800 samples for regulatory genomics and disease dissection, by the Manolis Kellis group: To help elucidate genetic variants underlying complex traits, we develop EpiMap, a compendium of 833 reference epigenomes across 18 uniformly-processed and computationally-completed assays. We define chromatin states, high-resolution enhancers, activity patterns, enhancer modules, upstream regulators, and downstream target gene functions.
- Boix et al., bioRxiv (2019): Integrative analysis of 10,000 epigenomic maps across 800 samples for regulatory genomics and disease dissection http://compbio.mit.edu/epimap/
- Single-Cell Transcriptome Atlas of Murine Endothelial Cells
- A cell atlas of human thymic development defines T cell repertoire formation
- Chromatin accessibility dynamics in a model of human forebrain development: Trevino et al. developed three-dimensional organoid models of human forebrain development and examined chromatin accessibility and gene expression at the single-cell level.
- Single-cell transcriptome profiling an adult human cell atlas of 15 major organs
"we performed single-cell transcriptomes of 88,622 cells derived from 15 tissue organs of one adult donor and generated an adult human cell atlas (AHCA). The AHCA depicted 234 subtypes of cells, including major cell types such as T, B, myeloid, epithelial, and stromal cells, as well as novel cell types in skin, each of which was distinguished by multiple marker genes and transcriptional profiles and collectively contributed to the heterogeneity of major human organs. Moreover, TCR and BCR repertoire comparison and trajectory analyses revealed direct clonal sharing of T and B cells with various developmental states among different tissues. Furthermore, novel cell markers, transcription factors and ligand-receptor pairs were identified with potential functional regulations on maintaining the homeostasis of human cells among tissues."
- Construction of a human cell landscape at single-cell level. Database URL: http://bis.zju.edu.cn/HCL/
- A human cell atlas of fetal chromatin accessibility
- A comprehensive library of human transcription factors for cell fate engineering
- A single–cell type transcriptomics map of human tissues
- Signatures of plasticity, metastasis, and immunosuppression in an atlas of human small cell lung cancer
- Integrated single-cell transcriptomics and epigenomics reveals strong germinal center–associated etiology of autoimmune risk loci
- A single-cell atlas of chromatin accessibility in the human genome, from the Bing Ren group.
- The Tabula Sapiens: a multiple organ single cell transcriptomic atlas of humans: "We used single cell transcriptomics to create a molecularly defined phenotypic reference of human cell types which spans 24 human tissues and organs." The authors also systematically analyzed splicing differences at single-cell resolution. For interactive visualization: https://tabula-sapiens-portal.ds.czbiohub.org/splicing
- Integrative mapping of human CD8+ T cells in inflammation and cancer (Nature Methods) https://www.nature.com/articles/s41592-024-02530-0 Introduces scAtlasVAE; it has both RNA and TCR data for CD8+ T cells!
- A blueprint for tumor-infiltrating B cells across human cancers: tumor-infiltrating B cells across 21 different cancer types from more than 270 patients.
- Spatial transcriptomics of B cell and T cell receptors reveals lymphocyte clonal dynamics
- The target atlas for antibody-drug conjugates across solid cancers
- https://www.ejcancer.com/article/S0959-8049(23)00681-0/fulltext
- The LINCS L1000 project has collected gene expression profiles for thousands of perturbagens at a variety of time points, doses, and cell lines. A full list of the chemical and genetic perturbations used can be found on the CLUE website along with their descriptions. https://lincsproject.org/LINCS/tools/workflows/find-the-best-place-to-obtain-the-lincs-l1000-data
- Single-cell perturbation datasets from the Chris Sander group: https://projects.sanderlab.org/scperturb/
- Cell Painting Gallery from Broad: https://registry.opendata.aws/cellpainting-gallery/
- Bioimagingguide.org, a companion website to the paper “A biologist’s guide to planning and performing quantitative bioimaging experiments”
- Tutorials.cellprofiler.org – a list of written tutorials, many with accompanying videos
- Data-analysis strategies for image-based cell profiling – the gold-standard methods paper for performing image-based profiling
- Interpreting Image-based Profiles using Similarity Clustering and Single-Cell Visualization – a protocol paper on how to interpret image-based profiling experiments once you’ve performed them
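
The image-based profiling entries above revolve around comparing per-treatment feature profiles. The sketch below shows one generic way such a comparison could look: it ranks toy profiles by cosine similarity to a query profile. Cosine similarity is an assumption chosen for illustration; the list does not say which similarity measure the cited papers use. The profile names, feature values, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of features")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero profile as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(profiles, query):
    """Rank every other profile by similarity to the profile named `query`."""
    target = profiles[query]
    scores = [
        (name, cosine_similarity(target, vector))
        for name, vector in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up aggregated profiles (one feature vector per treatment).
PROFILES = {
    "treatment_1": [1.0, 0.0, 2.0],
    "treatment_2": [2.0, 0.0, 4.0],
    "treatment_3": [0.0, 3.0, 0.0],
}

if __name__ == "__main__":
    for name, score in most_similar(PROFILES, "treatment_1"):
        print(f"{name}\t{score:.3f}")
```

Real profiles typically have hundreds or thousands of features and are normalized before comparison, so treat this as a starting point for exploring the resources above rather than a substitute for the methods they describe.
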