🎯 This repository hosts the code for the CRC scATAC-seq project.
For more information about the project, please see our publication in Cancer Discovery.
🧬 Raw sequencing data
All scATAC-seq sequencing data generated in this study have been deposited in the Genome Sequence Archive for Human (GSA-Human) under accession number HRA000992.
📑 Processed data
Processed scATAC-seq fragment files have been deposited in the Open Archive for Miscellaneous Data (OMIX) under accession number OMIX005759.
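As a quick sanity check after downloading, fragments can be counted per cell barcode. The sketch below is only an illustration: it assumes the common tab-separated fragment layout (chromosome, start, end, cell barcode, duplicate count), which should be verified against the deposited files, and `toy.fragments.tsv` and `count_fragments` are hypothetical names that are not part of this repository.

```shell
# Toy stand-in for a downloaded fragment file (hypothetical name and content).
# Assumed columns: chrom, start, end, cell barcode, duplicate count.
frag_file="toy.fragments.tsv"
printf 'chr1\t100\t250\tAAAC\t1\n' >  "$frag_file"
printf 'chr1\t300\t480\tAAAC\t2\n' >> "$frag_file"
printf 'chr2\t50\t210\tTTTG\t1\n'  >> "$frag_file"

# Read fragments on stdin and print "barcode<TAB>count", most abundant first.
count_fragments() {
  awk -F'\t' '{ n[$4]++ } END { for (b in n) print b "\t" n[b] }' | sort -k2,2nr
}

count_fragments < "$frag_file"
```

For a gzip-compressed file, the same function can read from a pipe, for example `gzip -dc some.fragments.tsv.gz | count_fragments`.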
📝 Metadata
Metadata for each patient and each single cell are available in the ./metadata directory of this repository.
Downstream analysis of the CRC scATAC-seq data.
00.Requirements.R
Prerequisite script: imports the required libraries and helper functions.
01.All_Atlas.R
Basic analysis of the scATAC-seq atlas, related to Figure 1.
- Dimensional reductions, clustering and cell typing
- Single-cell CNV analysis
- Marker peaks & TFs for each cell type
02.Epi_AD_Methylation.R
Chromatin dynamics of early adenomas, related to Figure 2.
- Differential peaks & TFs in adenomas
- Compare adenoma peaks with CRCs
- Association with DNA methylation
03.Epi_Molecular_Subtype.R
Unsupervised subtyping of CRCs & chromatin features of iCMS subtypes, related to Figure 3 and Figure 4.
- NMF of all malignant clusters
- Differential analysis of each iCMS
- Identify iCMS-specific TFs
- Detailed analysis of TF activity and downstream targets
04.Epi_Intratumor.R
Analysis of intra-tumor heterogeneity, related to Figure 5.
- Identify CNV-based intra-tumor subclones
- Phylogenetic analysis of subclones
- Differential analysis of each subclone
05.Epi_CIMP.R
Analysis of CIMP classifications, related to Figure 6.
- Identify CIMP subtypes
- Differential analysis of each CIMP subtype
- Identify CIMP-High specific TFs
06.Epi_TF_Module.R
Weighted correlation network of TF activities, related to Figure 7.
- Construct correlation network on TF activities
- Identify subtype-related TF modules
- Association between TF module and gene expression
Pipelines for processing scATAC-seq data.
01.scATAC.process.one.sh
Process raw sequencing reads from scATAC-seq data.
02.Create_Arrow.sh
Create arrow files as input for ArchR.
CNV_from_Arrow.R
Single-cell CNV analysis. Modified from https://github.com/GreenleafLab/10x-scATAC-2019.
Run.Homer.Motif.sh
Perform motif enrichment in given peak set using Homer.
Run.MEDICC2.sh
Perform phylogenetic analysis of tumor subclones using MEDICC2.
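For orientation, a motif enrichment run with HOMER on a peak set generally has the shape sketched below. This is not the content of Run.Homer.Motif.sh, whose actual options may differ. The wrapper name `run_homer_motifs`, the `DRY_RUN` switch, and the arguments `peaks.bed`, `hg38`, and `motif_out` are all hypothetical placeholders. Only `findMotifsGenome.pl` and its `-size given` and `-p` options come from HOMER itself.

```shell
# Hypothetical wrapper around HOMER's findMotifsGenome.pl.
# Arguments: peak BED file, genome name, output directory, CPU threads (default 4).
run_homer_motifs() {
  peaks="$1"; genome="$2"; outdir="$3"; threads="${4:-4}"
  # -size given: use peak coordinates as provided; -p: number of CPU threads.
  set -- findMotifsGenome.pl "$peaks" "$genome" "$outdir" -size given -p "$threads"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # requires HOMER and the chosen genome to be installed
  fi
}

# Dry run: only prints the command that would be executed.
DRY_RUN=1 run_homer_motifs peaks.bed hg38 motif_out 8
```

Printing the command first makes it easy to confirm the peak file, genome build, and output directory before starting a long-running job.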
Re-analysis and integration of public datasets, including DNA methylation, scRNA-seq, and scATAC-seq.
Process_Methylation_Beadchip.R
Process DNA methylation array data of CRCs generated by Luo et al.
Data source: GSE48684
scATAC_NG_CRC_continuum.R
Re-analysis of scATAC-seq data of CRC continuum generated by Becker et al.
Data source: GSE201349
scRNA_10X_CRC_atlas.R
Re-analysis of scRNA-seq data of CRCs generated by Lee et al.
Data source: GSE132465
- R environment:
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22631)
- R packages:
attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] ChIPseeker_1.30.3 minfi_1.40.0 bumphunter_1.36.0
[4] locfit_1.5-9.7 iterators_1.0.14 foreach_1.5.2
[7] edgeR_3.36.0 limma_3.50.3 readr_2.1.4
[10] igraph_1.4.2 WGCNA_1.72-1 fastcluster_1.2.3
[13] dynamicTreeCut_1.63-1 ggpubr_0.6.0 clusterProfiler_4.2.2
[16] NMF_0.26 cluster_2.1.4 rngtools_1.5.2
[19] registry_0.5-1 LOLA_1.19.1 ggbeeswarm_0.7.2
[22] Vennerable_3.1.0.9000 viridis_0.6.3 viridisLite_0.4.2
[25] pheatmap_1.0.12 patchwork_1.1.2 org.Hs.eg.db_3.14.0
[28] genomation_1.26.0 dplyr_1.1.2 corrplot_0.92
[31] UpSetR_1.4.0 TxDb.Hsapiens.UCSC.hg38.knownGene_3.14.0 GenomicFeatures_1.46.5
[34] AnnotationDbi_1.56.2 SeuratObject_4.1.3 Seurat_4.3.0
[37] RColorBrewer_1.1-3 BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.62.0
[40] rtracklayer_1.54.0 Biostrings_2.62.0 XVector_0.34.0
[43] rhdf5_2.38.1 SummarizedExperiment_1.24.0 Biobase_2.54.0
[46] MatrixGenerics_1.6.0 Rcpp_1.0.10 Matrix_1.5-4
[49] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0
[52] S4Vectors_0.32.4 BiocGenerics_0.40.0 matrixStats_0.63.0
[55] data.table_1.14.8 stringr_1.5.0 plyr_1.8.8
[58] magrittr_2.0.3 ggplot2_3.4.2 gtable_0.3.3
[61] gtools_3.9.4 gridExtra_2.3 ArchR_1.0.2
Please consider citing our paper:
Liu, Z., Hu, Y., Xie, H., Chen, K., Wen, L., Fu, W., Zhou, X., & Tang, F. (2024). Single-Cell Chromatin Accessibility Analysis Reveals the Epigenetic Basis and Signature Transcription Factors for the Molecular Subtypes of Colorectal Cancers. Cancer Discovery, 14(6), 1082–1105. https://doi.org/10.1158/2159-8290.CD-23-1445
For any comments or questions, please feel free to submit a GitHub issue or contact me via email at [email protected] ✨.