Home
SziKayLeung edited this page Feb 6, 2023
This wiki details the scripts and pipelines used for the paper "Full-length transcript sequencing of human and mouse identifies widespread isoform diversity and alternative splicing in the cerebral cortex" by S.K. Leung, A.R. Jeffries, ..., E. Hannon, J. Mill.
![](https://github.com/SziKayLeung/Whole_Transcriptome_Paper/raw/main/Utilities/Images/bioinformatics_pipeline.png)
- Run the Isoseq3.1.2 pipeline on all samples (human cortex, mouse cortex), merged post-CCS and pre-LIMA. This involves running the working script for human and for mouse, each of which sources the relevant functions script (human, mouse). Both functions scripts cover the Iso-Seq3 steps (CCS, LIMA, REFINE, CLUSTER), RNA-Seq alignment with STAR (which provides the junctions file for SQANTI2), Kallisto quantification (which provides the expression file for SQANTI2), and finally SQANTI2 and filtering.
- Note CCS was generated separately for each sample before merging
- Subsampling of human and mouse cortex using E.Tseng's scripts for rarefaction curves
- Extracting lengths from CCS (circular consensus sequence) reads using E.Tseng's scripts
- Summarising the number of CCS reads successfully generated across all Iso-Seq samples
- RNA-Seq data from mouse (n = 12 biologically independent samples) and human fetal (n = 3 biologically independent samples) cortex were aligned to the mouse cortical and human fetal cortical Iso-Seq transcriptomes, respectively, using STAR and Kallisto
- Determining gene and transcript expression of short-read RNA-Seq from Kallisto.
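
As an illustration of the last step, gene-level expression can be derived by summing transcript-level TPM values from a Kallisto `abundance.tsv` table. The following is a minimal sketch, not the script used for the paper: the function name `gene_tpm`, the transcript-to-gene dictionary, and the toy values are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def gene_tpm(abundance_tsv, transcript_to_gene):
    """Sum transcript-level TPM from a Kallisto abundance table per gene."""
    totals = defaultdict(float)
    for row in csv.DictReader(abundance_tsv, delimiter="\t"):
        gene = transcript_to_gene.get(row["target_id"])
        if gene is None:
            continue  # transcripts without a gene assignment are skipped
        totals[gene] += float(row["tpm"])
    return dict(totals)


# Toy abundance table using Kallisto's standard column names (values are made up).
example = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "PB.1.1\t2000\t1800\t50\t10.0\n"
    "PB.1.2\t1500\t1300\t25\t5.5\n"
    "PB.2.1\t900\t700\t5\t1.0\n"
)
# Hypothetical isoform-to-gene mapping (e.g. taken from a classification file).
tx2gene = {"PB.1.1": "GeneA", "PB.1.2": "GeneA", "PB.2.1": "GeneB"}

result = gene_tpm(example, tx2gene)
print(result)
```

In practice the mapping from PB isoform IDs to genes would come from the SQANTI2 classification file rather than a hand-written dictionary.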
Note: Novel genes are defined as transcripts mapped to regions not previously annotated as genes in existing genomic databases (human: hg38, mouse: mm10), and are classified as either "antisense" or "intergenic" by SQANTI2
- Novel Genes Human vs Mouse: BLAST novel genes detected in the human cortical Iso-Seq transcriptome against novel genes detected in the mouse cortical Iso-Seq transcriptome for homology, with identification of one common novel gene (TMEM107-VAMP2)
- Novel Genes Across Genome: BLAST novel genes from the human cortical Iso-Seq transcriptome against the human genome (hg38), and novel genes from the mouse cortical Iso-Seq transcriptome against the mouse genome (mm10).
- Antisense Novel Genes: Finding overlap of antisense novel genes with exonic regions of other genes
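
The antisense overlap check can be pictured as an interval comparison. The sketch below is only an illustration under stated assumptions: coordinates are 1-based and inclusive (as in GTF), "overlap" means sharing at least one base with an exon on the opposite strand of the same chromosome, and the tuple layouts, function names, and toy records are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def antisense_exon_overlaps(novel_genes, exons):
    """Map each novel gene to the annotated genes whose exons it overlaps
    on the opposite strand of the same chromosome."""
    hits = {}
    for name, chrom, start, end, strand in novel_genes:
        hits[name] = sorted({
            gene
            for gene, e_chrom, e_start, e_end, e_strand in exons
            if e_chrom == chrom
            and e_strand != strand
            and overlaps(start, end, e_start, e_end)
        })
    return hits


# Toy records: (id, chromosome, start, end, strand); all values are made up.
novel = [
    ("PB.10", "chr1", 100, 500, "-"),
    ("PB.11", "chr1", 2000, 2500, "+"),
]
exons = [
    ("GeneA", "chr1", 450, 600, "+"),  # opposite strand, overlapping PB.10
    ("GeneB", "chr1", 450, 600, "-"),  # same strand as PB.10, so ignored
    ("GeneC", "chr2", 100, 500, "+"),  # different chromosome
]

hits = antisense_exon_overlaps(novel, exons)
print(hits)
```

This brute-force comparison is fine for a handful of novel genes; for genome-wide annotation an interval tree or a tool such as bedtools would be the more usual choice.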
![](https://github.com/SziKayLeung/Whole_Transcriptome_Paper/raw/main/Utilities/Images/AS_Events.png)
- Mutually Exclusive (MX) and Skipped Exons (SE) events were identified from SUPPA2
- Of note, the output files from SUPPA2 only detail AS events associated with isoforms, named with PB_IDs, and not with genes.
- Intron Retention (IR) events were identified from SQANTI2 (automatically generated, and identified from the output classification.txt under the column "subcategory")
- Other AS events (A3', A5', Alternative First (AF), Alternative Last (AL)) were identified from custom scripts using GTF coordinates
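
Because SUPPA2 reports events per PB_ID, linking them back to genes needs a lookup from the SQANTI2 classification file, which is also where IR transcripts are flagged. The sketch below shows one way this might look. It assumes the classification table has `isoform`, `associated_gene` and `subcategory` columns and that IR is recorded as `intron_retention`; the function name and the toy rows are hypothetical.

```python
import csv
import io


def read_classification(handle):
    """Return (isoform -> gene mapping, set of intron-retention isoforms)."""
    isoform_to_gene = {}
    ir_isoforms = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        isoform_to_gene[row["isoform"]] = row["associated_gene"]
        if row["subcategory"] == "intron_retention":
            ir_isoforms.add(row["isoform"])
    return isoform_to_gene, ir_isoforms


# Toy classification table with only the columns needed here (values are made up).
classification = io.StringIO(
    "isoform\tassociated_gene\tsubcategory\n"
    "PB.1.1\tGeneA\tmulti-exon\n"
    "PB.1.2\tGeneA\tintron_retention\n"
    "PB.2.1\tGeneB\tmulti-exon\n"
)
isoform_to_gene, ir_isoforms = read_classification(classification)

# Hypothetical PB_IDs listed for SUPPA2 skipped-exon events.
se_event_isoforms = ["PB.1.1", "PB.2.1"]
se_event_genes = sorted({isoform_to_gene[i] for i in se_event_isoforms})

print(ir_isoforms, se_event_genes)
```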
- Assessment of RNA isoform diversity in genes robustly associated with:
- Alzheimer's disease: APP, PSEN1, PSEN2 and 59 genes nominated from the most recent GWAS meta-analysis
- Autism: 393 genes nominated as being category 1 (high confidence) and category 2 (strong candidate) from the SFARI Gene database, from this list
- Schizophrenia: 339 genes nominated from the most recent GWAS meta-analysis, from these two lists: SZ_CLOZUK_GENE and SZ_PGC2
- Number of transcripts, IR-transcripts, fusion transcripts associated with disease were tabulated
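
The tabulation in the last bullet could be sketched as below. This is an illustration, not the analysis script. It assumes records already parsed from a SQANTI2 classification file, that fusion transcripts have `structural_category` equal to `fusion` with their gene names joined by an underscore in `associated_gene`, and that IR is marked as `intron_retention` in `subcategory`. `GENEX` and `GENEY` are placeholder gene names, and only the three Alzheimer's genes named above are included in the toy gene list.

```python
from collections import Counter


def tabulate_disease(records, disease_genes):
    """Count transcripts, IR transcripts and fusion transcripts per disease."""
    table = {disease: Counter() for disease in disease_genes}
    for rec in records:
        is_fusion = rec["structural_category"] == "fusion"
        # Assumption: fusion transcripts list their genes joined by "_".
        if is_fusion:
            genes = set(rec["associated_gene"].split("_"))
        else:
            genes = {rec["associated_gene"]}
        for disease, gene_set in disease_genes.items():
            if genes & gene_set:
                table[disease]["transcripts"] += 1
                if rec["subcategory"] == "intron_retention":
                    table[disease]["IR"] += 1
                if is_fusion:
                    table[disease]["fusion"] += 1
    return table


# Toy gene lists; "Example" is a placeholder disease with a placeholder gene.
disease_genes = {
    "AD": {"APP", "PSEN1", "PSEN2"},
    "Example": {"GENEX"},
}
# Toy classification records (values are made up).
records = [
    {"associated_gene": "APP", "structural_category": "full-splice_match", "subcategory": "multi-exon"},
    {"associated_gene": "APP", "structural_category": "novel_in_catalog", "subcategory": "intron_retention"},
    {"associated_gene": "PSEN1_GENEY", "structural_category": "fusion", "subcategory": "multi-exon"},
    {"associated_gene": "GENEX", "structural_category": "full-splice_match", "subcategory": "multi-exon"},
]

table = tabulate_disease(records, disease_genes)
print({disease: dict(counts) for disease, counts in table.items()})
```

Splitting on the underscore would mis-handle any gene symbol that itself contains an underscore, so a real implementation would need a more careful parse.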
- All figures were generated from script
- The majority of the figures and tables generated are from SQANTI2 filtered data (as listed here, with data wrangling)
- Tables include post-analysis with BLAST of novel genes, determining cut-off threshold for high gene expression, finding fusion genes sharing exons with more than 2 genes, and overlap of fusion genes with ConjoinG
Rmarkdowns were generated from SQANTI2 classification files of human cortex (adult, fetal) and mouse cortex (4 Iso-Seq datasets) for summary information:
- Descriptive summary: Detailing all the descriptive numbers and lengths of genes and transcripts in each Iso-Seq dataset
- Summary is provided for both SQANTI2 filtered and unfiltered data
- Human and Mouse Comparison: Tabulate the number of isoforms per gene across human and mouse cortex, and human adult and human fetal cortex
- Homology is accounted for by converting mouse genes to the equivalent human genes using the Mouse Genome Informatics (MGI) syntenic gene list
- Fusion genes: Number of fusion genes in each Iso-Seq dataset, and common genes, and those associated with disease
- Intron retention: Number of genes and transcripts with intron retention in each Iso-Seq dataset with comparisons across genes, and those associated with disease
- Novel Genes: Number of novel genes, and the proportion with RNA-Seq support, within 50 bp of a CAGE peak, and associated with disease
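
Two of the summaries above, isoforms per gene and CAGE support for novel genes, can be sketched as follows. This is a minimal illustration that assumes records parsed from a SQANTI2 classification file with `associated_gene`, `structural_category` and `dist_to_cage_peak` columns, the latter holding `NA` when no peak was found. Novel genes are taken as the "antisense" and "intergenic" categories, as defined earlier on this page. The proportion is computed per transcript here, which may differ from how the Rmarkdown reports summarise it; function names and toy records are hypothetical.

```python
from collections import Counter

# Categories treated as novel genes, following the definition on this page.
NOVEL_CATEGORIES = {"antisense", "intergenic"}


def isoforms_per_gene(records):
    """Count the number of isoforms assigned to each gene."""
    return Counter(rec["associated_gene"] for rec in records)


def fraction_near_cage(records, max_dist=50):
    """Fraction of novel-gene transcripts within max_dist bp of a CAGE peak."""
    novel = [r for r in records if r["structural_category"] in NOVEL_CATEGORIES]
    if not novel:
        return 0.0
    near = sum(
        1
        for r in novel
        if r["dist_to_cage_peak"] != "NA"
        and abs(float(r["dist_to_cage_peak"])) <= max_dist
    )
    return near / len(novel)


# Toy classification records (values are made up).
records = [
    {"associated_gene": "GeneA", "structural_category": "full-splice_match", "dist_to_cage_peak": "-10"},
    {"associated_gene": "GeneA", "structural_category": "novel_in_catalog", "dist_to_cage_peak": "NA"},
    {"associated_gene": "novelGene_1", "structural_category": "intergenic", "dist_to_cage_peak": "30"},
    {"associated_gene": "novelGene_2", "structural_category": "antisense", "dist_to_cage_peak": "-120"},
    {"associated_gene": "novelGene_2", "structural_category": "antisense", "dist_to_cage_peak": "NA"},
]

counts = isoforms_per_gene(records)
frac = fraction_near_cage(records)
print(counts, frac)
```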