DEBKS: a tool to detect differentially expressed circular RNA

Introduction

DEBKS is a convenient and user-friendly program that streamlines the discovery of differentially expressed circRNAs between two groups of RNA-seq samples with replicates. DEBKS combines the well-known software CIRCexplorer2, which detects and annotates circRNAs from chimeric RNA-seq reads, with the rMATS statistical model, which identifies differential isoform ratios from RNA-seq count data with replicates.

Availability

DEBKS is free software and can be downloaded from https://github.com/yangence/DEBKS

Required Software and Packages

  1. Python 3.x.x and corresponding versions of NumPy, Pandas, and SciPy.

  2. STAR 2.6.1

  3. gtfToGenePred

  4. SAMtools 1.9

  5. CIRCexplorer2

Installation

Install latest release via conda

conda install -c colinliuzelin DEBKS

Install latest release from source code

git clone https://github.com/yangence/DEBKS.git
cd DEBKS
pip install -r requirements.txt
python setup.py install

Required Files:

Users need to prepare the following external files:

  1. Genome fasta file

  2. Gene annotation GTF file

Usage

Raw Fastq File Provided

If raw FASTQ files are provided, DEBKS can map the reads and calculate differential PBSI with the following command:

DEBKS -g genomeFasta -s1 s1File -s2 s2File  \
    -STARindex STARIndexDir -gtf gtfFile -o outDir \
    -read readType -len readLength [options]*

Required Files:

  1. s1File contains sample_1 fastq files:
  ${FILEPATH}/sample_1.Rep1.R1.fastq.gz;${FILEPATH}/sample_1.Rep1.R2.fastq.gz
  ${FILEPATH}/sample_1.Rep2.R1.fastq.gz;${FILEPATH}/sample_1.Rep2.R2.fastq.gz
  ${FILEPATH}/sample_1.Rep3.R1.fastq.gz;${FILEPATH}/sample_1.Rep3.R2.fastq.gz
  2. s2File contains sample_2 fastq files:
  ${FILEPATH}/sample_2.Rep1.R1.fastq.gz;${FILEPATH}/sample_2.Rep1.R2.fastq.gz
  ${FILEPATH}/sample_2.Rep2.R1.fastq.gz;${FILEPATH}/sample_2.Rep2.R2.fastq.gz
  ${FILEPATH}/sample_2.Rep3.R1.fastq.gz;${FILEPATH}/sample_2.Rep3.R2.fastq.gz
  3. Genome index built by STAR:
STAR --runMode genomeGenerate --runThreadN threads \
  --genomeFastaFiles genomeFasta \
  --sjdbGTFfile gtfFile \
  --sjdbOverhang readLength-1 \
  --genomeDir STARIndexDir
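
The s1File and s2File layout described above (one replicate per line, paired files joined by a semicolon) can be generated with a small script. The following is a minimal sketch, not part of DEBKS; the function name `write_sample_file` and the example paths are hypothetical.

```python
from pathlib import Path


def write_sample_file(path, replicates):
    """Write a DEBKS-style sample list: one replicate per line.

    Each replicate is either a single FASTQ path (single-end) or an
    (R1, R2) pair (paired-end), which is joined with ';'.
    """
    lines = []
    for rep in replicates:
        files = [rep] if isinstance(rep, str) else list(rep)
        if len(files) not in (1, 2):
            raise ValueError(f"expected 1 or 2 FASTQ files per replicate, got {len(files)}")
        lines.append(";".join(files))
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # Hypothetical paths, following the naming used in this README.
    reps = [
        (f"/data/sample_1.Rep{i}.R1.fastq.gz", f"/data/sample_1.Rep{i}.R2.fastq.gz")
        for i in (1, 2, 3)
    ]
    write_sample_file("sample_1.txt", reps)
```

The resulting file can then be passed to `-s1` (or `-s2` for the second group).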

Example

DEBKS -g hg19.fa -s1 sample_1.txt -s2 sample_2.txt -STARindex hg19_STAR/ \
    -gtf gencode.v19.annotation.gtf -o out_test -t 40 -read pair -len 150 -c 0.1 -a 6
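
The `readLength-1` placeholder in the STAR index command is arithmetic, not a literal argument: for the example above (`-len 150`) the index would be built with `--sjdbOverhang 149`. The sketch below assembles that command from the flags listed in this README; the helper name `star_index_command` is hypothetical, and the script only prints the command rather than running STAR.

```python
import shlex


def star_index_command(genome_fasta, gtf_file, index_dir, read_length, threads=1):
    """Return the STAR genomeGenerate argument list with sjdbOverhang = read_length - 1."""
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeFastaFiles", genome_fasta,
        "--sjdbGTFfile", gtf_file,
        "--sjdbOverhang", str(read_length - 1),
        "--genomeDir", index_dir,
    ]


if __name__ == "__main__":
    cmd = star_index_command(
        "hg19.fa", "gencode.v19.annotation.gtf", "hg19_STAR/", read_length=150, threads=40
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```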

STAR Alignment Results Provided

In this mode, users map the RNA-seq reads themselves with the following command:

STAR --genomeDir STARIndexDir --chimSegmentMin anchorLength \
    --runThreadN threads --outSAMtype BAM Unsorted --alignSJDBoverhangMin anchorLength \
    --alignSJoverhangMin anchorLength --chimJunctionOverhangMin anchorLength \
    --outSJfilterOverhangMin -1 anchorLength -1 -1

Then, users can run DEBKS to calculate differential PBSI with the following command:

DEBKS -g genomeFasta \
    -s1CJ s1CJFile -s2CJ s2CJFile -s1SJ s1SJFile -s2SJ s2SJFile \
    -gtf gtfFile -o outDir -read readType -len readLength [options]*

Required Files:

  1. File contains sample_1 chimeric junction files from STAR output:
  ${FILEPATH}/sample_1.Rep1.Chimeric.out.junction
  ${FILEPATH}/sample_1.Rep2.Chimeric.out.junction
  ${FILEPATH}/sample_1.Rep3.Chimeric.out.junction
  2. File contains sample_2 chimeric junction files from STAR output:
  ${FILEPATH}/sample_2.Rep1.Chimeric.out.junction
  ${FILEPATH}/sample_2.Rep2.Chimeric.out.junction
  ${FILEPATH}/sample_2.Rep3.Chimeric.out.junction
  3. File contains sample_1 splicing junction files from STAR output:
  ${FILEPATH}/sample_1.Rep1.SJ.out.tab
  ${FILEPATH}/sample_1.Rep2.SJ.out.tab
  ${FILEPATH}/sample_1.Rep3.SJ.out.tab
  4. File contains sample_2 splicing junction files from STAR output:
  ${FILEPATH}/sample_2.Rep1.SJ.out.tab
  ${FILEPATH}/sample_2.Rep2.SJ.out.tab
  ${FILEPATH}/sample_2.Rep3.SJ.out.tab
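
The four list files above can be assembled from a directory of STAR outputs. This is a minimal sketch under the assumption that the outputs follow the `<group>.Rep<k>.<suffix>` naming shown in this README; the helper names are hypothetical. Sorting is lexical, so `Rep10` would sort before `Rep2`. Using the same sort for the chimeric and splice junction lists keeps the replicates in matching order, which DEBKS presumably expects.

```python
from pathlib import Path


def list_star_outputs(star_dir, group, suffix):
    """Return sorted paths named <group>.Rep<k>.<suffix> inside star_dir."""
    paths = sorted(Path(star_dir).glob(f"{group}.Rep*.{suffix}"))
    return [str(p) for p in paths]


def write_list_file(list_path, paths):
    """Write one path per line, the layout described for -s1CJ/-s2CJ/-s1SJ/-s2SJ."""
    Path(list_path).write_text("".join(p + "\n" for p in paths))


if __name__ == "__main__":
    # Hypothetical directory holding the STAR outputs of both groups.
    star_dir = "star_out"
    for group in ("sample_1", "sample_2"):
        cj = list_star_outputs(star_dir, group, "Chimeric.out.junction")
        sj = list_star_outputs(star_dir, group, "SJ.out.tab")
        write_list_file(f"{group}.CJ.txt", cj)
        write_list_file(f"{group}.SJ.txt", sj)
```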

Example

DEBKS -g genomeFasta -s1CJ sample_1.CJ.txt -s2CJ sample_2.CJ.txt -s1SJ sample_1.SJ.txt \
   -s2SJ sample_2.SJ.txt -gtf gencode.v19.annotation.gtf -o out_test -t 40 -read pair -len 150 -c 0.1 -a 6

Required Parameters:

-g          <str>       Genome Fasta file

-STARindex  <str>       STAR alignment index directory

-s1         <str>       FASTQ files of sample 1, replicates in different lines, paired files are separated by semicolon

-s2         <str>       FASTQ files of sample 2, replicates in different lines, paired files are separated by semicolon

-s1CJ       <str>       Chimeric junction of sample 1 group, replicates in different lines

-s2CJ       <str>       Chimeric junction of sample 2 group, replicates in different lines

-s1SJ       <str>       Spliced junction of sample 1 group, replicates in different lines

-s2SJ       <str>       Spliced junction of sample 2 group, replicates in different lines

-gtf        <str>       GTF file

-o          <str>       Output directory of the result files

-read       <str>       Whether the RNA-seq reads are single-end or paired-end. [single, pair]

-len        <int>       Read length of RNA-seq reads

Note: parameters -STARindex -s1 -s2 are mutually exclusive with -s1CJ -s2CJ -s1SJ -s2SJ
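
The mutual-exclusion rule in the note above can be checked before launching a run. The sketch below is illustrative only and is not DEBKS's own argument handling; `input_mode` is a hypothetical helper that takes the names of the supplied parameters (without the leading dash).

```python
FASTQ_MODE = ("STARindex", "s1", "s2")
STAR_MODE = ("s1CJ", "s2CJ", "s1SJ", "s2SJ")


def input_mode(given):
    """Return 'fastq' or 'star' depending on which parameter set was supplied.

    Raises ValueError if the two sets are mixed or if neither is complete.
    """
    given = set(given)
    fastq = [name for name in FASTQ_MODE if name in given]
    star = [name for name in STAR_MODE if name in given]
    if fastq and star:
        raise ValueError(f"mutually exclusive parameters: {fastq} and {star}")
    if len(fastq) == len(FASTQ_MODE):
        return "fastq"
    if len(star) == len(STAR_MODE):
        return "star"
    raise ValueError("provide either -STARindex -s1 -s2 or -s1CJ -s2CJ -s1SJ -s2SJ")


if __name__ == "__main__":
    print(input_mode({"g", "STARindex", "s1", "s2", "gtf", "o", "read", "len"}))
```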

Optional Parameters:

-h, --help              Show this help message and exit

-p                      Sample 1 group and sample 2 group are paired

-n          <int>       Required total junction reads in all samples to filter out lowly expressed circRNAs [2*samples]

-t          <int>       Number of processors [1]

-c          <float>     Required PBSI difference cutoff between the two samples [0.01]

-a          <int>       Minimum overhang length for counting chimeric or splicing junctions [6]

-keepTemp               Keep the temporary files. Disabled by default.
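
To illustrate the `-n` default of `2*samples`, the sketch below keeps a circRNA only when its junction reads, summed over every replicate of both groups, reach the threshold. It rests on two assumptions that the parameter description does not confirm: that "samples" counts all replicates in both groups, and that the filtered counts are the back-spliced junction reads. The function name is hypothetical.

```python
def passes_read_filter(counts_group1, counts_group2, min_total=None):
    """Return True if total junction reads across all samples reach min_total.

    min_total defaults to 2 * (number of samples), mirroring the documented
    default of -n. Which counts DEBKS actually sums is an assumption here.
    """
    counts = list(counts_group1) + list(counts_group2)
    if min_total is None:
        min_total = 2 * len(counts)
    return sum(counts) >= min_total


if __name__ == "__main__":
    # Three replicates per group, so the default threshold is 2 * 6 = 12 reads.
    print(passes_read_filter([3, 4, 5], [0, 0, 0]))
    print(passes_read_filter([1, 1, 1], [1, 1, 1]))
```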

DEBKS Results Summary

Field                       Description
chr                         chromosome of circRNA
start                       coordinate of the start back-spliced site (0-based)
end                         coordinate of the end back-spliced site (1-based)
strand                      '+' or '-'
exonCount                   number of exons in the circRNA, comma-delimited
exonSizes                   length of each exon, comma-delimited
exonOffsets                 offset of each exon, comma-delimited
geneID                      ID of gene
isoformID                   ID of isoform
flankIntron                 flanking introns of the back-spliced sites
linearExonL                 coordinate of the start site of the left flanking exon
linearExonR                 coordinate of the end site of the right flanking exon
SJL1                        counts of the left flanking spliced junction in sample 1 group, comma-delimited
SJL2                        counts of the left flanking spliced junction in sample 2 group, comma-delimited
SJR1                        counts of the right flanking spliced junction in sample 1 group, comma-delimited
SJR2                        counts of the right flanking spliced junction in sample 2 group, comma-delimited
inc1                        counts of inclusion spliced junctions in sample 1 group, comma-delimited; equal to the sum of SJL1 and SJR1
inc2                        counts of inclusion spliced junctions in sample 2 group, comma-delimited; equal to the sum of SJL2 and SJR2
bs1                         counts of the back-spliced junction in sample 1 group
bs2                         counts of the back-spliced junction in sample 2 group
effective_inclusion_length  length adjustment for inc1 and inc2
effective_bs_length         length adjustment for bs1 and bs2
PBSI1                       percent back-spliced in (PBSI) of sample 1
PBSI2                       percent back-spliced in (PBSI) of sample 2
P                           significance of the differential PBSI at the user-defined threshold
FDR                         Benjamini-Hochberg corrected FDR of the above P
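
The count, length, and FDR columns above relate to each other roughly as follows. This is a sketch under stated assumptions, not DEBKS's implementation: the PBSI formula is written by analogy with the length-normalized ratio rMATS uses for isoform ratios and is not confirmed by this README, while `bh_fdr` is the standard Benjamini-Hochberg step-up adjustment. Both function names are hypothetical.

```python
def pbsi(bs, inc, effective_bs_length, effective_inclusion_length):
    """Length-normalized back-splicing ratio for one replicate (assumed formula).

    Returns None when both normalized counts are zero.
    """
    bs_norm = bs / effective_bs_length
    inc_norm = inc / effective_inclusion_length
    total = bs_norm + inc_norm
    return None if total == 0 else bs_norm / total


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Comma-delimited per-replicate counts, as stored in the result columns.
    bs1 = [int(x) for x in "10,12,8".split(",")]
    inc1 = [int(x) for x in "20,30,16".split(",")]
    print([pbsi(b, i, 100, 200) for b, i in zip(bs1, inc1)])
    print(bh_fdr([0.01, 0.04, 0.03, 0.005]))
```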

Citation

Copyright and License Information

Copyright (C) 2020 Zelin Liu ([email protected]). See the LICENSE file for license rights and limitations.
