================= 2. Usage

=================

Install =================

Perl modular: PerlIO::gzip, ....\n Tophat2 Bowtie2 samtools

================= 2. Usage

2.1 aligning reads to reference

build file index using bowtie $bowite2-build /path/plant_genome.fa /path/plant_genome

make a list for all cleaned fasta files $ls *.fa > list

Then remove file suffix in list using VIM

Perform the fellowing analysis:

align the cleaned reads to genome
get No. of uniq mapped reads (make sure the cleaned reads locate at the same folder of bam files)
count the raw number for each gene
normalization using RPKM

$tophat_pipeline_v2.pl -i list -d /path/plant_genome -s SS -l fr-firststrand -p 8 -m 1 -n 1 / -x gene_position_file -a gene_length

Perform the below analysis step by step:

align the cleaned reads to genome

$tophat_pipeline_v2.pl -i list -d /path/plant_genome -s SS -l fr-firststrand -p 8 -m 1 -n 1

Here is the usage of tophat_pipeline.pl -i [String] Name of the input list of read files (required) -s [String] Sequencing method for read files: (required) PE (paired-end); SE (single-end); PS (paired strand-specific); SS (single strand-specific); -d [String] Indexed database of genome sequences (required) -l [String] Library type, fr-unstrand, fr-firststrand, fr-secondstrand (required) -p [Integer] number of CPUs used for megablast clustering (default = 1) -r [Integer] mate-inner-dist (default = 100) -v [Integer] mate-std-dev (default = 20) -m [Integer] segment mismatches (default = 0) -n [Integer] initial read mismatches (default = 0) -e [Integer] setment length (default : half of read length; 17~25) -c [Integer] cycle (default = 0, range: 0,1,2) -a (str) gene and position file (optional) -x (str) gene and length file (optional)

The required input includes list of read files, sequencing method, indexed genome sequences, and library type. The list is the prefix of read file. For example, if the read file name is WT_17D_rep2.fasta.gz, the list should be WT_17D_rep2. For paired-end read, the file name should be end with _1 or _2 before format info. For example, if the read file is test1_1.fq.gz and test1_2.fq.gz, the list is test1. Now, it supports gzip file with fasta or fastq format: ".gz", ".fa", ".fa.gz", ".fasta", ".fasta.gz", ".fq", ".fq.gz", ".fastq", ".fastq.gz"

The file name dose not affect the real format of read file, it a fastq file named as *.fasta, tophat_pipline.pl will identify the format automatically.

The segment mismatches should be not shorter than initial read mismatches. If not, the tophat_pipline.pl will fix it.

Please do not provide setment length, tophat_pipline.pl will set it automatically.

If gene position and gene length files are provided, tophat_pipline.pl will generate raw count and RPKM info. Otherwise, we can generate these info follow the steps from 2 to 4.
get No. of uniq mapped reads (make sure the cleaned reads locate at the same folder of bam files)

$get_uniq_mapped_read.pl -i list -s SS

Here is the usage of get_uniq_mapped_num.pl. -i|list (str) input read list file (required) -o|output (str) out file name (default = uniq_mapped_num) -s|sequencing-method (str) (required) (default = SS) PE (paired-end); SE (single-end); PS (paired strand-specific); SS (single strand-specific);

The input list file is the same file of step 1. The sequencing-method must be same as step 1.
count the raw number for each gene

$get_exp_raw.pl -i list -s SS -a gene_position_file

Here is the usage of get_exp_raw.pl. -i|list (str) input read list file (required) -a|gene-position (str) gene position file (required) -o|output (str) prefix of out files (default = exp; exp_sense_raw, exp_antisense_raw) -s|sequencing-method (str) (required) (default = SS) PE (paired-end); SE (single-end); PS (paired strand-specific); SS (single strand-specific); -h|?|help help info

The input list file is the same file of step 1. The sequencing-method must be same as step 1.

The gene position file is a tab-delimited file, format is like below: chr start end gene strand (the titile is not included) chr1 100 1000 Gene1 +
normalization using RPKM

$get_exp_rpkm.pl -e exp_sense_raw -u uniq_mapped_read_num -x tomato_gene_length

Here is the usage of get_exp_rpkm.pl. -x|gene-length (str) gene length -e|exp-adjust (str) adjusted expression -o|output (str) output (default = exp_rpkm)

File uniq_mapped_num is the output of get_uniq_mapped_read.

Gene length file is a tab-delimited file, format is like below: GeneID Length (the titile is not included) Gene001 1000
Get correlation for each samples

$corre.pl exp_rpkm > correlation.txt

2.2 Perform statistics analysis for differentially expressed genes

Prepare comparison file for pairwise comparison

Samples: C_rep1, C_rep2, C_rep3, T_rep1, T_rep2, T_rep3
Comparison_list File: C\tT\n

Pairwise statistics analysis using DESeq

$DESeq_pipeline.pl  raw_count  rpkm_file  comparison_list  output

Pairwise statistics analysis using edgeR

$edgeR_pipeline.pl  raw_count  rpkm_file  comparison_list  output

Prepare comparison file for time-series comparison

Samples: T1h_rep1, T1h_rep2, T6h_rep1, T6h_rep2, T12h_rep1, T12h_rep2
Comparison_list File: T1h\tT6h\tT12h\n

Time-series statistics analysis using DESeq and LIMMA

$TSlimma_pipeline.pl  raw_count  rpkm_file  comparison_list  output

================= 3. Other Tools

3.1 Prepare gene position file using GFF or GTF file In fact, using the representative transcriptome is better,

3.2 Get unmapped reads

$get_unmapped_read.pl -i list -s SS

3.3 Check the mapping rate to different Plant Genomes using bowtie

$check_UBowtie.pl -i list -s SS -p CPU -d genome_bowtie_index -n genome_name

3.4 Parse report file for rRNA removing

$parse_rRNA_rm_rpt.pl rRNA_rm.report > output

================= 4. Analysing Grafting RNASeq Datasets

4.1 Aligning reads to scion sequence

$tophat_pipeline_v2.pl -i list -d /path/scion_sequnece -s PE -l fr-unstranded -p 8 -m 1 -n 1

4.2 Get unmapped reads

$get_unmapped_read.pl -i list -s PE

4.3 Aligning unmapped reads to grated sequence

$tophat_pipeline_v2.pl -i list_unmap -d /path/grated_sequnece -s PE -l fr-unstranded -p 8 -m 0 -n 0

4.4 Count the raw number for each gene & Get mapped reads and corresponding genes with fasta format

$get_exp_raw.pl -i list_unmap -s PE -a grated_gene_position -r

4.5 Filter the raw count (generated at step 4.4) without read support

$filter_raw_count_graft.pl raw_count > output

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
bin		bin
DEG_pipeline.pl		DEG_pipeline.pl
PE_align.pl		PE_align.pl
README.md		README.md
TSlimma_pipeline.pl		TSlimma_pipeline.pl
check_UBowtie.pl		check_UBowtie.pl
corre.pl		corre.pl
filter_raw_count_graft.pl		filter_raw_count_graft.pl
get_exp_norm.pl		get_exp_norm.pl
get_exp_raw.pl		get_exp_raw.pl
get_exp_rpkm.pl		get_exp_rpkm.pl
get_uniq_mapped_num.pl		get_uniq_mapped_num.pl
get_uniq_read.pl		get_uniq_read.pl
get_unmapped_read.pl		get_unmapped_read.pl
gff2gene_position.pl		gff2gene_position.pl
parse_rRNA_rm_rpt.pl		parse_rRNA_rm_rpt.pl
rRNA_rm.pl		rRNA_rm.pl
split_get_exp_raw.pl		split_get_exp_raw.pl
tophat_pipeline_v1.pl		tophat_pipeline_v1.pl
tophat_pipeline_v2.pl		tophat_pipeline_v2.pl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

================= 2. Usage

================= 3. Other Tools

================= 4. Analysing Grafting RNASeq Datasets

About

Releases

Packages

Languages

kentnf/RSQuantification

Folders and files

Latest commit

History

Repository files navigation

================= 2. Usage

================= 3. Other Tools

================= 4. Analysing Grafting RNASeq Datasets

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages