strand specific argument #40

dn070017 · 2016-09-22T10:02:53Z

Hello, developers,

Recently, I'm trying to perform a simulation with strand specific reads.

Because I can't find any information about what kind of library will be constructed, so I decided to do a quick test with yeast cdna.

The code I used are basically shown below (but for memory issue, I have to split the original reference cdna into many different files):

fasta_count = count_transcripts('test.fasta')
fasta = readDNAStringSet('test.fasta')
writeXStringSet('test.fasta', 'test.file')
readspertx = round(50 * (width(fasta) / 200))
fold_change = data.frame(value=matrix(sample(seq(from=0.01, to=10.00, by=0.01), fasta_count, replace=T), fasta_count))
simulate_experiment('test.file', reads_per_transcript=readspertx, num_reps=c(1,1), fold_changes=fold_change, error_model='illumina5', bias='cdnaf', outdir=folder, strand_specific=T)

The simulation was successful without any error message.

Then I mapped the reads back to the reference cdna and tried to determine the expression profile of each sequences. (using bowtie2 and express)

Because I need to check what kind of libraries were used in the simulation model, I ran express on three different ways (which only calculate the reads with specific library protocols)

express -o ./sample -m 250 -s 25 reference.fasta sample.bam (unstranded)
express -o ./sample_fr -m 250 -s 25 --fr-stranded reference.fasta sample.bam (fr-stranded)
express -o ./sample_rf -m 250 -s 25 --rf-stranded reference.fasta sample.bam (rf-stranded)

What surprised me is the total count or effective count for sample_fr and sample_rf are only half of the unstranded one
The result indicate that there are 50% of reads are fr-stranded, and others are rf-stranded, which suggest the simulation still produce unstranded reads.

I'm wondering if I use the program in a wrong way or do I misunderstand anything.

By the way, thanks for this amazing tools! it really helps me a lot on my research

Best,
Ping-Han

The text was updated successfully, but these errors were encountered:

alyssafrazee · 2016-09-28T04:59:50Z

Hi Ping-Han,

When you aligned your reads with bowtie (to create the sample.bam) files, did you provide a strand-specific argument there? From what I remember, you do have to tell bowtie that your reads are stranded (it is not the default and it doesn't infer it from the fasta input files). That's the only thing I can think of that might be going wrong outside of polyester.

@mikelove, do you have a sense of whether something might be going on with the strand-specific simulation? I can dig in when I get a few more spare moments, but thought I'd see if anything came to mind.

mikelove · 2016-09-28T13:55:43Z

@dn070017 Can you provide sessionInfo()? Also, my first check would be to just look at the aligned reads with IGV. You can color by strand.

dn070017 · 2016-09-29T15:50:04Z

The sessionInfo() are described as follow:
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] polyester_1.8.3 Biostrings_2.40.2 XVector_0.12.1
[4] IRanges_2.6.1 S4Vectors_0.10.3 BiocGenerics_0.18.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.18.0

I used bwa (which estimate the orientation and insert size on its own) to do the alignment as well. By the way, in my own experience, I think the mapper such as bwa or bowtie2 do not take the library type into account. They seems to consider only the orientation of the reads to be whether inward (PE) or outward (MP).
After converting the result to sam format and sorting appropriately, I passed the result to IGV. Here's the command I used:
bwa mem -t 20 ../../yeast_cdna/Saccharomyces_cerevisiae_cdna_500_bwa ./sample_01_1.fastq ./sample_01_2.fastq > sample_01.sam
samtools view -@ 20 -b sample_01.sam | samtools sort - -o sample_01.bam
samtools index sample_01.bam sample_01.bai

The result are shown as follows:

I've sorted the alignments by which strand they were aligned. So the alignments above the red line were aligned to the negative strand of cdna, those below the red line were aligned to the positive strand.
And I also colored the alignments by the first-of-pair strand. For the alignments mapped to the negative strand of cdna, the purple ones suggest they are the first read in the pair (w/ negative insert size), and the red ones the second read in the pair. However, when it comes to the alignments mapped on the positive strand of cdna, the red ones suggest they are the first read in the pair, and the purple ones the second.
The result that I expect is the reads which are first read in the pair should always aligned to specific strand of cdna, and the second read in the pair to another. (e.g for library like fr-firststrand, the first reads in the pair should always mapped on the negative strand of cdna or transcript)

I'm still new to bioinformatics, so if I misunderstand anything, just feel free to let me know. Thanks for the reply.

mikelove · 2016-09-29T15:53:28Z

The argument 'strand_specific' is in the devel branch (Polyester 1.9), this is release.

Check the help pages in R, and you should see that 'stand_specific' is not listed in the Details:

?simulate_experiment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

strand specific argument #40

strand specific argument #40

dn070017 commented Sep 22, 2016 •

edited

Loading

alyssafrazee commented Sep 28, 2016

mikelove commented Sep 28, 2016

dn070017 commented Sep 29, 2016 •

edited

Loading

mikelove commented Sep 29, 2016 •

edited

Loading

strand specific argument #40

strand specific argument #40

Comments

dn070017 commented Sep 22, 2016 • edited Loading

alyssafrazee commented Sep 28, 2016

mikelove commented Sep 28, 2016

dn070017 commented Sep 29, 2016 • edited Loading

mikelove commented Sep 29, 2016 • edited Loading

dn070017 commented Sep 22, 2016 •

edited

Loading

dn070017 commented Sep 29, 2016 •

edited

Loading

mikelove commented Sep 29, 2016 •

edited

Loading