-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
strand specific argument #40
Comments
Hi Ping-Han, When you aligned your reads with bowtie (to create the @mikelove, do you have a sense of whether something might be going on with the strand-specific simulation? I can dig in when I get a few more spare moments, but thought I'd see if anything came to mind. |
@dn070017 Can you provide |
The argument 'strand_specific' is in the devel branch (Polyester 1.9), this is release. Check the help pages in R, and you should see that 'stand_specific' is not listed in the Details: ?simulate_experiment |
Hello, developers,
Recently, I'm trying to perform a simulation with strand specific reads.
Because I can't find any information about what kind of library will be constructed, so I decided to do a quick test with yeast cdna.
The code I used are basically shown below (but for memory issue, I have to split the original reference cdna into many different files):
fasta_count = count_transcripts('test.fasta')
fasta = readDNAStringSet('test.fasta')
writeXStringSet('test.fasta', 'test.file')
readspertx = round(50 * (width(fasta) / 200))
fold_change = data.frame(value=matrix(sample(seq(from=0.01, to=10.00, by=0.01), fasta_count, replace=T), fasta_count))
simulate_experiment('test.file', reads_per_transcript=readspertx, num_reps=c(1,1), fold_changes=fold_change, error_model='illumina5', bias='cdnaf', outdir=folder, strand_specific=T)
The simulation was successful without any error message.
Then I mapped the reads back to the reference cdna and tried to determine the expression profile of each sequences. (using bowtie2 and express)
Because I need to check what kind of libraries were used in the simulation model, I ran express on three different ways (which only calculate the reads with specific library protocols)
express -o ./sample -m 250 -s 25 reference.fasta sample.bam (unstranded)
express -o ./sample_fr -m 250 -s 25 --fr-stranded reference.fasta sample.bam (fr-stranded)
express -o ./sample_rf -m 250 -s 25 --rf-stranded reference.fasta sample.bam (rf-stranded)
What surprised me is the total count or effective count for sample_fr and sample_rf are only half of the unstranded one
The result indicate that there are 50% of reads are fr-stranded, and others are rf-stranded, which suggest the simulation still produce unstranded reads.
I'm wondering if I use the program in a wrong way or do I misunderstand anything.
By the way, thanks for this amazing tools! it really helps me a lot on my research
Best,
Ping-Han
The text was updated successfully, but these errors were encountered: