Parallelize simulate_experiment()? #32

Open
kcha opened this issue Feb 17, 2016 · 0 comments

kcha commented Feb 17, 2016

Hi,

Thanks for this useful package! I was wondering whether there are any plans to parallelize read simulation.

I noticed that it might be possible to parallelize the outer `for` loop in `sgreg()`, so I tried replacing it with `foreach` using the doMC parallel backend.
It was a quick change, and although I haven't tested it extensively, it seems to speed things up considerably when more than one replicate or group is simulated: in the benchmark below, elapsed time drops from about 28 s on 1 core to about 8 s on 4 cores (see kcha/polyester@7b6c31e).
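For readers unfamiliar with the pattern, here is a minimal sketch of the kind of change described: a sequential `for` loop rewritten as `foreach(...) %dopar%` after registering a doMC backend. `simulate_one()` is a hypothetical stand-in for one iteration of the loop body, not polyester code, and the actual change in the fork may differ. Seeding inside each iteration is an assumption added here so that the parallel run reproduces the serial one; without it, forked workers draw from different RNG states than a serial loop would.

```r
library(foreach)
library(doMC)  # fork-based backend (Unix-alikes only)

# Hypothetical stand-in for one iteration of the outer loop (e.g. one
# replicate). This is NOT polyester code.
simulate_one <- function(i) {
  set.seed(i)            # per-iteration seed: result depends only on i
  sum(rnorm(1e5))
}

n <- 8

# Sequential version: a plain for loop filling a preallocated list
res_seq <- vector("list", n)
for (i in seq_len(n)) {
  res_seq[[i]] <- simulate_one(i)
}

# Parallel version: register forked workers, then swap `for` for %dopar%
registerDoMC(cores = 2)
res_par <- foreach(i = seq_len(n)) %dopar% simulate_one(i)

# With per-iteration seeds, both versions should return the same values
all.equal(unlist(res_seq), unlist(res_par))
```

`%dopar%` is only safe when iterations are independent: no iteration may rely on another's results, and each should write to its own output files.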

Interested in hearing your thoughts!

```r
library(polyester)
library(doMC)

fold_changes <- matrix(c(1, 1), nrow = 1)

# `cores` comes from the patched fork (kcha/polyester@7b6c31e)
for (n_cores in c(1, 4, 8)) {
  timing <- system.time(
    simulate_experiment('data/toy.fa',
                        readlen = 100,
                        reads_per_transcript = 10000,
                        fold_changes = fold_changes,
                        num_reps = c(4, 4),
                        outdir = 'simulated_reads/single',
                        distr = "empirical",
                        error_model = "illumina5",
                        paired = FALSE,
                        gzip = TRUE,
                        cores = n_cores)
  )
  print(paste("Cores:", n_cores))
  print(timing)
}
```
```
[1] "Cores: 1"
   user  system elapsed 
 27.032   0.974  28.075 
[1] "Cores: 4"
   user  system elapsed 
 22.472   0.842   7.969 
[1] "Cores: 8"
   user  system elapsed 
 49.123   2.340   7.094 
```
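To put the elapsed times in perspective, the sketch below computes speedup (serial elapsed time divided by parallel elapsed time) and parallel efficiency (speedup divided by core count). The numbers are copied by hand from the benchmark output; the variable names are illustrative only.

```r
# Elapsed (wall-clock) seconds copied from the benchmark output
cores   <- c(1, 4, 8)
elapsed <- c(28.075, 7.969, 7.094)

speedup    <- elapsed[1] / elapsed   # T(1) / T(n)
efficiency <- speedup / cores        # fraction of ideal linear scaling

round(data.frame(cores, elapsed, speedup, efficiency), 2)
```

On these numbers, 4 cores give roughly a 3.5x speedup (about 88% efficiency), while 8 cores give only about 4x (about 49% efficiency), and user time roughly doubles from 22.5 s to 49.1 s. Most of the gain therefore comes from the first few cores.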
```
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] doMC_1.3.4      iterators_1.0.8 foreach_1.4.3   polyester_1.7.1

loaded via a namespace (and not attached):
 [1] compiler_3.2.3      zlibbioc_1.14.0     limma_3.24.15      
 [4] IRanges_2.2.9       tools_3.2.3         XVector_0.8.0      
 [7] logspline_2.1.9     Biostrings_2.36.4   codetools_0.2-14   
[10] S4Vectors_0.6.6     BiocGenerics_0.14.0 stats4_3.2.3 
```