-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid 'times' argument after simulate_experiment - works with subsetted dataset #65
Comments
My guess here is that some values of `readspertx` are zero or NA (but that
there aren't any zeros/NAs in the first 20 transcripts / small simulation).
Sadly this is a known limitation:
#28. However, you can work
around this by either removing the transcripts you don't want to simulate
from, or replacing all the 0 counts with some small number (like 1) to
approximate the simulation. Hope this helps!
…On Tue, Mar 26, 2019 at 4:39 AM orangebromeliad ***@***.***> wrote:
Hi team,
I'm trying to simulate RNAseq data for the malaria parasite P. berghei.
The transcript file has 5254 transcripts, and I am trying to simulate two
groups each with one replicate and no log fold changes between the two, for
now. The following script does not work, producing the error reproduced
below:
# FASTA annotation
fasta_file = '/Users/littlet/Downloads/PlasmoDB-42_PbergheiANKA_AnnotatedTranscripts.fasta'
fasta = readDNAStringSet(fasta_file)
# ~20x coverage ----> reads per transcript = transcriptlength/readlength * 20
# here all transcripts will have ~equal FPKM
readspertx = round(2 * width(fasta) / 100)
fold_changes = matrix(1,ncol=2, nrow=5254)
# simulation call:
simulate_experiment(fasta = fasta_file,
reads_per_transcript=readspertx,
fold_changes=fold_changes,
num_reps=c(1,1),
paired = T,
reportCoverage = T,
readlen = 100,
outdir='polyester_pberghei_simulated_reads_twosample')
Error in rep(seq_len(NROW(x)), ...) : invalid 'times' argument
In addition: Warning messages:
1: In rnbinom(n = length(basemeans), mu = basemeans, size = size) :
NAs produced
2: In rnbinom(n = length(basemeans), mu = basemeans, size = size) :
NAs produced
However when I use a subset of twenty transcripts the package appears to
work without producing any errors:
# FASTA annotation
fasta_file = '/Users/littlet/Downloads/PlasmoDB-42_PbergheiANKA_AnnotatedTranscripts.fasta'
fasta = readDNAStringSet(fasta_file)
# subset the FASTA file to first 20 transcripts
small_fasta = fasta[1:20]
writeXStringSet(small_fasta, 'berghei_small.fa')
# ~20x coverage ----> reads per transcript = transcriptlength/readlength * 20
# here all transcripts will have ~equal FPKM
readspertx = round(2 * width(small_fasta) / 100)
transcript.number = count_transcripts('berghei_small.fa')
fold_changes = matrix(1, ncol = 2, nrow=20)
# simulation call:
simulate_experiment(fasta = 'berghei_small.fa',
reads_per_transcript=readspertx,
fold_changes=fold_changes,
num_reps=c(1,1),
paired = T,
reportCoverage = T,
readlen = 100,
outdir='polyester_pberghei_simulated_reads_small')
I don't believe that this is a duplicate of the earlier closed issue: #8
<#8> since I'm quite
confident that the readspertx (reads per transcript) and fold_changes
objects are the right dimensions. Hence I have been unable to find what the
issue is.
Hope that this helps. Sessioninfo is below too:
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] SGSeq_1.16.2 Rsamtools_1.34.1 Biostrings_2.50.2 XVector_0.22.0
[5] polyester_1.18.0 tibble_2.1.1 gridExtra_2.3 tidyselect_0.2.5
[9] readxl_1.3.1 data.table_1.12.0 scales_1.0.0 RColorBrewer_1.1-2
[13] pheatmap_1.0.12 DESeq2_1.22.2 SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[17] BiocParallel_1.16.6 matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0
[21] GenomeInfoDb_1.18.2 IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0
[25] ggplot2_3.1.0 tidyr_0.8.3 dplyr_0.8.0.1
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-7 httr_1.4.0 progress_1.2.0
[5] tools_3.5.2 backports_1.1.3 utf8_1.1.4 R6_2.4.0
[9] rpart_4.1-13 Hmisc_4.2-0 DBI_1.0.0 lazyeval_0.2.2
[13] colorspace_1.4-1 nnet_7.3-12 withr_2.1.2 prettyunits_1.0.2
[17] bit_1.1-14 compiler_3.5.2 cli_1.1.0 htmlTable_1.13.1
[21] rtracklayer_1.42.2 labeling_0.3 checkmate_1.9.1 genefilter_1.64.0
[25] stringr_1.4.0 digest_0.6.18 foreign_0.8-71 base64enc_0.1-3
[29] pkgconfig_2.0.2 htmltools_0.3.6 limma_3.38.3 htmlwidgets_1.3
[33] rlang_0.3.2 rstudioapi_0.10 RSQLite_2.1.1 acepack_1.4.1
[37] RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.0 Formula_1.2-3
[41] Matrix_1.2-16 Rcpp_1.0.1 munsell_0.5.0 fansi_0.4.0
[45] stringi_1.4.3 logspline_2.1.12 yaml_2.2.0 zlibbioc_1.28.0
[49] plyr_1.8.4 grid_3.5.2 blob_1.1.1 crayon_1.3.4
[53] lattice_0.20-38 splines_3.5.2 GenomicFeatures_1.34.6 annotate_1.60.1
[57] hms_0.4.2 locfit_1.5-9.1 knitr_1.22 pillar_1.3.1
[61] igraph_1.2.4 RUnit_0.4.32 biomaRt_2.38.0 geneplotter_1.60.0
[65] reshape2_1.4.3 XML_3.98-1.19 glue_1.3.1 latticeExtra_0.6-28
[69] BiocManager_1.30.4 cellranger_1.1.0 gtable_0.3.0 purrr_0.3.2
[73] assertthat_0.2.1 xfun_0.5 xtable_1.8-3 survival_2.43-3
[77] GenomicAlignments_1.18.1 AnnotationDbi_1.44.0 memoise_1.1.0 cluster_2.0.7-1
Best,
Tim L.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#65>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB6e2Ct3PNF5h3esJ4Wp730AruYljTrmks5vagbdgaJpZM4cLObX>
.
|
Hi Alyssa, Yes you are right; it was because of the use of the Would it be worth inserting an error into the code to check for this an inform of the problem? Something like:
Best wishes and thank you for your help, |
Hi team,
I'm trying to simulate RNAseq data for the malaria parasite P. berghei. The transcript file has 5254 transcripts, and I am trying to simulate two groups each with one replicate and no log fold changes between the two, for now. The following script does not work, producing the error reproduced below:
However when I use a subset of twenty transcripts the package appears to work without producing any errors:
I don't believe that this is a duplicate of the earlier closed issue: #8 since I'm quite confident that the readspertx (reads per transcript) and fold_changes objects are the right dimensions. Hence I have been unable to find what the issue is.
Hope that this helps. Sessioninfo is below too:
Best,
Tim L.
The text was updated successfully, but these errors were encountered: