Error: maximum supported read length for this version = 1024 #5

Open
tseemann opened this issue Sep 8, 2016 · 9 comments

Comments

@tseemann

tseemann commented Sep 8, 2016

I can't seem to assemble my data: a 5 Mbp bacterial genome with paired-end (PE) reads. I've tried various values of k, g, etc.

LightAssembler -k 31 -G 5000000 -t 72 /data/R1.fq.gz /data/R2.fq.gz --verbose
--- Parameters extrapolation.

--- h(0):m(0):s(8) elapsed time.
--- start with gap size g = 8
--- average read length = 137
--- average sequencing coverage = 131

--- Uniform kmers sampling.

--- h(0):m(0):s(0) elapsed time.
--- total number of kmers in BloomA = 0
--- BloomA false positive rate = 0
--- probability of an incorrect kmer appears in the sample : 0.151046

--- Trusted/untrusted kmers filtering.

--- h(0):m(0):s(0) elapsed time.
--- total number of kmers in BloomB = 0
--- BloomB false positive rate = 0
--- LightAssembler can not assemble your dataset !!!
--- maximum supported read length for this version = 1024
--- try different values for k [kmer size] & g [gap size] or different dataset
@tseemann
Author

tseemann commented Sep 8, 2016

I think the bug is that you do not support read files with a path in them?

So ecoli.fastq.gz works, but not /path/to/the/reads/ecoli.fastq.gz ?

@SaraEl-Metwally
Owner

[screenshot from 2016-09-30 04_32_23]

@SaraEl-Metwally
Owner

As you can see, LightAssembler supports paths to read files.
Thanks!

@tseemann
Author

tseemann commented Oct 2, 2016

The path suggestion was just one idea I had.

Can you suggest any other reasons why we are unable to get any results with your software?

@SaraEl-Metwally
Owner

Can you give me the exact command line that you are using for your dataset?

@michaelbarton

I believe @jfroula was having the same problem running the software here at the JGI. Jeff, perhaps you could outline the problem you were having, if you have your code samples at hand?

@michaelbarton

In my experience, this appears to be related to the -G flag, specifically when it is not set to an accurate value. I've found that using 10x the anticipated value makes this error go away, assuming we're describing the same cause. LightAssembler appears to generate the same error message regardless of the cause.
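One way to try the workaround described above is to generate a small sweep of -G values and run each in turn. The sketch below only builds the command lines; it reuses the flags and argument order from the command in the original report (-k, -G, -t, --verbose). The helper name, the multiplier list, and the read paths are hypothetical.

```python
import shlex


def lightassembler_commands(reads, expected_size, factors=(1, 2, 5, 10),
                            k=31, threads=72):
    """Build one LightAssembler command line per genome-size multiplier.

    The flags mirror the command quoted in the original report; the
    multipliers in `factors` are an arbitrary choice for illustration.
    """
    commands = []
    for factor in factors:
        genome_size = int(expected_size * factor)
        commands.append([
            "LightAssembler",
            "-k", str(k),
            "-G", str(genome_size),
            "-t", str(threads),
            *reads,
            "--verbose",
        ])
    return commands


# Hypothetical read paths, matching the 5 Mbp example from the report.
for cmd in lightassembler_commands(["/data/R1.fq.gz", "/data/R2.fq.gz"], 5_000_000):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or each list can be passed to `subprocess.run` if the binary is on the PATH.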

@SaraEl-Metwally
Owner

SaraEl-Metwally commented Feb 16, 2017

Sorry for the late reply.
@michaelbarton, the value of the -G flag (the genome size) should be relatively accurate because it plays a key role in determining the size of the Bloom filter and its false positive rate, which in turn affects the trusted/untrusted kmers filtering step of LightAssembler (i.e. LightAssembler's results).
I tried different genome size values for GAGE Staphylococcus_aureus (genome size: 2903081 bp) to see the effect of the genome size value on the assembly results.

(genome size: 1803081 bp)
[screenshot from 2017-02-16 05-08-16]

(genome size: 1103081 bp)
[screenshot from 2017-02-16 10-12-41]

LightAssembler prints a general message when it fails to assemble a data set, listing possible causes of the failure such as read length, gap size, or kmer size. I will update this message to also mention that the genome size value should be relatively accurate. I have emailed @jfroula to learn what issues he had with LightAssembler so that I can fix them.

Thank you so much.
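
As a rough illustration of why an underestimated genome size can hurt, the textbook Bloom filter formulas can be sketched as follows. This is not LightAssembler's code: the 1% target false positive rate, the assumption that the number of distinct kmers is roughly the genome size, and all function names are hypothetical. The three sizes are the ones mentioned in the comment above.

```python
import math


def bloom_bits(n_items, target_fpr):
    """Bits needed to hold n_items at target_fpr with an optimal hash count."""
    return math.ceil(-n_items * math.log(target_fpr) / (math.log(2) ** 2))


def optimal_hashes(n_bits, n_items):
    """Hash count that minimizes the false positive rate for this sizing."""
    return max(1, round(n_bits / n_items * math.log(2)))


def actual_fpr(n_bits, n_hashes, n_inserted):
    """Approximate false positive rate after inserting n_inserted items."""
    return (1.0 - math.exp(-n_hashes * n_inserted / n_bits)) ** n_hashes


TRUE_GENOME_SIZE = 2_903_081  # GAGE Staphylococcus_aureus, from the comment
TARGET_FPR = 0.01             # hypothetical design target

for assumed_size in (2_903_081, 1_803_081, 1_103_081):
    # The filter is sized for the assumed genome size...
    bits = bloom_bits(assumed_size, TARGET_FPR)
    hashes = optimal_hashes(bits, assumed_size)
    # ...but is filled with kmers from the true genome.
    fpr = actual_fpr(bits, hashes, TRUE_GENOME_SIZE)
    print(f"assumed size = {assumed_size:>9}  bits = {bits:>9}  FPR = {fpr:.4f}")
```

Under these assumptions, a filter sized for too few kmers ends up with a higher false positive rate than intended, while an overestimate mostly costs memory. That would be consistent with the 10x workaround reported earlier in the thread, though the actual sizing logic in LightAssembler may differ.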

@michaelbarton

Thanks for following up. I believe it may not be possible to have an accurate estimate of the genome size ahead of time, for example when assembling a novel genome for the first time. It is possible to approximate the size from the observed rate of unique kmers when sampling from the reads; however, this could be error-prone if LightAssembler is particularly sensitive to this value.
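
A minimal sketch of such a kmer-based estimate, assuming error kmers can be filtered out by a minimum count and that the median multiplicity of the remaining kmers approximates the kmer depth. The function names, the `min_count` threshold, and the simulated data are hypothetical; the sketch also ignores reverse complements and repeats, which dedicated tools handle by counting canonical kmers and fitting the histogram peak.

```python
import random
import statistics
from collections import Counter


def count_kmers(reads, k):
    """Count every kmer of length k across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(reads, k, min_count=2):
    """Estimate genome size as (total solid kmers) / (typical kmer depth)."""
    counts = count_kmers(reads, k)
    # Kmers seen fewer than min_count times are treated as likely errors.
    solid = [c for c in counts.values() if c >= min_count]
    if not solid:
        raise ValueError("no kmers seen at least min_count times")
    depth = statistics.median(solid)
    return round(sum(solid) / depth)


# Simulated error-free reads from a random 5 kbp genome (illustration only).
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
reads = []
for _ in range(2000):
    start = rng.randrange(len(genome) - 100 + 1)
    reads.append(genome[start:start + 100])

print("estimated genome size:", estimate_genome_size(reads, k=21))
```

On real data the estimate shifts with sequencing errors, repeats, and uneven coverage, which is why it may be too rough if the assembler is sensitive to the value passed to -G.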
