Chimera creation #28
Comments
Hello, Best regards, |
Hi,
I have a PacBio assembly of the exact same animal. I mapped both assemblies to
the standard cattle reference assembly; the PacBio assembly matches well but the
ONT/Raven assembly clearly doesn't. I am using nucmer to map and visualize. I am
happy to send the figures for a chromosome if it helps.
N50 in standard mode is 3.5gb, with weaken is 0.9gb. longest contig is 18gb
and 5gb for the two assemblies respectively.
The 79gb is sequence only, i.e. the fasta equivalent (the fq is twice the size).
The pores are the current R9 versions that have been out for a while now,
and guppy is 4.2.2 running on GPUs.
I haven't mapped the "weaken" version yet, so I am not sure if there are
the chimeras in that version.
I also have around the same amount of data for another animal, which is
actually the daughter of the first one. I am hoping to put them together to
get a better final assembly, validate SVs, etc.
Cheers
Liz
|
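For readers who want to check a nucmer mapping like the one described above without eyeballing dot plots, here is a minimal sketch of one possible heuristic. It assumes tab-delimited, headerless `show-coords -T -H` output, where columns 3 and 4 are the query start and end and the last two columns are the reference and query names. The function names and both thresholds (`min_bp`, `min_fraction`) are hypothetical choices for illustration, not anything Raven or MUMmer provides. It only catches contigs that span two or more reference chromosomes; misjoins within one chromosome would need a positional check.

```python
from collections import defaultdict


def parse_coords(lines):
    """Yield (ref, query, q_start, q_end) from `show-coords -T -H` style rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed rows
        # Reverse-strand hits list the query end before the start, so sort them.
        q_start, q_end = sorted((int(fields[2]), int(fields[3])))
        yield fields[-2], fields[-1], q_start, q_end


def flag_chimeric(records, min_bp=1_000_000, min_fraction=0.1):
    """Return {contig: {ref: aligned_bp}} for contigs split across references.

    A contig is flagged when at least two reference sequences each account
    for `min_bp` aligned bases and `min_fraction` of its aligned bases.
    Overlapping alignments are counted twice, so treat the numbers as rough.
    """
    aligned = defaultdict(lambda: defaultdict(int))
    for ref, query, q_start, q_end in records:
        aligned[query][ref] += q_end - q_start + 1

    flagged = {}
    for query, per_ref in aligned.items():
        total = sum(per_ref.values())
        major = {
            ref: bp
            for ref, bp in per_ref.items()
            if bp >= min_bp and bp / total >= min_fraction
        }
        if len(major) >= 2:
            flagged[query] = major
    return flagged
```

Typical use would be `flag_chimeric(parse_coords(open("out.coords")))`, where `out.coords` is a placeholder file name. Small secondary hits from repeats are expected in a mammalian genome, which is why the sketch requires a sizeable share on the second chromosome before flagging anything.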
Is the assembly size near 3GB? Do you have the log created by default Raven? Using option weaken might be a bad idea here, it was only tested a bit for HiFi reads. Please paste some figures here or send them via email, thanks. |
Sorry, I see above I wrote gb instead of mb for the n50 and longest contig;
those values are in mb.
Yes the total size is approximately correct,
Pacbio assembly: 2,636,494,505
Raven: 2,634,109,219
Raven --weaken: 2,596,244,275
I will email some figures tomorrow morning (night time here at the moment).
Cheers.
…On Mon, Dec 14, 2020, 11:32 PM Robert Vaser ***@***.***> wrote:
Is the assembly size near 3GB? Do you have the log created by default
Raven? Using option weaken might be a bad idea here, it was only tested a
bit for HiFi reads. Please paste some figures here or send them via email,
thanks.
|
Thanks Liz. This looks chimeric given that the reference is appropriate for your dataset. What is the NG50 of the PacBio assembly? Have you tried any other assembler? Do you perhaps have the log Raven outputted? It would help me see if something went wrong during the assembly. |
The PacBio assembly is now scaffolded to full chromosome length, but the contigs for that assembly had an N50 of ~11 Mb. I haven't tried any other assemblers on this ONT dataset yet. |
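Since both N50 and NG50 come up in this exchange, a short sketch of how the two differ may help. N50 is computed against the total assembly length, NG50 against an estimated genome size, so NG50 allows a fairer comparison between assemblies of different total size. The function name `nx50` is made up for this example, and it expects a plain list of contig lengths.

```python
def nx50(lengths, total=None):
    """Return the length L such that contigs of length >= L cover half of `total`.

    With total=None the assembly size is used, which gives N50.
    Passing an estimated genome size gives NG50.
    Returns None if the contigs never reach half of `total`.
    """
    if total is None:
        total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return None
```

For example, `nx50(contig_lengths)` gives the N50 and `nx50(contig_lengths, total=genome_size)` gives the NG50, where `genome_size` is whatever estimate is appropriate for the species. Neither statistic says anything about correctness: a chimeric join makes a contig longer and can raise both values.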
Thanks for the log. Is it from --weaken run or default? Does the dataset contain ultra long reads? |
It does contain some very long reads - although what counts as long can be a bit subjective - 5 reads are between 800 kb and 900 kb. |
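To answer the ultra-long-read question with numbers rather than impressions, one could summarise the read lengths directly from the FASTQ. The sketch below assumes the usual four-lines-per-record FASTQ layout (no wrapped sequence lines). The 100 kb cut-off for "ultra long" is a commonly used convention taken here as an assumption, and the function names are hypothetical.

```python
def read_lengths(fastq_handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the second line of every record holds the sequence
            yield len(line.strip())


def summarize(lengths, ultra_long=100_000):
    """Return read count, total bases, longest read and ultra-long read count."""
    lengths = list(lengths)
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "longest": max(lengths, default=0),
        "ultra_long_reads": sum(1 for n in lengths if n >= ultra_long),
    }
```

With a plain-text file this would be called as `summarize(read_lengths(open("reads.fastq")))`, where `reads.fastq` is a placeholder; for a gzipped file, `gzip.open(path, "rt")` from the standard library yields the same kind of text handle.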
The average number of overlaps per surviving read is 2 for this run, not sure if that is small or not. Do you perhaps have the default run log? |
Hi Raven team,
Thanks for the tool, I really like how fast it is. I have run Raven on a 3 Gb mammalian genome with 79 GB of ONT data. It runs fine, but it looks like it is creating a fairly large number of chimeric contigs. I suspect that this could be resolved by adjusting the settings, but there seems to be no way to do this. Obviously the chimeric contigs make the N50 values etc. incorrect.
Is there a way to adjust the overlap settings other than the 'weaken' on/off switch? I can turn that on, but of course the N50 drops dramatically, and it would be nice to be able to find the sweet spot for my data.
Cheers.