Chimera creation #28

LizzieMcDizzie · 2020-12-14T03:01:29Z

Hi Raven team,
Thanks for the tool, I really like how fast it is. I have run Raven on a 3 Gb mammalian genome with 79 GB of ONT data. It runs fine, but it seems to create a fairly large number of chimeric contigs. I suspect this could be resolved by adjusting the settings, but there seems to be no way to do this. Obviously, this gives misleading N50 values etc.

Is there a way to adjust the overlap settings other than the 'weaken' true/false switch? I can turn that on, but of course the N50 drops dramatically, and it would be nice to be able to find the sweet spot for my data.

Cheers.

rvaser · 2020-12-14T09:54:57Z

Hello,
At the moment there are not many parameters you can tweak from the outside; I will try to add them in the upcoming version. May I ask how you assessed the assembly? What is the NGA50 value? Is 79 GB the size of the FASTQ or the FASTA file? Which pore and basecaller versions were used?

Best regards,
Robert

LizzieMcDizzie · 2020-12-14T12:44:52Z

Hi, I have a PacBio assembly of the exact same animal. I map both assemblies to the standard reference assembly (cattle); the PacBio one matches well, but the ONT/Raven one clearly doesn't. I am using nucmer to map and visualize, and I am happy to send the figures for a chromosome if it helps.

N50 in standard mode is 3.5 Gb; with weaken it is 0.9 Gb. The longest contig is 18 Gb and 5 Gb for the two assemblies, respectively. The 79 GB is sequence only, so the FASTA equivalent (the FASTQ is twice the size). The pores are the current R9 versions that have been out for a while now, and the basecaller is Guppy 4.2.2 running on GPUs. I haven't mapped the "weaken" assembly yet, so I am not sure whether it contains the chimeras too.

I also have around the same amount of data for another animal, which is actually the daughter of the first one. I am hoping to combine them to get a better final assembly, validate SVs etc. Cheers, Liz

rvaser · 2020-12-14T13:32:31Z

Is the assembly size near 3 Gb? Do you have the log from the default Raven run? Using the weaken option might be a bad idea here; it was only tested a bit on HiFi reads. Please paste some figures here or send them via email, thanks.

LizzieMcDizzie · 2020-12-15T11:30:05Z

Sorry, I see that above I wrote Gb instead of Mb for the N50 and the longest contig; those values are in Mb. Yes, the total size is approximately correct: PacBio assembly: 2,636,494,505; Raven: 2,634,109,219; Raven --weaken: 2,596,244,275. I will email some figures tomorrow morning (it is night time here at the moment). Cheers.

LizzieMcDizzie · 2020-12-16T09:09:18Z

This is my longest contig from an assembly with an N50 of ~9 Mb. This is the pattern I see in what I call chimeras: clear alignments to different chromosomes. This one had the weaken flag on, but it's the same pattern without it.

Cheers,
Liz
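The pattern described above (one contig with clear alignments to several reference chromosomes) can be screened for automatically. Below is a minimal sketch, assuming the alignment blocks have already been extracted from the nucmer/show-coords output into (contig, reference chromosome, aligned length) records; the parsing step is not shown. The names `Alignment` and `flag_chimeric_contigs` and the thresholds `min_frac` and `min_bp` are hypothetical and are not part of Raven or MUMmer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """One alignment block (assumed to be extracted from nucmer/show-coords output)."""
    contig: str
    ref_chrom: str
    aligned_len: int


def flag_chimeric_contigs(alignments: Iterable[Alignment],
                          min_frac: float = 0.1,
                          min_bp: int = 100_000) -> Dict[str, List[str]]:
    """Return {contig: [chromosomes]} for contigs where two or more reference
    chromosomes each carry a substantial share of the contig's aligned bases.

    min_frac and min_bp are hypothetical thresholds chosen for illustration.
    """
    # Sum aligned bases per (contig, chromosome) pair.
    per_contig: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for aln in alignments:
        per_contig[aln.contig][aln.ref_chrom] += aln.aligned_len

    flagged: Dict[str, List[str]] = {}
    for contig, by_chrom in per_contig.items():
        total = sum(by_chrom.values())
        if total == 0:
            continue
        # Keep only chromosomes with a non-trivial share of this contig's alignments.
        major = sorted(chrom for chrom, bp in by_chrom.items()
                       if bp >= min_bp and bp / total >= min_frac)
        if len(major) >= 2:
            flagged[contig] = major
    return flagged


# Toy example: ctg1 splits between two chromosomes, ctg2 has only a small stray hit.
alns = [
    Alignment("ctg1", "chr1", 4_000_000),
    Alignment("ctg1", "chr7", 3_000_000),
    Alignment("ctg2", "chr3", 5_000_000),
    Alignment("ctg2", "chr9", 20_000),
]
print(flag_chimeric_contigs(alns))  # {'ctg1': ['chr1', 'chr7']}
```

A flagged contig is only a candidate: as noted in the reply below, this reasoning holds only if the reference is appropriate for the dataset, since true rearrangements relative to the reference or repeat-driven alignments can produce the same multi-chromosome signal.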

rvaser · 2020-12-16T09:55:53Z

Thanks Liz. This looks chimeric, given that the reference is appropriate for your dataset. What is the NG50 of the PacBio assembly? Have you tried any other assembler? Do you perhaps have the log that Raven printed? It would help me see whether something went wrong during the assembly.
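The metrics asked about in this thread differ only in what the contig lengths are measured against: N50 uses the total assembly length, while NG50 uses an estimated genome size, which makes assemblies of different total size comparable. NGA50 additionally breaks contigs at misassemblies using reference alignments (as QUAST does), which this sketch does not attempt. Below is a minimal sketch of N50/NG50; the function name `nx` and the toy lengths are hypothetical.

```python
from typing import Iterable, Optional


def nx(lengths: Iterable[int], x: float = 50.0,
       genome_size: Optional[int] = None) -> int:
    """Return the length L such that contigs of length >= L cover at least x%
    of the assembly (N-x) or, if genome_size is given, of the genome (NG-x).

    Returns 0 if the contigs never reach the target (possible for NG-x).
    """
    sorted_lengths = sorted(lengths, reverse=True)
    base = genome_size if genome_size is not None else sum(sorted_lengths)
    target = base * x / 100.0
    running = 0
    # Walk from the longest contig down until the cumulative length hits the target.
    for length in sorted_lengths:
        running += length
        if running >= target:
            return length
    return 0


contigs = [80, 70, 50, 40, 30, 20, 10]  # toy lengths, total 300
print(nx(contigs))                      # N50 -> 70
print(nx(contigs, genome_size=400))     # NG50 -> 50
```

Note that chimeric joins inflate both N50 and NG50, because neither metric checks contig correctness; that is why a reference-aware metric such as NGA50 is the more informative number here.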

LizzieMcDizzie · 2020-12-16T10:08:19Z

The PacBio assembly is now scaffolded to full chromosome length, but the contigs for that assembly had an N50 of ~11 Mb.
By log, do you mean the information that Raven streams to stderr?
ravenrunner.scr.e175487.txt

I haven't tried any other assemblers on this ONT dataset yet.

rvaser · 2020-12-16T14:18:08Z

Thanks for the log. Is it from the --weaken run or the default one? Does the dataset contain ultra-long reads?

LizzieMcDizzie · 2020-12-17T04:58:03Z

It does contain some very long reads, although what counts as long can be a bit subjective: 5 reads are between 800 kb and 900 kb.
That log is from a --weaken run.

rvaser · 2020-12-17T06:58:28Z

The average number of overlaps per surviving read is 2 for this run; I am not sure whether that is small or not. Do you perhaps have the log from the default run?
