Chimera creation #28

Open
LizzieMcDizzie opened this issue Dec 14, 2020 · 10 comments

Comments

@LizzieMcDizzie

Hi Raven team,
Thanks for the tool, I really like how fast it is. I have run Raven on a 3 Gb mammalian genome with 79 GB of ONT data. It runs fine, but it looks like it is creating a fairly large number of chimeric contigs. I suspect that this could be resolved by adjusting the settings, but there seems to be no way to do this. Obviously this causes incorrect N50 values etc.

Is there a way to adjust the overlap settings other than the 'weaken' true/false switch? I can turn that on, but of course the N50 drops dramatically, and it would be nice to be able to find the sweet spot for my data.

Cheers.
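
For readers following the thread, here is a minimal sketch of why chimeric joins inflate N50, as the post above suggests. The contig lengths are made-up toy numbers, not values from this dataset, and the function names `n50` and `ng50` are illustrative helpers, not part of Raven.

```python
def _length_at_half(lengths, target_total):
    """Walk contigs from longest to shortest and return the length of the
    contig at which the running sum first reaches half of target_total."""
    half = target_total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # the contigs never reach half of target_total


def n50(lengths):
    """N50: half of the *assembly* is in contigs of at least this length."""
    return _length_at_half(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: like N50, but relative to an expected genome size."""
    return _length_at_half(lengths, genome_size)


# Toy contig lengths in Mb (hypothetical, for illustration only).
correct = [10, 8, 6, 4, 2]
# Same total sequence, but the 8 Mb and 6 Mb contigs were wrongly joined.
chimeric = [10, 14, 4, 2]

print(n50(correct), n50(chimeric))
```

In this toy case the single wrong join raises N50 from 8 to 10 even though no additional sequence was assembled, which is why alignment-aware metrics such as NGA50 are usually preferred when misassemblies are suspected.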

@rvaser
Collaborator

rvaser commented Dec 14, 2020

Hello,
at the moment there are not many parameters you can tweak from the outside, I will try to add them in the upcoming version. May I ask how you assessed the assembly? What is the NGA50 value? Is 79 GB the size of the FASTQ or FASTA file? Which pore and basecaller versions were used?

Best regards,
Robert

@LizzieMcDizzie
Author

LizzieMcDizzie commented Dec 14, 2020 via email

@rvaser
Collaborator

rvaser commented Dec 14, 2020

Is the assembly size near 3 Gb? Do you have the log from a default Raven run? Using the --weaken option might be a bad idea here, as it was only tested a bit on HiFi reads. Please paste some figures here or send them via email, thanks.

@LizzieMcDizzie
Author

LizzieMcDizzie commented Dec 15, 2020 via email

@LizzieMcDizzie
Author

brahchr1_1_fetal_longest_contig
brahchr3_1_fetal_longest_contig

This is my longest contig from an assembly with an N50 of ~9 Mb - this is the pattern I see in what I call chimeras - clear alignments to different chromosomes. This one had the weaken flag on - but it's the same pattern without it.

Cheers,
Liz
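
The check described above (one contig with clear alignments to different chromosomes) can be sketched roughly as follows. This assumes contig-to-reference alignments in PAF format, for example from an aligner that writes PAF; the function names and the `min_fraction` and `min_mapq` thresholds are hypothetical choices for illustration, not something used in this thread.

```python
from collections import defaultdict


def aligned_bases_per_target(paf_lines, min_mapq=20):
    """Sum aligned query bases per (contig, reference sequence) pair.

    PAF columns used: 1 query name, 3 query start, 4 query end,
    6 target name, 12 mapping quality.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated records
        qname, qstart, qend = fields[0], int(fields[2]), int(fields[3])
        tname, mapq = fields[5], int(fields[11])
        if mapq < min_mapq:
            continue  # ignore ambiguous (e.g. repeat-derived) alignments
        totals[qname][tname] += qend - qstart
    return totals


def candidate_chimeras(paf_lines, min_fraction=0.2, min_mapq=20):
    """Flag contigs whose second-best reference sequence holds at least
    min_fraction of the contig's aligned bases (thresholds are assumptions)."""
    flagged = {}
    for contig, per_target in aligned_bases_per_target(paf_lines, min_mapq).items():
        total = sum(per_target.values())
        ranked = sorted(per_target.items(), key=lambda kv: kv[1], reverse=True)
        if total > 0 and len(ranked) > 1 and ranked[1][1] / total >= min_fraction:
            flagged[contig] = ranked[:2]
    return flagged


# Toy PAF records (made up): ctg1 splits between chr1 and chr3.
toy_paf = [
    "ctg1\t1000\t0\t600\t+\tchr1\t5000\t100\t700\t580\t600\t60",
    "ctg1\t1000\t600\t1000\t+\tchr3\t5000\t0\t400\t390\t400\t60",
    "ctg2\t800\t0\t800\t-\tchr2\t5000\t0\t800\t790\t800\t60",
]
result = candidate_chimeras(toy_paf)
print(result)
```

Overlapping alignments are counted twice in this sketch, and a flagged contig may reflect a real rearrangement relative to the reference rather than a misassembly, so hits are only candidates for closer inspection.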

@rvaser
Collaborator

rvaser commented Dec 16, 2020

Thanks Liz. This looks chimeric, provided that the reference is appropriate for your dataset. What is the NG50 of the PacBio assembly? Have you tried any other assembler? Do you perhaps have the log that Raven output? It would help me see if something went wrong during the assembly.

@LizzieMcDizzie
Author

LizzieMcDizzie commented Dec 16, 2020

The PacBio assembly is now scaffolded to full chromosome length, but the contigs for that assembly had an N50 of ~11 Mb.
By log, do you mean the information that Raven streams to stderr?
ravenrunner.scr.e175487.txt

I haven't tried any other assemblers on this ONT dataset yet.

@rvaser
Collaborator

rvaser commented Dec 16, 2020

Thanks for the log. Is it from a --weaken run or a default one? Does the dataset contain ultra-long reads?

@LizzieMcDizzie
Author

It does contain some very long reads - although what counts as long can be a bit subjective - 5 reads are between 800 kb and 900 kb.
That log is from a --weaken run.

@rvaser
Collaborator

rvaser commented Dec 17, 2020

The average number of overlaps per surviving read is 2 for this run; I am not sure whether that is low or not. Do you perhaps have the default run log?
