highest score is negative #34

Open
adamsorbie opened this issue Apr 30, 2021 · 4 comments

Comments

@adamsorbie

Hi,

Firstly, thanks for the great tool, it's very helpful in choosing the cutoff parameters for DADA2.

I have a dataset that was very deeply sequenced and, as far as I can tell, the reads are not of ideal quality. I ran some other QC checks alongside FIGARO and the output was a little strange.

{"trimPosition": [287, 270], "maxExpectedError": [51, 54], "readRetentionPercent": 76.1, "score": -5232.900874495484}
{"trimPosition": [286, 271], "maxExpectedError": [50, 55], "readRetentionPercent": 76.12, "score": -5240.876849894292}
{"trimPosition": [285, 272], "maxExpectedError": [49, 56], "readRetentionPercent": 76.14, "score": -5252.855227753219}
{"trimPosition": [284, 273], "maxExpectedError": [48, 57], "readRetentionPercent": 76.18, "score": -5268.8155871612535}

The maxExpectedError values are very high, and the negative scores are also very strange. I'm guessing I must be doing something wrong here, but I can't quite figure out what. Do you have any idea what would cause an output like this?

@michael-weinstein
Collaborator

Ouch... that's one I haven't seen before. The simplest explanation for this would actually be poor read quality. Are you familiar with FastQC? If so, would you be able to run your reads through it and tell me what you see? If not, I can set up a Zoom call and walk you through it.

@adamsorbie
Author

adamsorbie commented May 3, 2021

Yeah, I actually ran FastQC/MultiQC before running FIGARO. I'm already aware the quality is far from ideal, but unfortunately I don't have much experience with poor-quality reads, so I don't have much intuition to go on for handling this.

This is the per-base sequence quality plot from MultiQC:
[Image: fastqc_per_base_sequence_quality_plot]

Edit: FYI, the amplicon is V1-V3 (507 bp), sequenced paired-end 2 x 300.
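As a quick sanity check on the trim positions FIGARO reported above, the overlap left for merging can be worked out from the amplicon length. This is only a sketch: the helper name `merged_overlap` is made up for illustration, and it assumes the 507 bp figure is the full length spanned by the two reads (whether primers are included in that number is not stated in this thread).

```python
def merged_overlap(fwd_trim: int, rev_trim: int, amplicon_len: int) -> int:
    """Bases shared by the forward and reverse reads after trimming.

    Assumes both reads start at opposite ends of the same amplicon,
    so any combined length beyond the amplicon length is overlap.
    """
    return fwd_trim + rev_trim - amplicon_len


# Trim positions copied from the FIGARO output in the first post.
trim_positions = [(287, 270), (286, 271), (285, 272), (284, 273)]

# Amplicon length as stated in this comment (assumed to be the full span).
AMPLICON_LEN = 507

for fwd, rev in trim_positions:
    print(fwd, rev, merged_overlap(fwd, rev, AMPLICON_LEN))
```

Under that assumption, every suggested pair sums to 557 bases and so leaves the same 50-base overlap. With a 507 bp amplicon and 300-cycle reads there is little room to trim the low-quality tails, which would be consistent with the high expected-error values.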

@michael-weinstein
Collaborator

That looks pretty bad. It would appear from this that by base 250, you're already looking at a base call error rate somewhere between 1% and 10%. One question: are you including PhiX in this run? Would you be able to share the graph of base frequency by position for this run? I think FastQC and MultiQC produce that. Would you also be able to share the graphs generated by FIGARO?
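For reference, the 1% to 10% range corresponds to Phred scores of Q20 down to Q10 under the standard definition, error probability = 10^(-Q/10). The sketch below shows that conversion, and how expected errors for a read add up as the sum of per-base error probabilities (the quantity DADA2's maxEE filter thresholds on). The function names are illustrative and not taken from FIGARO or DADA2, and the quality string decoding assumes the common Phred+33 FASTQ encoding.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a base call error probability."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def expected_errors(quals: list[int]) -> float:
    """Expected number of errors in a read: sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in quals)


# Q20 is a 1% error rate and Q10 is a 10% error rate.
print(phred_to_error_prob(20), phred_to_error_prob(10))

# A hypothetical 270-base read held at Q20 throughout accumulates
# about 2.7 expected errors; at Q10 it would be about 27.
print(expected_errors([20] * 270))
print(expected_errors([10] * 270))
```

Seen this way, maxExpectedError values around 50 per read suggest long stretches of bases at or below roughly Q10, which matches the quality plot.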

@adamsorbie
Author

It's actually published data that I'm re-analysing with ASVs instead of OTUs, so I don't have all the information about the sequencing, but I will ask around and see if anyone knows. From what I know, the sequencing was performed by Eurofins or GATC, but unfortunately they don't give much information on their website about what calibrations they include.

Sure, the other plots are attached.
[Image: forwardExpectedError]
[Image: reverseExpectedError]

MultiQC unfortunately doesn't export that plot, so I just attached a few examples from FastQC.
[Image: download (1)]
[Image: download]
