Question about use of calibrated aggregation scores #122

gweeenis · 2024-09-11T18:10:00Z

Hi Antonio,

I had a question about the calibrated aggregated classification scores output by geNomad. If I run a metagenome assembly through geNomad, would it be appropriate to use the calibrated aggregated scores of all the input contigs (including those with virus or plasmid scores below 0.7) to estimate how many contigs have "ambiguous" scores? An example is a contig with a plasmid score of 0.4 and a virus score of 0.6, which doesn't get classified as strictly viral or plasmid. I am trying to get a sense of the whole score distribution across contigs. Thanks.

apcamargo · 2024-09-12T18:01:12Z

Hi @gweeenis,

Yes, that makes sense. When you use --enable-score-calibration, the values represent approximate probabilities. For instance, a sequence with a plasmid score of 0.4 and a virus score of 0.6 has roughly a 40% chance of being a plasmid and a 60% chance of being a virus. However, keep in mind that any score cutoff for defining ambiguity is somewhat arbitrary. It might be better to compute the Shannon entropy of the three scores as a measure of ambiguity. For example:

| Sequence | Chromosome score | Plasmid score | Virus score | Entropy |
|---|---|---|---|---|
| Sequence 1 | 0.0 | 0.6 | 0.4 | 0.673012 |
| Sequence 2 | 0.2 | 0.6 | 0.2 | 0.950271 |

In this case, both sequences have the same maximum score (plasmid score = 0.6). However, the second sequence is more ambiguous than the first one: its remaining probability is split between the chromosome and virus classes, whereas the first sequence can only be a plasmid or a virus. This is quantified through Shannon entropy (computed here with the natural logarithm), which increases as the probabilities of the different classes become more similar. The maximum, ln 3 ≈ 1.098612, is reached when all three scores are approximately 0.33. So, it may be more appropriate to base your decision of what constitutes an "ambiguous classification" on the entropy value rather than directly on the scores.
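For anyone who wants to reproduce these numbers, here is a minimal Python sketch (not part of geNomad). It assumes the natural logarithm, which is consistent with a maximum of ln 3 ≈ 1.0986 for three classes, and that the three calibrated scores of a sequence sum to approximately 1. The function names are hypothetical, and the column names (`seq_name`, `chromosome_score`, `plasmid_score`, `virus_score`) are an assumption about the layout of geNomad's aggregated classification TSV, so check them against the header of your own output file.

```python
import csv
import math
from typing import Iterable, TextIO

# Assumed column names of the aggregated classification TSV; adjust to your file's header.
NAME_COLUMN = "seq_name"
SCORE_COLUMNS = ("chromosome_score", "plasmid_score", "virus_score")


def shannon_entropy(scores: Iterable[float]) -> float:
    """Shannon entropy in nats; zero scores contribute nothing (0 * log 0 = 0)."""
    return -sum(p * math.log(p) for p in scores if p > 0)


def entropy_per_sequence(handle: TextIO) -> dict[str, float]:
    """Map each sequence name to the entropy of its three calibrated scores."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row[NAME_COLUMN]: shannon_entropy(float(row[col]) for col in SCORE_COLUMNS)
        for row in reader
    }


if __name__ == "__main__":
    # (chromosome, plasmid, virus) score vectors with a maximum score of 0.6.
    print(f"{shannon_entropy([0.0, 0.6, 0.4]):.6f}")  # 0.673012
    print(f"{shannon_entropy([0.2, 0.6, 0.2]):.6f}")  # 0.950271
    # Upper bound for three classes, reached when every score is 1/3.
    print(f"{math.log(3):.6f}")  # 1.098612
```

To look at the whole distribution, you could open the calibrated aggregated classification table and pass the file handle to `entropy_per_sequence`; the resulting values can then be binned or plotted, and an entropy cutoff chosen from there rather than from the raw scores.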
