Problems with LCA quorum #93
Comments
I'll have a look at it. Note on the side, you can write |
Thanks :) Yes, I used 1.7.0 but also tried 1.6.1 and 1.2.11, and the classification result was the same with all versions (I did not check the nearest neighbours in detail with the old versions, though).
Mind if I add the example to the unit tests?
Simple enough. Classic int/float truncation issue. Should work now. I'm pushing 1.7.1, should be in bioconda within a day or so.
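To illustrate the kind of precision problem discussed here, the following is a minimal sketch and not SINA's actual code. It assumes, purely for illustration, that the quorum is stored in single precision while the ratio is computed in double precision; the helper names `to_float32`, `meets_quorum_naive` and `meets_quorum_tolerant` are hypothetical.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def meets_quorum_naive(agreeing: int, total: int, quorum: float) -> bool:
    """Compare the ratio directly; sensitive to how `quorum` was rounded."""
    return agreeing / total >= quorum


def meets_quorum_tolerant(agreeing: int, total: int, quorum: float,
                          rel_tol: float = 1e-6) -> bool:
    """Treat a ratio that equals the quorum up to rounding noise as passing."""
    ratio = agreeing / total
    return ratio >= quorum or math.isclose(ratio, quorum, rel_tol=rel_tol)


# Hypothetical scenario: a quorum of 0.8 stored as a 32-bit float is slightly
# larger than the double-precision ratio 8/10, so the naive check fails.
quorum32 = to_float32(0.8)
print(meets_quorum_naive(8, 10, quorum32))
print(meets_quorum_tolerant(8, 10, quorum32))
print(meets_quorum_naive(8, 10, to_float32(0.79)))
```

Under this assumption, eight of ten neighbours fail a quorum of 0.8 but pass one of 0.79, which matches the behaviour reported in this issue. Whether this is the exact mechanism in SINA is not confirmed by the thread.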
No, go ahead. The submitted fix only fixes the difference between the second and third execution. The problem that the first test case
returns |
Darn - didn't see that part. Case of TL;DR I suppose.
You are correct in your observation. The algorithm does an LCA between the best match and the others. I guess that was usually so close to what it's supposed to do that no one noticed. This will take me a day or two to fix. I don't have enough time in the evenings.
Hi Elmar,
we stumbled across a weird problem when classifying sequences with SINA. One of our users searched for a partial sequence, which was classified as
Archaea;Halobacterota;
despite eight of the ten nearest neighbours having a classification of
Archaea;Halobacterota;Methanomicrobia;Methanomicrobiales;Methanomicrobiaceae;Methanoculleus;

I created a minimal working example with a SILVA dataset reduced to the ten nearest neighbours. When you execute
I then searched for a very similar full-length sequence in SILVA and ran the following query:
But when I change the --lca-quorum to 0.79, I get the expected result:

Between the 2nd and the 3rd example, there is a floating point precision problem. I think the ratio that is compared with the value of lca_quorum needs to be rounded according to the precision of the floating point type used.

The difference between the first and the other two examples is that the first run returns one of the two sequences with a deviating classification of
Archaea;Halobacterota;Methanosarcinia;Methanosarciniales;Methanosarcinaceae;Methanosarcina;
as most similar neighbour, whereas in the other two runs a sequence of the majority is selected. Thus, it seems that the LCA quorum always tries to use the taxonomy of the most similar neighbour instead of trying to find the most common classification that fulfils the quorum.
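
The difference between the two behaviours can be sketched as follows. This is a minimal illustration, not SINA's implementation: the function names are hypothetical, the neighbour list is made up from the two taxonomies quoted above (two deviating and eight majority neighbours, with a deviating one as best match), and the majority variant assumes a quorum above 0.5 so that at most one classification can satisfy it at each rank.

```python
import math
from collections import Counter


def split_path(path):
    """Split 'A;B;C;' into ['A', 'B', 'C'], ignoring the trailing separator."""
    return [rank for rank in path.split(";") if rank]


def join_path(ranks):
    return "".join(rank + ";" for rank in ranks)


def required_support(total, quorum):
    """Neighbours needed to reach the quorum; small epsilon absorbs rounding noise."""
    return math.ceil(quorum * total - 1e-9)


def lca_anchored_on_best(paths, quorum):
    """Deepest prefix of the best match (paths[0]) shared by enough neighbours."""
    ranks = [split_path(p) for p in paths]
    needed = required_support(len(ranks), quorum)
    best, result = ranks[0], []
    for depth in range(1, len(best) + 1):
        support = sum(1 for r in ranks if r[:depth] == best[:depth])
        if support < needed:
            break
        result = best[:depth]
    return join_path(result)


def lca_by_majority(paths, quorum):
    """Deepest classification shared by enough neighbours, whichever is best match."""
    ranks = [split_path(p) for p in paths]
    needed = required_support(len(ranks), quorum)
    result, depth = [], 1
    while True:
        counts = Counter(tuple(r[:depth]) for r in ranks if len(r) >= depth)
        if not counts:
            break
        prefix, support = counts.most_common(1)[0]
        if support < needed:
            break
        result, depth = list(prefix), depth + 1
    return join_path(result)


minority = "Archaea;Halobacterota;Methanosarcinia;Methanosarciniales;Methanosarcinaceae;Methanosarcina;"
majority = "Archaea;Halobacterota;Methanomicrobia;Methanomicrobiales;Methanomicrobiaceae;Methanoculleus;"
neighbours = [minority] + [majority] * 8 + [minority]  # best match first

print(lca_anchored_on_best(neighbours, 0.8))
print(lca_by_majority(neighbours, 0.8))
```

With a deviating sequence as best match, the anchored variant stops at Archaea;Halobacterota; while the majority variant reaches Methanoculleus, which is the behaviour I would expect from the quorum.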
Could you have a look?
Thanks
Jan
minimal-example.tar.gz