Too many sequences below identity #3
Since the data set is very small (500 sequences), an easy workaround for now may be to provide alignment scores via ... It seems highly peculiar that all the sequences with very high similarity are showing up as negatives. If you don't mind, could I have a sample of the data?
Ok, thanks! Attached is the full simulated dataset of 500 sequences. I'll try a larger simulation and let you know.
I get the same error message for datasets with the same simulation parameters but with 2500 and 12500 sequences as well. When I try with the parameter
OK, I had success: I manually set the identity parameter to a value above 0.9 and it worked. The alignment error message is a known bug that I have been working on.
Great, that works, thanks!
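To relate the identity threshold to the dataset described in the original post (sequences of about 900 nucleotides, edit distances of 1-20 bp, and a 10-15% error rate for a small portion), here is a rough back-of-the-envelope sketch. It assumes identity can be approximated as one minus edit distance over sequence length; MeShClust's own identity score may be defined differently, and `approx_identity` is a hypothetical helper, not part of the tool.

```python
def approx_identity(edit_distance, length):
    """Rough identity estimate: 1 - (edit distance / sequence length).

    Assumption for illustration only; an alignment-based identity
    score computed by a clustering tool may differ from this.
    """
    return 1.0 - edit_distance / length


LENGTH = 900  # approximate sequence length from the original post

# Edit distances taken from the post: 1-20 bp for most sequences,
# and 10-15% of the length for the high-error portion.
cases = [
    ("1 edit", 1),
    ("20 edits", 20),
    ("10% errors", round(0.10 * LENGTH)),
    ("15% errors", round(0.15 * LENGTH)),
]

for label, distance in cases:
    print(f"{label:>10}: {approx_identity(distance, LENGTH):.3f}")
```

Under this approximation, the highly similar sequences sit at about 0.98 identity or higher relative to their source, while the high-error ones fall to about 0.85-0.90.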
Hi again,
I tried running MeShClust on 500 sequences that I simulated, all about 900 nucleotides long, with most of the sequences highly similar (edit distances of 1-20 bp). A small portion of these sequences might have a high error rate, roughly PacBio's error rate of 10-15%. This is supposed to mimic PacBio Iso-Seq data. Any idea how I should run MeShClust on such a dataset? Is it suitable for such sequences?
Thanks for your help!
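For anyone who wants to reproduce a dataset like the one described above, here is a minimal sketch of such a simulation. It is not the script used for the attached data: the single shared reference, the 5% share of high-error sequences, the uniform mix of substitutions, insertions, and deletions, and all function names are assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"


def mutate(seq, n_edits, rng):
    """Apply n_edits random substitutions, insertions, or deletions.

    n_edits is an upper bound on the resulting edit distance, since
    later edits can partly undo earlier ones.
    """
    s = list(seq)
    for _ in range(n_edits):
        op = rng.choice(("sub", "ins", "del"))
        if op == "sub":
            pos = rng.randrange(len(s))
            s[pos] = rng.choice([b for b in ALPHABET if b != s[pos]])
        elif op == "ins":
            pos = rng.randrange(len(s) + 1)
            s.insert(pos, rng.choice(ALPHABET))
        else:
            pos = rng.randrange(len(s))
            del s[pos]
    return "".join(s)


def simulate(n_seqs=500, length=900, noisy_fraction=0.05, seed=0):
    """Return (name, sequence) pairs derived from one random reference.

    Most sequences receive 1-20 edits; a small portion (noisy_fraction,
    an assumed value) receives edits equal to 10-15% of the length.
    """
    rng = random.Random(seed)
    reference = "".join(rng.choice(ALPHABET) for _ in range(length))
    records = []
    for i in range(n_seqs):
        if rng.random() < noisy_fraction:
            n_edits = round(length * rng.uniform(0.10, 0.15))
        else:
            n_edits = rng.randint(1, 20)
        records.append((f"seq{i}", mutate(reference, n_edits, rng)))
    return records


def write_fasta(records, path):
    """Write the records as a FASTA file with one line per sequence."""
    with open(path, "w") as fh:
        for name, seq in records:
            fh.write(f">{name}\n{seq}\n")
```

Calling `write_fasta(simulate(), "simulated.fasta")` would write 500 sequences to a hypothetical file name; changing `n_seqs` to 2500 or 12500 gives the larger sizes mentioned in the thread.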