Tool: UNIQmin
- Protocol paper
- UNIQmin, an alignment-free tool to study viral sequence diversity across taxonomic lineages: a case study of monkeypox virus
- Application papers
- UNIQmin application to all viruses
- UNIQmin application to SARS-CoV-2
- Negligible peptidome diversity of SARS-CoV-2 and its higher taxonomic ranks
- Citing resources
- Found a bug
UNIQmin, an alignment-free tool to study viral sequence diversity across taxonomic lineages: a case study of monkeypox virus
Sequence changes in viral genomes generate protein sequence diversity that enables viruses to evade the host immune system, hindering the development of effective preventive and therapeutic interventions. The massive proliferation of sequence data provides unprecedented opportunities to study viral adaptation and evolution. An alignment-free approach removes various restrictions otherwise posed by an alignment-dependent approach to the study of sequence diversity. The publicly available tool UNIQmin offers an alignment-free approach for the study of viral sequence diversity at any given rank of the taxonomic lineage and is big-data ready. The tool performs an exhaustive search to determine the minimal set of sequences required to capture the peptidome diversity within a given dataset. This compression is possible through the removal of identical sequences and of unique sequences that do not contribute effectively to the peptidome diversity pool. Herein, we describe a detailed four-part protocol (BP1-4) utilizing UNIQmin to generate the minimal set for viral diversity analyses at any rank of the taxonomic lineage, using monkeypox virus (MPX) as a case study. These protocols enable systematic diversity studies across the taxonomic lineage, which are much needed for future preparedness against viral epidemics, in particular when data are abundant and freely available.
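The compression idea described above (drop identical sequences, then drop unique sequences that add nothing new to the peptidome pool) can be illustrated with a small sketch. This is not UNIQmin's implementation: it swaps the tool's exhaustive search for a simple greedy set-cover heuristic, and it assumes that "peptidome diversity" is measured as the set of distinct overlapping k-mers. The function names `kmers` and `minimal_set`, the default `k=9`, and the toy records are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 9) -> Set[str]:
    """Return the set of overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minimal_set(records: Dict[str, str], k: int = 9) -> List[str]:
    """Greedy sketch (assumption, not UNIQmin's algorithm): pick sequence IDs
    until every distinct k-mer in the dataset is covered."""
    # Step 1: remove identical sequences, keeping the first ID seen for each.
    unique: Dict[str, str] = {}
    for seq_id, seq in records.items():
        unique.setdefault(seq, seq_id)

    # Step 2: build the k-mer pool contributed by each unique sequence.
    pools = {seq_id: kmers(seq, k) for seq, seq_id in unique.items()}
    uncovered: Set[str] = set().union(*pools.values()) if pools else set()

    # Step 3: repeatedly keep the sequence that covers the most uncovered k-mers.
    chosen: List[str] = []
    while uncovered:
        best = max(sorted(pools), key=lambda sid: len(pools[sid] & uncovered))
        chosen.append(best)
        uncovered -= pools.pop(best)
    return chosen


# Toy records (hypothetical): "b" duplicates "a"; "c" adds no new 3-mers.
example = {"a": "ABCDEFGH", "b": "ABCDEFGH", "c": "CDEFG", "d": "XYZABCD"}
print(minimal_set(example, k=3))  # ['a', 'd']
```

In this toy run, `b` is dropped as a duplicate and `c` is dropped because all of its 3-mers already occur in `a`, so two of the four records are enough to represent every 3-mer in the dataset. A greedy heuristic like this does not guarantee the smallest possible set; it is shown only to make the notion of a minimal set concrete.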
Basic Protocols 1, 2 & 3:
Basic Protocol 4:
Negligible peptidome diversity of SARS-CoV-2 and its higher taxonomic ranks
The unprecedented increase in SARS-CoV-2 sequence data limits the application of alignment-dependent approaches to study viral diversity. Herein, we applied our recently published alignment-free tool, UNIQmin, to study the protein sequence diversity of SARS-CoV-2 (sub-species) and its higher taxonomic lineage ranks (species, genus, and family). Less than 0.5% of the reported SARS-CoV-2 protein sequences are required to represent the inherent viral peptidome diversity, and this increases to only ~2% at the family rank. This is expected to remain relatively the same even with further increases in the sequence data. The findings have important implications for the design of vaccines, drugs, and diagnostics: the number of sequences that must be considered in such studies is drastically reduced, short-circuiting the discovery process while still providing a systematic evaluation and coverage of pathogen diversity.
Compression of SARS-CoV-2 datasets across taxonomy lineage ranks, namely sub-species (proteins), species (with and without SARS-CoV-2), genus, and family:
Note: All data were retrieved as of July 2021.
Note: SARS-CoV-2 Spike Protein
Month-Year | Retrieval dataset (r) | Deduplicated dataset (% of r) | Minimal dataset (% of r) |
---|---|---|---|
July 2021 | 2,115,156 | 358,096 (~16.9) | 42,399 (~2.0) |
December 2022 | 14,060,695 | 2,778,826 (~19.8) | 112,912 (~0.8) |
Note: All data were retrieved as of December 2022.
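As a quick check of the percentages in the spike protein table, the short sketch below recomputes each deduplicated and minimal dataset size as a share of the retrieval dataset (r). The counts are copied from the table; the names `rows` and `pct` are hypothetical helpers introduced here.

```python
# Counts copied from the spike protein table: (retrieval r, deduplicated, minimal).
rows = {
    "July 2021": (2_115_156, 358_096, 42_399),
    "December 2022": (14_060_695, 2_778_826, 112_912),
}


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


for label, (r, dedup, minimal) in rows.items():
    print(f"{label}: deduplicated {pct(dedup, r)}% of r, minimal {pct(minimal, r)}% of r")
```

The recomputed values match the table: the retrieval dataset grew more than sixfold between the two dates, while the minimal set's share of it fell from about 2.0% to about 0.8%.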
- For the original reference, please refer to our paper:
  Chong, L.C.; Lim, W.L.; Ban, K.H.K.; Khan, A.M. An Alignment-Independent Approach for the Study of Viral Sequence Diversity at Any Given Rank of Taxonomy Lineage. Biology 2021, 10, 853. doi: 10.3390/biology10090853
- For a protocol describing the step-by-step utility of UNIQmin, please refer to our preprint:
  Chong, L.C.; Khan, A.M. UNIQmin, An Alignment-free Tool to Study Viral Sequence Diversity across Taxonomic Lineages: A Case Study of Monkeypox Virus. bioRxiv 2022.08.09.503271. doi: 10.1101/2022.08.09.503271
- For application paper to all viruses,
- For application of UNIQmin to study SARS-CoV-2 lineage diversity, please refer to our preprint:
  Chong, L.C.; Khan, A.M. Negligible Peptidome Diversity of SARS-CoV-2 and its Higher Taxonomic Ranks. bioRxiv 2022.10.31.513750. doi: 10.1101/2022.10.31.513750
Found a bug, or would like to share some feedback?
Just open a new issue or send an email to us ([email protected]).