Python implementation of the COMET subtyping tool. The original COMET algorithm is described in the paper by Struck et al., available here.
Use pyCOMET_train.py to build the PPMD models for subtyping from a reference sequence file in FASTA format. The usage is given below:
usage: pyCOMET_train.py [-h] -r REFERENCE -c CONTEXT -m MODELFILE
Creates PPMD models from a set of reference sequences given a fixed context size for subtype classification
optional arguments:
  -h, --help            show this help message and exit
  -r REFERENCE, --reference REFERENCE
                        Reference sequence file in FASTA format for creating PPMD models
  -c CONTEXT, --context CONTEXT
                        Context size for PPMD models (default: 8)
  -m MODELFILE, --modelFile MODELFILE
                        Output file to save the PPMD model
The sequence headers of the reference FASTA file must have the following format:
>subtype.rest_of_the_header
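The sketch below shows one way headers in this format could be grouped by subtype before training. It assumes the subtype label is everything between `>` and the first `.`, and that sequence lines are uppercased and concatenated. The helper name `read_reference_by_subtype` and the example headers are hypothetical; pyCOMET's own parser may treat edge cases differently.

```python
from collections import defaultdict


def read_reference_by_subtype(lines):
    """Group FASTA sequences by the subtype label in headers like '>subtype.rest'.

    Hypothetical helper for illustration; not part of pyCOMET.
    """
    groups = defaultdict(list)
    subtype, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if subtype is not None:
                groups[subtype].append("".join(chunks))
            # Subtype is the text before the first '.' in the header.
            subtype, chunks = line[1:].split(".", 1)[0], []
        elif subtype is not None:
            # Sequence may span several lines; normalize case.
            chunks.append(line.upper())
    if subtype is not None:
        groups[subtype].append("".join(chunks))
    return dict(groups)


if __name__ == "__main__":
    fasta = """>B.ref1
ACGTAC
GTAC
>C.ref2
TTGACA
>B.ref3
acgtta
"""
    print(read_reference_by_subtype(fasta.splitlines()))
    # prints: {'B': ['ACGTACGTAC', 'ACGTTA'], 'C': ['TTGACA']}
```

An open file object can be passed directly instead of a list, since iterating over a file yields its lines.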
The PPMD models are written in MessagePack format using the umsgpack Python package found here.
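As a rough illustration of what a per-subtype context model holds, and of what the context size (`-c`) controls, the sketch below counts which symbol follows each length-k context in a subtype's references, then scores a query by its code length in bits. It is a simplified stand-in, not PPMD: it uses a single fixed-order context with additive (Laplace) smoothing instead of PPM's escape-and-blend over shorter contexts. The function names, the `ACGT` alphabet and the toy sequences are assumptions for illustration. The returned model is a plain dict of counts, the kind of structure umsgpack can serialize.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: nucleotide input; pyCOMET may use another alphabet


def train_context_model(sequences, k):
    """Count which symbol follows each length-k context (hypothetical helper)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    # Convert to plain dicts so the model is easy to serialize.
    return {ctx: dict(nxt) for ctx, nxt in counts.items()}


def code_length(model, seq, k, alpha=1.0):
    """Bits needed to encode seq[k:] under the model, with additive smoothing.

    Lower values mean the sequence is better predicted by the model.
    """
    bits = 0.0
    for i in range(k, len(seq)):
        nxt = model.get(seq[i - k:i], {})
        total = sum(nxt.values())
        p = (nxt.get(seq[i], 0) + alpha) / (total + alpha * len(ALPHABET))
        bits -= math.log2(p)
    return bits


if __name__ == "__main__":
    refs = {"A": ["ACGTACGTACGT"], "B": ["AAAATTTTAAAATTTT"]}
    k = 2
    models = {st: train_context_model(seqs, k) for st, seqs in refs.items()}
    scores = {st: code_length(m, "ACGTACGA", k) for st, m in models.items()}
    # Pick the subtype whose model compresses the query best.
    print(min(scores, key=scores.get))  # prints: A
```

A larger k captures longer patterns but leaves more contexts unseen in the training data; handling those unseen contexts gracefully is what PPM's fallback to shorter contexts is designed for.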
To subtype query sequences in a FASTA file, use pyCOMET_subtype.py. This script requires a PPMD model file in umsgpack format generated by pyCOMET_train.py. The usage is given below:
usage: pyCOMET_subtype.py [-h] -q QUERY [-c CONTEXT] [-w WSIZE] [-b BSIZE] -m MODELFILE -o OUTFILE
Predicts subtype of a sequence based on PPMD models trained using reference sequences
optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Query sequence file in FASTA format for subtype classification
  -c CONTEXT, --context CONTEXT
                        Context size for PPMD model (default: 8)
  -w WSIZE, --wSize WSIZE
                        Window size for COMET decision tree (default: 100)
  -b BSIZE, --bSize BSIZE
                        Step size of the windows for COMET decision tree (default: 3)
  -m MODELFILE, --modelFile MODELFILE
                        PPMD model file for reference sequences
  -o OUTFILE, --outFile OUTFILE
                        Output file for subtype prediction results
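The `-w` and `-b` options control how the query is scanned. The sketch below assumes windows of `wSize` positions start every `bSize` positions, and that each window is labelled with the subtype whose model gives the lowest score (for example, the lowest code length). COMET's actual decision tree applies further rules on top of these per-window results, and those rules are not reproduced here. The `score` callable, the function names and the toy scorer are hypothetical stand-ins for a PPMD-based scoring function.

```python
from typing import Callable, Dict, List, Tuple


def window_starts(length: int, w_size: int = 100, b_size: int = 3) -> List[int]:
    """Start positions of full windows of w_size, advancing b_size each step."""
    if length < w_size:
        return [0]  # assumption: a short query is scored as a single window
    return list(range(0, length - w_size + 1, b_size))


def best_subtype_per_window(
    query: str,
    subtypes: List[str],
    score: Callable[[str, str], float],
    w_size: int = 100,
    b_size: int = 3,
) -> List[Tuple[int, str]]:
    """Label each window with the subtype whose score (lower is better) is smallest."""
    labels = []
    for start in window_starts(len(query), w_size, b_size):
        window = query[start:start + w_size]
        scores: Dict[str, float] = {st: score(st, window) for st in subtypes}
        labels.append((start, min(scores, key=scores.get)))
    return labels


if __name__ == "__main__":
    # Toy scorer: mismatches against a homogeneous per-subtype string.
    consensus = {"A": "A" * 12, "B": "C" * 12}

    def toy_score(st: str, window: str) -> float:
        return sum(a != b for a, b in zip(window, consensus[st]))

    print(best_subtype_per_window("AAAAAAACCCCC", ["A", "B"], toy_score, w_size=4, b_size=2))
    # prints: [(0, 'A'), (2, 'A'), (4, 'A'), (6, 'B'), (8, 'B')]
```

A smaller step size gives finer resolution along the query at the cost of scoring more windows; with the defaults, a 1,000-position query yields 301 windows.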