pyCOMET

Python implementation of the COMET subtyping tool. The original implementation of the COMET algorithm is described in the paper by Struck et al.

Training

Use pyCOMET_train.py to build the PPMD models for subtyping from a reference sequence file in FASTA format. The usage is given below:

usage: pyCOMET_train.py [-h] -r REFERENCE -c CONTEXT -m MODELFILE

Creates PPMD models from a set of reference sequences given a fixed context size for subtype classification

optional arguments:
  -h, --help            show this help message and exit
  -r REFERENCE, --reference REFERENCE
                        Reference sequence file in FASTA format for creating PPMD models
  -c CONTEXT, --context CONTEXT
                        Context size for PPMD models (default: 8)
  -m MODELFILE, --modelFile MODELFILE
                        Output file to save the PPMD model
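To illustrate what the context size controls, the sketch below counts which symbol follows each fixed-length context in a sequence. This is only the basic statistic that a context model of this kind is built on; it is not the full PPMD scheme (no escape probabilities, no shorter-order fallback), and it is not taken from pyCOMET_train.py. The function name `count_contexts` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict


def count_contexts(sequence, context_size=8):
    """Count which symbol follows each context of length `context_size`.

    Hypothetical helper for illustration only; a real PPMD model also
    needs escape handling and lower-order contexts.
    """
    counts = defaultdict(Counter)
    for i in range(context_size, len(sequence)):
        context = sequence[i - context_size:i]  # the preceding symbols
        counts[context][sequence[i]] += 1       # the symbol that follows
    return counts


# Toy sequence with a small context size so the table is easy to read.
counts = count_contexts("ACGTACGTACGA", context_size=4)
for context, followers in sorted(counts.items()):
    print(context, dict(followers))
```

A larger context size makes each context more specific but also rarer in the reference set, which is the trade-off the -c option exposes.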

The sequence headers of the reference FASTA file must have the following format:

>subtype.rest_of_the_header
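As a minimal sketch of this convention, the snippet below reads the subtype label as the text between ">" and the first "." of each header and groups the sequences by that label. The helper names and the example records are hypothetical and are not part of pyCOMET; the actual parsing in pyCOMET_train.py may differ.

```python
from collections import defaultdict


def subtype_from_header(header):
    """Return the text between '>' and the first '.' of a FASTA header."""
    return header.lstrip(">").split(".", 1)[0].strip()


def group_by_subtype(fasta_lines):
    """Group sequences by the subtype prefix of their header (illustrative)."""
    groups = defaultdict(list)
    subtype, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if subtype is not None:
                groups[subtype].append("".join(chunks))
            subtype, chunks = subtype_from_header(line), []
        else:
            chunks.append(line.upper())
    if subtype is not None:
        groups[subtype].append("".join(chunks))
    return dict(groups)


# Made-up records that follow the >subtype.rest_of_the_header format.
example = [
    ">A1.example_record_1",
    "ACGT",
    "AC",
    ">B.example_record_2",
    "ggttaa",
    ">A1.example_record_3",
    "TTTT",
]
grouped = group_by_subtype(example)
print(grouped)
```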

The PPMD models are written in MessagePack format using the umsgpack Python package.

Subtyping

Use pyCOMET_subtype.py to subtype the query sequences in a FASTA file. The script requires a PPMD model file in umsgpack format generated by pyCOMET_train.py. The usage is given below:

usage: pyCOMET_subtype.py [-h] -q QUERY [-c CONTEXT] [-w WSIZE] [-b BSIZE] -m
                          MODELFILE -o OUTFILE

Predicts subtype of a sequence based on PPMD models trained using reference sequences

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Query sequence file in FASTA format for subtype classification
  -c CONTEXT, --context CONTEXT
                        Context size for PPMD model (default: 8)
  -w WSIZE, --wSize WSIZE
                        Window size for COMET decision tree (default: 100)
  -b BSIZE, --bSize BSIZE
                        Step size of the windows for COMET decision tree (default: 3)
  -m MODELFILE, --modelFile MODELFILE
                        PPMD model file for reference sequences
  -o OUTFILE, --outFile OUTFILE
                        Output file for subtype prediction results
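To show how the window size and step size interact, here is a minimal sketch that enumerates windows over a query sequence using the documented defaults (100 and 3). It assumes only full-length windows are used and that sequences shorter than the window yield none; how pyCOMET_subtype.py treats the sequence tail or short queries is not stated above. The function name `sliding_windows` and the toy query are hypothetical, and the COMET decision tree itself is not reproduced.

```python
def sliding_windows(sequence, w_size=100, b_size=3):
    """Yield (start, window) pairs of length `w_size`, advancing by `b_size`.

    Illustrative assumption: only complete windows are produced.
    """
    for start in range(0, len(sequence) - w_size + 1, b_size):
        yield start, sequence[start:start + w_size]


# Toy query of 120 bases; real queries are typically much longer.
query = "ACGT" * 30
windows = list(sliding_windows(query))
print(len(windows), "windows; first start:", windows[0][0],
      "last start:", windows[-1][0])
```

A smaller step size produces more overlapping windows, so each region of the query is evaluated more often at the cost of more model evaluations.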