From ec69ae6fc878249690a9efa882e8813415621d3d Mon Sep 17 00:00:00 2001 From: "Robert A. Petit III" Date: Tue, 23 Jul 2024 00:20:12 +0000 Subject: [PATCH] add infor about blast-region to readme --- README.md | 155 +++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 130 insertions(+), 25 deletions(-) diff --git a/README.md b/README.md index c0cc6b2..9bc660f 100644 --- a/README.md +++ b/README.md @@ -247,8 +247,8 @@ with BLAST algorithms. #### Example {PREFIX}.tsv ```tsv -sample type targets schema version comment -saureus V ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1 sccmec 1.0.0 +sample type targets schema schema_version camlhmp_version params comment +camlhmp I ccrA1,ccrB1,IS431,IS1272,mecA,mecR1 sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 ``` | Column | Description | @@ -257,18 +257,23 @@ saureus V ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1 sccmec 1.0.0 | type | The predicted type | | targets | The targets for the given type that had a hit | | schema | The schema used to determine the type | -| version | The version of the schema used | +| schema_version | The version of the schema used | +| camlhmp_version | The version of camlhmp used | +| params | The parameters used for the analysis | | comment | A small comment about the result | #### Example {PREFIX}.blast.tsv ```tsv qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore -ccrC1 AB121219.1 100.000 100 1623 28612 1623 1623 0 0 1 1623 16132 17754 0.0 2998 -IS431_1 AB121219.1 100.000 100 791 28612 791 791 0 0 1 791 8221 9011 0.0 1461 -IS431_1 AB121219.1 99.704 100 675 28612 675 673 2 0 1 675 2693 3367 0.0 1236 -IS431_1 AB121219.1 98.519 100 675 28612 675 665 10 0 1 675 8951 8277 0.0 1192 -... +ccrA1 AB033763.2 100.000 100 1350 39332 1350 1350 0 0 1 1350 23692 25041 0.0 2494 +ccrB1 AB033763.2 100.000 100 1152 39332 1152 1152 0 0 1 1152 25063 26214 0.0 2128 +IS1272 AB033763.2 100.000 100 1659 39332 1659 1659 0 0 1 1659 28423 30081 0.0 3064 +mecR1 AB033763.2 100.000 100 987 39332 987 987 0 0 1 987 30304 31290 0.0 1823 +mecA AB033763.2 99.950 100 2007 39332 2007 2006 1 0 1 2007 31390 33396 0.0 3701 +mecA AB033763.2 99.950 100 2007 39332 2007 2006 1 0 1 2007 31390 33396 0.0 3701 +IS431 AB033763.2 99.873 100 790 39332 790 789 1 0 1 790 35958 36747 0.0 1454 +IS431 AB033763.2 100.000 100 792 39332 792 792 0 0 1 792 35957 36748 0.0 1463 ``` This is the standard BLAST output with `-outfmt 6` @@ -276,22 +281,11 @@ This is the standard BLAST output with `-outfmt 6` #### Example {PREFIX}.details.tsv ```tsv -sample type status targets missing schema version comment -type-v I False IS431,mecA,mecR1 ccrA1,ccrB1,IS1272 sccmec 1.0.0 -type-v II False IS431,mecA,mecR1 ccrA2,ccrB2,mecI sccmec 1.0.0 -type-v III False IS431,mecA,mecR1 ccrA3,ccrB3,mecI sccmec 1.0.0 -type-v IV False IS431,mecA,mecR1 ccrA2,ccrB2,IS1272 sccmec 1.0.0 -type-v V True ccrC1,IS431_1,mecA,mecR1,IS431_2 sccmec 1.0.0 -type-v VI False IS431,mecA,mecR1 ccrA4,ccrB4,IS1272 sccmec 1.0.0 -type-v VII False ccrC1,IS431_1,mecA,mecR1,IS431_2 IS12960D sccmec 1.0.0 -type-v VIII False IS431,mecA,mecR1 ccrA4,ccrB4,mecI sccmec 1.0.0 Excluded target ccrC1 found, failing type VIII -type-v IX False IS431_1,mecA,mecR1,IS431_2 ccrA1,ccrB1 sccmec 1.0.0 -type-v X False IS431_1,mecA,mecR1,IS431_2 ccrA1,ccrB6 sccmec 1.0.0 -type-v XI False mecA,mecR1 ccrA1,ccrB3,blaZ,mecI sccmec 1.0.0 -type-v XII False IS431_1,mecA,mecR1,IS431_2 ccrC2 sccmec 1.0.0 -type-v XIII False IS431,mecA,mecR1 ccrC2,mecI sccmec 1.0.0 -type-v XIV False ccrC1,IS431,mecA,mecR1 mecI sccmec 1.0.0 -type-v XV False IS431,mecA,mecR1 ccrA1,ccrB6,mecI sccmec 1.0.0 +sample type status targets missing schema schema_version camlhmp_version params comment +camlhmp I True ccrA1,ccrB1,IS431,mecA,mecR1,IS1272 sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 +camlhmp II False IS431,mecA,mecR1 ccrA2,ccrB2,mecI sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 +camlhmp III False IS431,mecA,mecR1 ccrA3,ccrB3,mecI sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 +camlhmp IV False IS431,mecA,mecR1,IS1272 ccrA2,ccrB2 sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 ``` This file provides a detailed view of the results. The columns are: @@ -304,7 +298,118 @@ This file provides a detailed view of the results. The columns are: | targets | The targets for the given type that had a match | | missing | The targets for the given type that were not found | | schema | The schema used to determine the type | -| version | The version of the schema used | +| schema_version | The version of the schema used | +| camlhmp_version | The version of camlhmp used | +| params | The parameters used for the analysis | +| comment | A small comment about the result | + +## `camlhmp-blast-region` + +`camlhmp-blast-region` is a command that allows users to search for full regions of interest. +It is nearly identical to `camlhmp-blast`, but instead of many smaller targets the idea is to +instead look at full regions such as O-antigens and or similar features. + +### Usage + +```bash + Usage: camlhmp-blast-region [OPTIONS] + + 🐪 camlhmp-blast-region 🐪 - Classify assemblies with a camlhmp schema using BLAST against + larger genomic regions + +╭─ Options ───────────────────────────────────────────────────────────────────────────────────╮ +│ * --input -i TEXT Input file in FASTA format to classify [required] │ +│ * --yaml -y TEXT YAML file documenting the targets and types [required] │ +│ * --targets -t TEXT Query targets in FASTA format [required] │ +│ --outdir -o PATH Directory to write output [default: ./] │ +│ --prefix -p TEXT Prefix to use for output files [default: camlhmp] │ +│ --min-pident INTEGER Minimum percent identity to count a hit [default: 95] │ +│ --min-coverage INTEGER Minimum percent coverage to count a hit [default: 95] │ +│ --force Overwrite existing reports │ +│ --verbose Increase the verbosity of output │ +│ --silent Only critical errors will be printed │ +│ --version -V Print schema and camlhmp version │ +│ --help Show this message and exit. │ +╰─────────────────────────────────────────────────────────────────────────────────────────────╯ +``` + +### Output Files + +`camlhmp-blast-region` will generate three output files: + +| File Name | Description | +|------------------------|-------------------------------------------------| +| `{PREFIX}.tsv` | A tab-delimited file with the predicted type | +| `{PREFIX}.blast.tsv` | A tab-delimited file of all blast hits | +| `{PREFIX}.details.tsv` | A tab-delimited file with details for each type | + +#### Example {PREFIX}.tsv + +```tsv +sample type targets coverage hits schema schema_version camlhmp_version params comment +camlhmp O5 O2 100.00 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 + +``` + +| Column | Description | +|---------|--------------------------------------------------| +| sample | The sample name as determined by `--prefix` | +| type | The predicted type | +| targets | The targets for the given type that had a hit | +| coverage | The coverage of the target region | +| hits | The number of hits used to calculate coverage of the target region | +| schema | The schema used to determine the type | +| schema_version | The version of the schema used | +| camlhmp_version | The version of camlhmp used | +| params | The parameters used for the analysis | +| comment | A small comment about the result | + +#### Example {PREFIX}.blast.tsv + +```tsv +qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore +wzyB NZ_PSQS01000003.1 88.403 99 1140 6935329 595 526 69 0 545 1139 6874509 6875103 0.0 717 +wzyB NZ_PSQS01000003.1 88.403 99 1140 6935329 595 526 69 0 545 1139 6920911 6921505 0.0 717 +wzyB NZ_PSQS01000003.1 89.444 99 1140 6935329 540 483 56 1 1 539 6872864 6873403 0.0 680 +wzyB NZ_PSQS01000003.1 89.444 99 1140 6935329 540 483 56 1 1 539 6919266 6919805 0.0 680 +O1 NZ_PSQS01000003.1 97.972 12 18368 6935329 1972 1932 38 2 16398 18368 6620589 6618619 0.0 3419 +O1 NZ_PSQS01000003.1 96.296 12 18368 6935329 324 312 11 1 1 323 6641914 6641591 1.68e-149 531 +O2 NZ_PSQS01000003.1 99.841 100 23303 6935329 23303 23266 30 1 1 23303 6618619 6641914 0.0 42821 +O2 NZ_PSQS01000003.1 86.935 100 23303 6935329 1240 1078 130 12 2542 3749 3864567 3863328 0.0 1363 +O3 NZ_PSQS01000003.1 94.442 13 20210 6935329 2393 2260 114 15 1 2386 6618619 6620999 0.0 3664 +O3 NZ_PSQS01000003.1 99.308 13 20210 6935329 289 287 2 0 19922 20210 6641626 6641914 3.09e-147 523 +O4 NZ_PSQS01000003.1 97.448 14 15279 6935329 1842 1795 47 0 1 1842 6618619 6620460 0.0 3142 +O4 NZ_PSQS01000003.1 99.638 14 15279 6935329 276 275 1 0 15004 15279 6641639 6641914 8.46e-142 505 +``` + +This is the standard BLAST output with `-outfmt 6` + +#### Example {PREFIX}.details.tsv + +```tsv +sample type status targets missing coverage hits schema schema_version camlhmp_version params comment +camlhmp O1 False O1 12.49 2 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 Coverage based on 2 hits +camlhmp O2 False O2 wzyB 100.00,0.00 1,0 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 +camlhmp O3 False O3 1.43 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 +camlhmp O4 False O4 13.86 2 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 Coverage based on 2 hits +camlhmp O5 True O2 100.00 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 +``` + +This file provides a detailed view of the results. The columns are: + +| Column | Description | +|---------|----------------------------------------------------| +| sample | The sample name as determined by `--prefix` | +| type | The predicted type | +| status | The status of the type (True if failed) | +| targets | The targets for the given type that had a match | +| missing | The targets for the given type that were not found | +| coverage | The coverage of the target region | +| hits | The number of hits used to calculate coverage of the target region | +| schema | The schema used to determine the type | +| schema_version | The version of the schema used | +| camlhmp_version | The version of camlhmp used | +| params | The parameters used for the analysis | | comment | A small comment about the result | ## `camlhmp-extract`