add infor about blast-region to readme
rpetit3 committed Jul 23, 2024
1 parent 1fad76a commit ec69ae6
Showing 1 changed file with 130 additions and 25 deletions.
155 changes: 130 additions & 25 deletions README.md
@@ -247,8 +247,8 @@ with BLAST algorithms.
#### Example {PREFIX}.tsv

```tsv
sample type targets schema version comment
saureus V ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1 sccmec 1.0.0
sample type targets schema schema_version camlhmp_version params comment
camlhmp I ccrA1,ccrB1,IS431,IS1272,mecA,mecR1 sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
```

| Column | Description |
@@ -257,41 +257,35 @@ saureus V ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1 sccmec 1.0.0
| type | The predicted type |
| targets | The targets for the given type that had a hit |
| schema | The schema used to determine the type |
| version | The version of the schema used |
| schema_version | The version of the schema used |
| camlhmp_version | The version of camlhmp used |
| params | The parameters used for the analysis |
| comment | A small comment about the result |

#### Example {PREFIX}.blast.tsv

```tsv
qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore
ccrC1 AB121219.1 100.000 100 1623 28612 1623 1623 0 0 1 1623 16132 17754 0.0 2998
IS431_1 AB121219.1 100.000 100 791 28612 791 791 0 0 1 791 8221 9011 0.0 1461
IS431_1 AB121219.1 99.704 100 675 28612 675 673 2 0 1 675 2693 3367 0.0 1236
IS431_1 AB121219.1 98.519 100 675 28612 675 665 10 0 1 675 8951 8277 0.0 1192
...
ccrA1 AB033763.2 100.000 100 1350 39332 1350 1350 0 0 1 1350 23692 25041 0.0 2494
ccrB1 AB033763.2 100.000 100 1152 39332 1152 1152 0 0 1 1152 25063 26214 0.0 2128
IS1272 AB033763.2 100.000 100 1659 39332 1659 1659 0 0 1 1659 28423 30081 0.0 3064
mecR1 AB033763.2 100.000 100 987 39332 987 987 0 0 1 987 30304 31290 0.0 1823
mecA AB033763.2 99.950 100 2007 39332 2007 2006 1 0 1 2007 31390 33396 0.0 3701
mecA AB033763.2 99.950 100 2007 39332 2007 2006 1 0 1 2007 31390 33396 0.0 3701
IS431 AB033763.2 99.873 100 790 39332 790 789 1 0 1 790 35958 36747 0.0 1454
IS431 AB033763.2 100.000 100 792 39332 792 792 0 0 1 792 35957 36748 0.0 1463
```

This is BLAST tabular output (`-outfmt 6`) with a custom set of columns.

#### Example {PREFIX}.details.tsv

```tsv
sample type status targets missing schema version comment
type-v I False IS431,mecA,mecR1 ccrA1,ccrB1,IS1272 sccmec 1.0.0
type-v II False IS431,mecA,mecR1 ccrA2,ccrB2,mecI sccmec 1.0.0
type-v III False IS431,mecA,mecR1 ccrA3,ccrB3,mecI sccmec 1.0.0
type-v IV False IS431,mecA,mecR1 ccrA2,ccrB2,IS1272 sccmec 1.0.0
type-v V True ccrC1,IS431_1,mecA,mecR1,IS431_2 sccmec 1.0.0
type-v VI False IS431,mecA,mecR1 ccrA4,ccrB4,IS1272 sccmec 1.0.0
type-v VII False ccrC1,IS431_1,mecA,mecR1,IS431_2 IS12960D sccmec 1.0.0
type-v VIII False IS431,mecA,mecR1 ccrA4,ccrB4,mecI sccmec 1.0.0 Excluded target ccrC1 found, failing type VIII
type-v IX False IS431_1,mecA,mecR1,IS431_2 ccrA1,ccrB1 sccmec 1.0.0
type-v X False IS431_1,mecA,mecR1,IS431_2 ccrA1,ccrB6 sccmec 1.0.0
type-v XI False mecA,mecR1 ccrA1,ccrB3,blaZ,mecI sccmec 1.0.0
type-v XII False IS431_1,mecA,mecR1,IS431_2 ccrC2 sccmec 1.0.0
type-v XIII False IS431,mecA,mecR1 ccrC2,mecI sccmec 1.0.0
type-v XIV False ccrC1,IS431,mecA,mecR1 mecI sccmec 1.0.0
type-v XV False IS431,mecA,mecR1 ccrA1,ccrB6,mecI sccmec 1.0.0
sample type status targets missing schema schema_version camlhmp_version params comment
camlhmp I True ccrA1,ccrB1,IS431,mecA,mecR1,IS1272 sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp II False IS431,mecA,mecR1 ccrA2,ccrB2,mecI sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp III False IS431,mecA,mecR1 ccrA3,ccrB3,mecI sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp IV False IS431,mecA,mecR1,IS1272 ccrA2,ccrB2 sccmec_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
```

This file provides a detailed view of the results. The columns are:
@@ -304,7 +298,118 @@ This file provides a detailed view of the results. The columns are:
| targets | The targets for the given type that had a match |
| missing | The targets for the given type that were not found |
| schema | The schema used to determine the type |
| version | The version of the schema used |
| schema_version | The version of the schema used |
| camlhmp_version | The version of camlhmp used |
| params | The parameters used for the analysis |
| comment | A small comment about the result |

## `camlhmp-blast-region`

`camlhmp-blast-region` is a command that allows users to search for full regions of interest.
It is nearly identical to `camlhmp-blast`, but instead of many smaller targets, it looks at
full regions such as O-antigens or similar features.

### Usage

```bash
Usage: camlhmp-blast-region [OPTIONS]
🐪 camlhmp-blast-region 🐪 - Classify assemblies with a camlhmp schema using BLAST against
larger genomic regions
╭─ Options ───────────────────────────────────────────────────────────────────────────────────╮
│ * --input -i TEXT Input file in FASTA format to classify [required] │
│ * --yaml -y TEXT YAML file documenting the targets and types [required] │
│ * --targets -t TEXT Query targets in FASTA format [required] │
│ --outdir -o PATH Directory to write output [default: ./] │
│ --prefix -p TEXT Prefix to use for output files [default: camlhmp] │
│ --min-pident INTEGER Minimum percent identity to count a hit [default: 95] │
│ --min-coverage INTEGER Minimum percent coverage to count a hit [default: 95] │
│ --force Overwrite existing reports │
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Print schema and camlhmp version │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────╯
```
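
For scripted runs, the options listed above can be assembled into an argument list. The following is a minimal sketch that only uses flags shown in the help text; the helper name `build_blast_region_command` and the file names `assembly.fasta`, `schema.yaml`, and `targets.fasta` are placeholders for illustration and are not part of camlhmp.

```python
import shlex


def build_blast_region_command(assembly, schema_yaml, targets_fasta,
                               outdir="./", prefix="camlhmp",
                               min_pident=95, min_coverage=95, force=False):
    """Assemble an argv list for camlhmp-blast-region from the documented options."""
    cmd = [
        "camlhmp-blast-region",
        "--input", assembly,
        "--yaml", schema_yaml,
        "--targets", targets_fasta,
        "--outdir", outdir,
        "--prefix", prefix,
        "--min-pident", str(min_pident),
        "--min-coverage", str(min_coverage),
    ]
    if force:
        # Overwrite existing reports
        cmd.append("--force")
    return cmd


# Placeholder file names; replace with real paths.
cmd = build_blast_region_command(
    "assembly.fasta", "schema.yaml", "targets.fasta", prefix="sample01"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where camlhmp is installed.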

### Output Files

`camlhmp-blast-region` will generate three output files:

| File Name | Description |
|------------------------|-------------------------------------------------|
| `{PREFIX}.tsv` | A tab-delimited file with the predicted type |
| `{PREFIX}.blast.tsv` | A tab-delimited file of all blast hits |
| `{PREFIX}.details.tsv` | A tab-delimited file with details for each type |

#### Example {PREFIX}.tsv

```tsv
sample type targets coverage hits schema schema_version camlhmp_version params comment
camlhmp O5 O2 100.00 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
```

| Column          | Description                                                        |
|-----------------|--------------------------------------------------------------------|
| sample          | The sample name as determined by `--prefix`                        |
| type            | The predicted type                                                 |
| targets         | The targets for the given type that had a hit                      |
| coverage        | The coverage of the target region                                  |
| hits            | The number of hits used to calculate coverage of the target region |
| schema          | The schema used to determine the type                              |
| schema_version  | The version of the schema used                                     |
| camlhmp_version | The version of camlhmp used                                        |
| params          | The parameters used for the analysis                               |
| comment         | A small comment about the result                                   |
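
The `params` column in the example packs the analysis settings into a single string. A small sketch for unpacking it follows, assuming the `key=value;key=value` layout seen in the example row; `parse_params` is a hypothetical helper, not a camlhmp function.

```python
def parse_params(params):
    """Split 'key=value;key=value' into a dict, converting integer values."""
    parsed = {}
    for item in params.split(";"):
        if not item:
            continue  # tolerate an empty string or a trailing ';'
        key, _, value = item.partition("=")
        parsed[key] = int(value) if value.isdigit() else value
    return parsed


params = parse_params("min-coverage=95;min-pident=95")
print(params)
```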

#### Example {PREFIX}.blast.tsv

```tsv
qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore
wzyB NZ_PSQS01000003.1 88.403 99 1140 6935329 595 526 69 0 545 1139 6874509 6875103 0.0 717
wzyB NZ_PSQS01000003.1 88.403 99 1140 6935329 595 526 69 0 545 1139 6920911 6921505 0.0 717
wzyB NZ_PSQS01000003.1 89.444 99 1140 6935329 540 483 56 1 1 539 6872864 6873403 0.0 680
wzyB NZ_PSQS01000003.1 89.444 99 1140 6935329 540 483 56 1 1 539 6919266 6919805 0.0 680
O1 NZ_PSQS01000003.1 97.972 12 18368 6935329 1972 1932 38 2 16398 18368 6620589 6618619 0.0 3419
O1 NZ_PSQS01000003.1 96.296 12 18368 6935329 324 312 11 1 1 323 6641914 6641591 1.68e-149 531
O2 NZ_PSQS01000003.1 99.841 100 23303 6935329 23303 23266 30 1 1 23303 6618619 6641914 0.0 42821
O2 NZ_PSQS01000003.1 86.935 100 23303 6935329 1240 1078 130 12 2542 3749 3864567 3863328 0.0 1363
O3 NZ_PSQS01000003.1 94.442 13 20210 6935329 2393 2260 114 15 1 2386 6618619 6620999 0.0 3664
O3 NZ_PSQS01000003.1 99.308 13 20210 6935329 289 287 2 0 19922 20210 6641626 6641914 3.09e-147 523
O4 NZ_PSQS01000003.1 97.448 14 15279 6935329 1842 1795 47 0 1 1842 6618619 6620460 0.0 3142
O4 NZ_PSQS01000003.1 99.638 14 15279 6935329 276 275 1 0 15004 15279 6641639 6641914 8.46e-142 505
```

This is BLAST tabular output (`-outfmt 6`) with a custom set of columns.
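
To work with these hits programmatically, the file can be read with Python's `csv` module. The sketch below assumes the file is tab-delimited with the header row shown in the example; `read_blast_hits` is a hypothetical helper, and the two rows are copied from the example above.

```python
import csv
import io

COLUMNS = [
    "qseqid", "sseqid", "pident", "qcovs", "qlen", "slen", "length", "nident",
    "mismatch", "gapopen", "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}
TEXT_COLUMNS = {"qseqid", "sseqid"}


def read_blast_hits(handle):
    """Read tab-delimited BLAST hits (with a header row) into typed dicts."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        hit = {}
        for name, value in row.items():
            if name in TEXT_COLUMNS:
                hit[name] = value
            elif name in FLOAT_COLUMNS:
                hit[name] = float(value)
            else:
                hit[name] = int(value)
        hits.append(hit)
    return hits


# Two O2 rows from the example, rebuilt with explicit tabs.
example = "\n".join([
    "\t".join(COLUMNS),
    "\t".join("O2 NZ_PSQS01000003.1 99.841 100 23303 6935329 23303 23266 30 1 1 23303 6618619 6641914 0.0 42821".split()),
    "\t".join("O2 NZ_PSQS01000003.1 86.935 100 23303 6935329 1240 1078 130 12 2542 3749 3864567 3863328 0.0 1363".split()),
])

hits = read_blast_hits(io.StringIO(example))
# Keep only hits at or above the default --min-pident of 95.
passing = [hit for hit in hits if hit["pident"] >= 95]
print(len(hits), len(passing))
```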

#### Example {PREFIX}.details.tsv

```tsv
sample type status targets missing coverage hits schema schema_version camlhmp_version params comment
camlhmp O1 False O1 12.49 2 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 Coverage based on 2 hits
camlhmp O2 False O2 wzyB 100.00,0.00 1,0 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp O3 False O3 1.43 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp O4 False O4 13.86 2 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 Coverage based on 2 hits
camlhmp O5 True O2 100.00 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
```

This file provides a detailed view of the results. The columns are:

| Column          | Description                                                        |
|-----------------|--------------------------------------------------------------------|
| sample          | The sample name as determined by `--prefix`                        |
| type            | The predicted type                                                 |
| status          | The status of the type (True if the type matched)                  |
| targets         | The targets for the given type that had a match                    |
| missing         | The targets for the given type that were not found                 |
| coverage        | The coverage of the target region                                  |
| hits            | The number of hits used to calculate coverage of the target region |
| schema          | The schema used to determine the type                              |
| schema_version  | The version of the schema used                                     |
| camlhmp_version | The version of camlhmp used                                        |
| params          | The parameters used for the analysis                               |
| comment         | A small comment about the result                                   |
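
The `coverage` and `hits` values can be related back to the BLAST hits. The sketch below is one plausible reconstruction, not camlhmp's confirmed implementation: it assumes that hits below `--min-pident` are dropped and that coverage is the percentage of the query covered by the union of the remaining `qstart`-`qend` intervals. The function names are hypothetical, and the tuples are taken from the example `{PREFIX}.blast.tsv`.

```python
def merged_coverage(intervals, query_length):
    """Percent of the query covered by the union of 1-based inclusive intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if current_end is None or start > current_end + 1:
            # Gap before this interval: close out the previous merged block.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return round(100 * covered / query_length, 2)


def coverage_by_target(hits, min_pident=95):
    """Map each target to (coverage, number of hits) after the identity filter."""
    kept = {}
    for qseqid, pident, qlen, qstart, qend in hits:
        if pident >= min_pident:
            kept.setdefault(qseqid, (qlen, []))[1].append((qstart, qend))
    return {
        target: (merged_coverage(intervals, qlen), len(intervals))
        for target, (qlen, intervals) in kept.items()
    }


# (qseqid, pident, qlen, qstart, qend) from the example BLAST output
hits = [
    ("O1", 97.972, 18368, 16398, 18368),
    ("O1", 96.296, 18368, 1, 323),
    ("O2", 99.841, 23303, 1, 23303),
    ("O2", 86.935, 23303, 2542, 3749),
    ("O3", 94.442, 20210, 1, 2386),
    ("O3", 99.308, 20210, 19922, 20210),
    ("O4", 97.448, 15279, 1, 1842),
    ("O4", 99.638, 15279, 15004, 15279),
]

coverage = coverage_by_target(hits)
for target, (percent, n_hits) in coverage.items():
    print(f"{target}\t{percent:.2f}\t{n_hits}")
```

Under these assumptions the sketch yields 12.49 (2 hits) for O1, 100.00 (1 hit) for O2, 1.43 (1 hit) for O3, and 13.86 (2 hits) for O4, which agrees with the `coverage` and `hits` columns in the example details file.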

## `camlhmp-extract`