-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from PacificBiosciences/read_calling_update
update README for v0.3.0 release
- Loading branch information
Showing
10 changed files
with
906 additions
and
470 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# v0.3.0: 03/21/24 | ||
## Changes | ||
- New tool `call-reads` to call from HiFi reads (limited to class I) | ||
- Extend reports for read-based calls | ||
- Update IPD-IMGT/HLA to version 3.55 (2024-01) | ||
- Add option to limit calls to genomic (full-length) records | ||
- Update README | ||
|
||
# v0.2.3: 12/22/23 | ||
## Changes | ||
- Fix reporting bug for cdna calls | ||
- Add option to require exon 2 to make a call | ||
- Improved thread control and error reporting | ||
|
||
# v0.2.2: 11/17/23 | ||
## Changes | ||
- Update database to IPD-IMGT/HLA Version: 3.54 (2023-10) | ||
|
||
# v0.2.1: 11/15/23 | ||
## Changes | ||
- Call-contigs hap2 change to optional | ||
- Improve output determinism | ||
- Update README | ||
- Fix validation bugs | ||
|
||
# v0.2.0: 9/25/23 | ||
## Changes | ||
- Update README for initial release | ||
- Latest database version v3.53 | ||
- Bug fixes | ||
|
||
# v0.1.0: 5/31/23 | ||
## Changes | ||
- Report results in tsv and json format |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
## Genes <a name="genes"></a> | ||
|
||
Genes called by `hifihla`: | ||
``` | ||
HLA-A HLA-DQA1 HLA-DRB7 HLA-P | ||
HLA-B HLA-DQA2 HLA-DRB8 HLA-S | ||
HLA-C HLA-DQB1 HLA-E HLA-T | ||
HLA-DMA HLA-DQB2 HLA-F HLA-U | ||
HLA-DMB HLA-DRA HLA-G HLA-V | ||
HLA-DOA HLA-DRB1 HLA-H HLA-W | ||
HLA-DOB HLA-DRB2 HLA-HFE HLA-Y | ||
HLA-DPA1 HLA-DRB3 HLA-J MICA | ||
HLA-DPA2 HLA-DRB4 HLA-K MICB | ||
HLA-DPB1 HLA-DRB5 HLA-L TAP1 | ||
HLA-DPB2 HLA-DRB6 HLA-N | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
## Install <a name="install"></a> | ||
|
||
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/hifihla/README.html) | ||
|
||
Please refer to our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda) for information on Installation, Support, License, Copyright, and Disclaimer. | ||
|
||
`hifihla` can be installed from bioconda | ||
``` | ||
conda install -c bioconda hifihla | ||
``` | ||
Binaries are also availible in the github [releases](https://github.com/PacificBiosciences/HiFiHLA/releases/latest). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
## Output <a name="output"></a> | ||
`call-reads`, `call-consensus` and `call-contigs` all generate three reports containing HLA star-allele type calls. Additionally, `call-contigs` produces fasta files of extracted sequences from the assembly. | ||
|
||
| File | Description | | ||
| -------------------------------------------- | ----------- | | ||
| {output_dir}/hifihla_summary.tsv | Detailed file listing best call for each locus | | ||
| {output_dir}/hifihla_report.tsv | Simple tsv file listing calls for each locus | | ||
| {output_dir}/hifihla_report.json | Detailed results file, see below for example | | ||
| {output_dir}/asm.contigs.h[12].fasta | Extracted (full) assembly contigs aligning to MHC | | ||
| {output_dir}/asm.contigs.h[12].fasta.fai | FASTA index for contigs | | ||
| {output_dir}/extracted.targets.h[12].fasta | Extracted targets used for star-typing | | ||
|
||
### Detailed summary tsv | ||
This file reports the single best call and statistics for each query sequence in the sample. | ||
|
||
| Column Name | Description | | ||
|--------------|------------------------------------------------------| | ||
| queryId | Query sequence name | | ||
| qLen | Length of query sequence | | ||
| nMatches | Number of equivalent matches with same edit distance | | ||
| gType | Closest matching genomic accession | | ||
| gPctId | Percent matching bases of closest genomic match | | ||
| gPctCov | Percent coverage of closest genomic match | | ||
| gEdit | Edit distance to closest genomic match | | ||
| cdnaType | Closest matching coding sequence | | ||
| exCovered | Exons included in alignment | | ||
| exEdit | Edit distance to cDNA match | | ||
| coverage | HiFi read count of allele (call-reads only) | | ||
| errRate | Total error rate of HiFi reads (call-reads only) | | ||
| Type | Final type call | | ||
|
||
Final type call is determined by edit distance and exact matches to the query (nMatches). The final call is four field if the query sequences matches exactly and is the only exact match returned. | ||
|
||
`call-reads` summary for HG002 (Class I): | ||
``` | ||
queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type | ||
HLA-A_1 3503 1 HLA-A*01:01:01:01 100.0 100.0 0 HLA-A*01:01:01 1,2,3,4,5,6,7,8 0 15 0.00276 HLA-A*01:01:01:01 | ||
HLA-A_0 3517 1 HLA-A*26:01:01:01 100.0 100.0 0 HLA-A*26:01:01 1,2,3,4,5,6,7,8 0 13 0.00249 HLA-A*26:01:01:01 | ||
HLA-B_0 3855 1 HLA-B*35:08:01:01 100.0 100.0 0 HLA-B*35:08:01 1,2,3,4,5,6,7 0 18 0.00231 HLA-B*35:08:01:01 | ||
HLA-B_1 4070 1 HLA-B*38:01:01:01 100.0 100.0 0 HLA-B*38:01:01 1,2,3,4,5,6,7 0 10 0.00204 HLA-B*38:01:01:01 | ||
HLA-C_0 4264 1 HLA-C*04:01:01:06 100.0 100.0 0 HLA-C*04:01:01 1,2,3,4,5,6,7,8 0 12 0.00135 HLA-C*04:01:01:06 | ||
HLA-C_1 4303 1 HLA-C*12:03:01:01 100.0 100.0 0 HLA-C*12:03:01 1,2,3,4,5,6,7,8 0 8 0.00215 HLA-C*12:03:01:01 | ||
``` | ||
`call-contigs` summary for HG00733: | ||
``` | ||
queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type | ||
HG00733#1#h1tg000070l_29911131_29915604 4474 1 HLA-A*24:02:01:01 100.0 100.0 0 HLA-A*24:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*24:02:01:01 | ||
HG00733#2#h2tg000008l_1854867_1859341 4475 1 HLA-A*30:02:01:01 100.0 100.0 0 HLA-A*30:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*30:02:01:01 | ||
HG00733#1#h1tg000070l_31324234_31329311 5078 1 HLA-B*35:02:01:02 100.0 100.0 0 HLA-B*35:02:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*35:02:01:02 | ||
HG00733#2#h2tg000008l_440231_445304 5074 1 HLA-B*18:01:01:01 100.0 100.0 0 HLA-B*18:01:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*18:01:01:01 | ||
HG00733#1#h1tg000070l_31239869_31245142 5274 1 HLA-C*04:01:01:06 100.0 100.0 0 HLA-C*04:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*04:01:01:06 | ||
HG00733#2#h2tg000008l_525397_530669 5273 1 HLA-C*05:01:01:01 100.0 100.0 0 HLA-C*05:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*05:01:01:01 | ||
HG00733#1#h1tg000070l_33003286_33014048 10763 1 HLA-DPA1*01:03:01:02 100.0 100.0 0 HLA-DPA1*01:03:01 1,2,3,4 0 1 N/A HLA-DPA1*01:03:01:02 | ||
HG00733#2#h2tg000060l_26807522_26818289 10768 1 HLA-DPA1*02:01:01:01 100.0 100.0 0 HLA-DPA1*02:01:01 1,2,3,4 0 1 N/A HLA-DPA1*02:01:01:01 | ||
HG00733#1#h1tg000070l_33014638_33027155 12518 1 HLA-DPB1*04:01:01:01 100.0 100.0 0 HLA-DPB1*04:01:01 1,2,3,4,5 0 1 N/A HLA-DPB1*04:01:01:01 | ||
HG00733#2#h2tg000060l_26794480_26806932 12453 1 HLA-DPB1*11:01:01:01 100.0 100.0 0 HLA-DPB1*11:01:01 1,2,3,4,5 0 1 N/A HLA-DPB1*11:01:01:01 | ||
HG00733#1#h1tg000070l_32583472_32591044 7573 1 HLA-DQA1*05:05:01:01 100.0 100.0 0 HLA-DQA1*05:05:01 1,2,3,4 0 1 N/A HLA-DQA1*05:05:01:01 | ||
HG00733#2#h2tg000060l_27233443_27240904 7462 1 HLA-DQA1*05:01:01:01 100.0 100.0 0 HLA-DQA1*05:01:01 1,2,3,4 0 1 N/A HLA-DQA1*05:01:01:01 | ||
HG00733#1#h1tg000070l_32600790_32608991 8202 1 HLA-DQB1*03:01:01:02 100.0 100.0 0 HLA-DQB1*03:01:01 1,2,3,4,5,6 0 1 N/A HLA-DQB1*03:01:01:02 | ||
HG00733#2#h2tg000060l_27214264_27222723 8460 1 HLA-DQB1*02:01:01:01 100.0 100.0 0 HLA-DQB1*02:01:01 1,2,3,4,5,6 0 1 N/A HLA-DQB1*02:01:01:01 | ||
HG00733#1#h1tg000070l_32526722_32541633 14912 1 HLA-DRB1*11:04:01:01 100.0 100.0 0 HLA-DRB1*11:04:01 1,2,3,4,5,6 0 1 N/A HLA-DRB1*11:04:01:01 | ||
HG00733#2#h2tg000060l_27282742_27297652 14911 1 HLA-DRB1*03:01:01:02 100.0 100.0 0 HLA-DRB1*03:01:01 1,2,3,4,5,6 0 1 N/A HLA-DRB1*03:01:01:02 | ||
HG00733#1#h1tg000070l_32462971_32477576 14606 1 HLA-DRB3*02:02:01:04 99.99 100.0 1 HLA-DRB3*02:02:01 1,2,3,4,5,6 0 1 N/A HLA-DRB3*02:02:01 | ||
HG00733#2#h2tg000060l_27346699_27361303 14605 1 HLA-DRB3*02:02:01:01 100.0 100.0 0 HLA-DRB3*02:02:01 1,2,3,4,5,6 0 1 N/A HLA-DRB3*02:02:01:01 | ||
``` | ||
|
||
### Simple tsv format | ||
This file is formatted for a quick view of top-level results. It is appropriate as input for downstream interpretation tools like [pharmCAT](https://pharmcat.org/). | ||
|
||
``` | ||
$ column -t hifihla_report.tsv | ||
HLA-A *01:01:01:01/*26:01:01:01 | ||
HLA-B *38:01:01:01/*35:08:01:01 | ||
HLA-C *12:03:01:01/*04:01:01:06 | ||
``` | ||
|
||
### Detailed results json | ||
Detailed results contain information on calls for each sequence typed. The json is structured in the following way: | ||
|
||
``` | ||
{ sample_id: <sample_id>, | ||
hla_calls: { | ||
<locus>: { | ||
<Record_id>: [ | ||
{ | ||
allele_id: <imgt_accession>, | ||
star_name: <full_star_name>, | ||
length: <accession_seq_length>, | ||
match_name: <match_up_to_fields>, | ||
query_start: <query_aln_start>, | ||
query_end: <query_aln_end>, | ||
covered_feat: <aligned_features>, | ||
not_covered: <features_no_coverage>, | ||
coding_diffs: <total_exon_differences>, | ||
noncode_eddist: <edit_dist_introns_utr>, | ||
error_rate: <total HiFi error rate (call-reads only)>, | ||
coverage: <Number of reads covering allele (call-reads only)>, | ||
reads: <HiFi read names used in analysis of allele (call-reads only)>, | ||
differences: [ | ||
{ | ||
kind: <mismatch/insertion/deletion>, | ||
pos: <position>, | ||
size: <variant_length>, | ||
feat: <feature_containing_variant> | ||
}, | ||
... | ||
] | ||
} | ||
], | ||
... | ||
} | ||
... | ||
} | ||
} | ||
``` | ||
|
||
Extracting detailed results for a given locus can be done with the [jq](https://jqlang.github.io/jq/) tool: | ||
``` | ||
jq '.hla_calls | with_entries(select(.key == "HLA-DRB3"))' tmp2/hifihla_report.json | ||
{ | ||
"HLA-DRB3": [ | ||
{ | ||
"HG00733#1#h1tg000070l_32462971_32477576": [ | ||
{ | ||
"allele_id": "HLA22697", | ||
"star_name": "HLA-DRB3*02:02:01:04", | ||
"length": 12905, | ||
"match_name": "*02:02:01", | ||
"query_start": 0, | ||
"query_end": 12905, | ||
"covered_feat": [ | ||
"UTR_5", | ||
"Exon_1", | ||
"Intron_1", | ||
"Exon_2", | ||
"Intron_2", | ||
"Exon_3", | ||
"Intron_3", | ||
"Exon_4", | ||
"Intron_4", | ||
"Exon_5", | ||
"Intron_5", | ||
"Exon_6", | ||
"UTR_3" | ||
], | ||
"not_covered": [], | ||
"coding_diffs": 0, | ||
"noncode_eddist": 1, | ||
"error_rate": null, | ||
"coverage": 1, | ||
"reads": null, | ||
"differences": [ | ||
{ | ||
"kind": "Mismatch", | ||
"pos": 8158, | ||
"size": 1, | ||
"feat": "Intron_2" | ||
} | ||
] | ||
} | ||
], | ||
"HG00733#2#h2tg000060l_27346699_27361303": [ | ||
{ | ||
"allele_id": "HLA00895", | ||
"star_name": "HLA-DRB3*02:02:01:01", | ||
"length": 13588, | ||
"match_name": "*02:02:01:01", | ||
"query_start": 0, | ||
"query_end": 13588, | ||
"covered_feat": [ | ||
"UTR_5", | ||
"Exon_1", | ||
"Intron_1", | ||
"Exon_2", | ||
"Intron_2", | ||
"Exon_3", | ||
"Intron_3", | ||
"Exon_4", | ||
"Intron_4", | ||
"Exon_5", | ||
"Intron_5", | ||
"Exon_6", | ||
"UTR_3" | ||
], | ||
"not_covered": [], | ||
"coding_diffs": 0, | ||
"noncode_eddist": 0, | ||
"error_rate": null, | ||
"coverage": 1, | ||
"reads": null, | ||
"differences": [] | ||
} | ||
] | ||
} | ||
] | ||
} | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
## Usage <a name="usage"></a> | ||
|
||
`hifihla` has four subcommands: | ||
* [call-reads](usage_call-reads.md) | ||
* [call-consensus](usage_call-consensus.md) | ||
* [call-contigs](usage_call-contigs.md) | ||
* [align-imgt](usage_align-imgt.md) | ||
|
||
``` | ||
Usage: hifihla <COMMAND> | ||
Commands: | ||
call-reads Call HLA loci from an aligned BAM of HiFi reads | ||
call-contigs Extract HLA loci from assembled MHC contigs & call star alleles on extracted sequences | ||
call-consensus Call HLA Star (*) alleles from consensus sequences | ||
align-imgt Align queries to IMGT/HLA genomic accession sequences | ||
help Print this message or the help of the given subcommand(s) | ||
Options: | ||
-h, --help Print help | ||
-V, --version Print version | ||
``` | ||
|
||
## Subcommand Inputs | ||
| Subcommand | Input Type | File types |Description | | ||
|----------------|-------------------------------------|-----------------|------------| | ||
| call-reads | Aligned HiFi reads | BAM | Call Class I (ABC) from HiFi reads aligned to GRCh38 (no alts) | | ||
| call-contigs | Aligned assembly; unaligned contigs | BAM, FASTA(.gz) | Extract and Call HLA loci from assembled MHC contigs | | ||
| call-consensus | Amplicon/Isoseq consensus | FASTA | Call HLA alleles from consensus sequences (e.g. amplicon assays) | | ||
| align-imgt | Sequence/IMGT accessions | FASTA | Compare sequences in fasta format or database sequences to specific IMGT/HLA genomic alleles | |
Oops, something went wrong.