Merge pull request #2 from PacificBiosciences/update
update docs for version bump
jrharting authored Apr 5, 2024
2 parents d7b358f + 28090f1 commit cf67053
Showing 7 changed files with 72 additions and 55 deletions.
18 changes: 16 additions & 2 deletions README.md
@@ -3,7 +3,9 @@
<h1 align="center">HiFiHLA</h1>

***
An HLA star-calling tool for PacBio HiFi data types.
**An HLA star-calling tool for PacBio HiFi data types**

HiFiHLA generates high-resolution (4-field) HLA allele calls from PacBio HiFi data. It identifies the closest matching allele(s) and reports any differences between a sample and the IPD-IMGT/HLA database. Acceptable data types include aligned HiFi reads, assembly contigs, and amplicon consensus sequences.
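
As an illustration of what a 4-field call looks like, the following Python sketch splits a star-allele name such as `HLA-A*24:02:01:01` into its gene and fields. It is not part of HiFiHLA; the names `ALLELE_RE`, `StarAllele`, and `parse_star_allele` are hypothetical, and the pattern assumes the standard nomenclature of up to four colon-separated fields with an optional expression suffix.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed pattern: optional "HLA-" prefix, gene name, "*", one to four
# colon-separated numeric fields, and an optional expression suffix.
ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<gene>[A-Z0-9]+)\*"
    r"(?P<fields>\d+(?::\d+){0,3})"
    r"(?P<suffix>[NLSCAQ])?$"
)


class StarAllele(NamedTuple):
    gene: str
    fields: Tuple[str, ...]
    suffix: Optional[str]

    @property
    def resolution(self) -> int:
        """Number of fields present (4 for a full genomic call)."""
        return len(self.fields)


def parse_star_allele(name: str) -> StarAllele:
    """Parse an allele name; raise ValueError if it does not fit the pattern."""
    match = ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognizable HLA allele name: {name!r}")
    return StarAllele(
        gene=match["gene"],
        fields=tuple(match["fields"].split(":")),
        suffix=match["suffix"],
    )
```

With this helper, `parse_star_allele("HLA-A*24:02:01:01").resolution` is 4, while a cDNA-level name such as `HLA-A*24:02:01` has a resolution of 3.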

Authors: [John Harting](https://github.com/jrharting), [Zev Kronenberg](https://github.com/zeeev), [Daniel Baker](https://github.com/dnbaker), [Matt Holt](https://github.com/holtjma)

@@ -17,7 +19,19 @@ Authors: [John Harting](https://github.com/jrharting), [Zev Kronenberg](https://
2. [Genes](docs/genes.md)
3. [Usage and Examples](docs/usage.md)
4. [Output](docs/output.md)
6. [Changelog](CHANGELOG.md)
6. [Changelog](docs/changelog.md)

## Need help?
If you notice missing features or bugs, or need help analyzing the output of HiFiHLA,
please don't hesitate to open a GitHub issue.

## Support information
HiFiHLA is pre-release software intended for research use only and not for use in diagnostic procedures.
While efforts have been made to ensure that HiFiHLA lives up to the quality that PacBio strives for, we make no warranty regarding this software.

As HiFiHLA is not covered by any service level agreement or the like, please do not contact PacBio Field Applications Scientists or PacBio Customer Service for assistance with any HiFiHLA release.
Please report all issues through GitHub instead.
We make no warranty that any such issue will be addressed, to any extent or within any time frame.

## References <a name="references"></a>
Barker DJ, Maccari G, Georgiou X, Cooper MA, Flicek P, Robinson J, Marsh SGE. _The IPD-IMGT/HLA Database_. Nucleic Acids Research (2023) 51:D1053-60.
7 changes: 7 additions & 0 deletions CHANGELOG.md → docs/changelog.md
@@ -1,3 +1,10 @@
# v0.3.1: 04/05/24
## Changes
- Add output prefix option (takes directory or directory + prefix name)
- Deprecate `outdir` (maintain backwards compatibility until v1.0)
- Fix bug in call-reads where a read with partial exon2 (only) coverage blows up candidate pool
- Catch error from aligned inputs with wrong reference

# v0.3.0: 03/21/24
## Changes
- New tool `call-reads` to call from HiFi reads (limited to class I)
14 changes: 7 additions & 7 deletions docs/output.md
@@ -1,14 +1,14 @@
## Output <a name="output"></a>
`call-reads`, `call-consensus` and `call-contigs` all generate three reports containing HLA star-allele type calls. Additionally, `call-contigs` produces fasta files of extracted sequences from the assembly.
`call-reads`, `call-consensus` and `call-contigs` all generate three reports containing HLA star-allele type calls. Additionally, `call-contigs` produces fasta files of extracted sequences from the assembly. If `out_prefix` is given as a directory plus a prefix or sample name, output file names are joined to the prefix with an underscore (`_`).

| File | Description |
| -------------------------------------------- | ----------- |
| {output_dir}/hifihla_summary.tsv | Detailed file listing best call for each locus |
| {output_dir}/hifihla_report.tsv | Simple tsv file listing calls for each locus |
| {output_dir}/hifihla_report.json | Detailed results file, see below for example |
| {output_dir}/asm.contigs.h[12].fasta | Extracted (full) assembly contigs aligning to MHC |
| {output_dir}/asm.contigs.h[12].fasta.fai | FASTA index for contigs |
| {output_dir}/extracted.targets.h[12].fasta | Extracted targets used for star-typing |
| {out_prefix}\[_/\]hifihla_summary.tsv | Detailed file listing best call for each locus |
| {out_prefix}\[_/\]hifihla_report.tsv | Simple tsv file listing calls for each locus |
| {out_prefix}\[_/\]hifihla_report.json | Detailed results file, see below for example |
| {out_prefix}\[_/\]asm.contigs.h[12].fasta | Extracted (full) assembly contigs aligning to MHC |
| {out_prefix}\[_/\]asm.contigs.h[12].fasta.fai | FASTA index for contigs |
| {out_prefix}\[_/\]extracted.targets.h[12].fasta | Extracted targets used for star-typing |
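
The `[_/]` joining rule in the table can be sketched in Python as follows. This is only an illustration: the function name `expected_outputs` is hypothetical, and the way a directory is recognized here (a trailing `/` or an existing directory on disk) is an assumption, not documented HiFiHLA behavior.

```python
import os
from typing import List

# The three reports that every call-* subcommand writes.
REPORT_NAMES = ("hifihla_summary.tsv", "hifihla_report.tsv", "hifihla_report.json")


def expected_outputs(out_prefix: str) -> List[str]:
    """Return the report paths expected for a given --out_prefix value."""
    # Assumption: a trailing separator or an existing directory means
    # "directory only", so report files are placed inside it.
    if out_prefix.endswith("/") or os.path.isdir(out_prefix):
        return [os.path.join(out_prefix, name) for name in REPORT_NAMES]
    # Otherwise the value is treated as directory + prefix, and the file
    # name is joined to the prefix with an underscore.
    return [f"{out_prefix}_{name}" for name in REPORT_NAMES]
```

For `out_dir/my_sample` this yields `out_dir/my_sample_hifihla_summary.tsv`, matching the usage examples; for `my_output_dir/` it yields `my_output_dir/hifihla_summary.tsv`.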

### Detailed summary tsv
This file reports the single best call and statistics for each query sequence in the sample.
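
A minimal Python sketch for reading this file, assuming it is tab-delimited and starts with the header row shown in the usage examples (`queryId`, `qLen`, `nMatches`, `gType`, ...). The function name `multi_match_queries` is hypothetical. It lists the queries whose best call is shared by more than one equally good allele, which are the ones worth looking up in the json report.

```python
import csv
from typing import Iterable, List, Tuple


def multi_match_queries(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (queryId, gType, nMatches) for rows with more than one match.

    `lines` can be an open file handle or any iterable of text lines.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    flagged = []
    for row in reader:
        n_matches = int(row["nMatches"])
        if n_matches > 1:
            flagged.append((row["queryId"], row["gType"], n_matches))
    return flagged
```

Typical use would be `with open("out_dir/my_sample_hifihla_summary.tsv") as handle: print(multi_match_queries(handle))`.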
2 changes: 1 addition & 1 deletion docs/usage.md
@@ -24,7 +24,7 @@ Options:
## Subcommand Inputs
| Subcommand | Input Type | File types |Description |
|----------------|-------------------------------------|-----------------|------------|
| call-reads | Aligned HiFi reads | BAM | Call Class I (ABC) from HiFi reads aligned to GRCh38 (no alts) |
| call-reads | Aligned HiFi reads | BAM | Call Class I (ABC) from HiFi reads aligned to [GRCh38 no alts](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz) |
| call-contigs | Aligned assembly; unaligned contigs | BAM, FASTA(.gz) | Extract and Call HLA loci from assembled MHC contigs |
| call-consensus | Amplicon/Isoseq consensus | FASTA | Call HLA alleles from consensus sequences (e.g. amplicon assays) |
| align-imgt | Sequence/IMGT accessions | FASTA | Compare sequences in fasta format or database sequences to specific IMGT/HLA genomic alleles |
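
Because `call-reads` expects alignments to the no-alt reference, it can help to check a BAM header before running. The Python sketch below works on header text (for example the output of `samtools view -H`) and flags contigs that suggest a reference with alternate or HLA contigs. It is not HiFiHLA's own check; the function name `unexpected_contigs` is hypothetical, and the naming heuristics (`_alt` suffix, `HLA-` prefix) are assumptions based on common UCSC-style reference builds.

```python
from typing import Iterable, List


def unexpected_contigs(header_lines: Iterable[str]) -> List[str]:
    """Return @SQ contig names that hint at a reference with alt contigs."""
    flagged = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each header field after the record type is TAG:VALUE.
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        name = tags.get("SN", "")
        # Assumed heuristics: alt contigs end in "_alt"; HLA contigs
        # bundled with some references start with "HLA-".
        if name.endswith("_alt") or name.startswith("HLA-"):
            flagged.append(name)
    return flagged
```

An empty result does not prove the reference is correct; it only means none of the flagged naming patterns were found.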
38 changes: 18 additions & 20 deletions docs/usage_call-consensus.md
@@ -4,28 +4,26 @@ Optionally call star alleles using exon sequence only.
```
Call HLA Star (*) alleles from consensus sequences
Usage: hifihla call-consensus [OPTIONS] --fasta <FASTA> --outdir <OUTDIR>
Usage: hifihla call-consensus [OPTIONS] --fasta <FASTA>
Options:
-f, --fasta <FASTA> Input fasta file of consensus sequences
-o, --outdir <OUTDIR> Output directory
-c, --cdna Enable cDNA-only calling
-e, --exon2 Require Exon2 in query sequence
-j, --threads <THREADS> Analysis threads [default: 1]
-x, --max_matches <MATCHES> Maximum equivalent matches per query in report [default: 10]
-v, --verbose... Enable verbose output
--log-level <LOG_LEVEL> Alternative to repeated -v/--verbose: set log level via key.
Equivalence to -v/--verbose:
=> "Warn"
-v => "Info"
-vv => "Debug"
-vvv => "Trace" [default: Warn]
-h, --help Print help
-V, --version Print version
-f, --fasta <FASTA> Input fasta file of consensus sequences
-o, --out_prefix <OUT_PREFIX> Output prefix
--outdir <OUTDIR> Output directory [deprecated]
-c, --cdna Enable cDNA-only calling
-e, --exon2 Require Exon2 in query sequence
-l, --full_length Full length IMGT records only
-j, --threads <THREADS> Analysis threads [default: 1]
-x, --max_matches <MATCHES> Maximum equivalent matches per query in report [default: 10]
-v, --verbose... Enable verbose output
--log-level <LOG_LEVEL> Alternative to repeated -v/--verbose: set log level via key.
-h, --help Print help
-V, --version Print version
```
#### Options Description
* `--fasta` Fasta file of consensus query sequences. Only one allele per query sequence.
* `--outdir` Output directory.
* `--out_prefix` Output prefix, accepts a directory or a directory + prefix.
* `--outdir` Output directory \[deprecated\].
* `--cdna` Call and report only coding regions (cDNA). Can be used for either DNA or RNA sequences.
* `--exon2` Require exon 2 in query. This may reduce search space.
* `--max_matches` Only report up to this number of matches in the json report.
@@ -35,9 +33,9 @@ Type HLA consensus sequences, for example from [HiFi amplicon consensus with pba
```
hifihla call-consensus \
--fasta pbaa_12878-HG001_passed_cluster_sequences.fasta \
--outdir my_output_dir/
--out_prefix out_dir/my_sample
column -t my_output_dir/hifihla_summary.tsv
column -t out_dir/my_sample_hifihla_summary.tsv
queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
sample-12878-HG001_guide-HLA-A_cluster-0_ReadCount-1023 3098 5 HLA-A*01:01:01:01 100.0 88.43 0 HLA-A*01:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*01:01:01
@@ -57,7 +55,7 @@ Note that query sequences that match >1 allele are not labeled as four-field mat
For example, HLA-A_cluster-1 above matches three alleles perfectly over the amplified range. All three matches are listed in the json.

```
jq '.. | objects | to_entries | .[] | select(.key == "sample-12878-HG001_guide-HLA-A_cluster-1_ReadCount-999")' hifihla_report.json
jq '.. | objects | to_entries | .[] | select(.key == "sample-12878-HG001_guide-HLA-A_cluster-1_ReadCount-999")' my_sample_hifihla_report.json
{
"key": "sample-12878-HG001_guide-HLA-A_cluster-1_ReadCount-999",
20 changes: 8 additions & 12 deletions docs/usage_call-contigs.md
@@ -11,26 +11,22 @@ Options:
* Limit output matches to full genomic IMGT accessions with `--full_length`.
```
Extract HLA loci from assembled MHC contigs & call star alleles on extracted sequences
Usage: hifihla call-contigs [OPTIONS] --abam <ALIGNED_ASSEMBLY> --hap1 <HAP1_FA> --outdir <OUTDIR>
Usage: hifihla call-contigs [OPTIONS] --abam <ALIGNED_ASSEMBLY> --hap1 <HAP1_FA>
Options:
-a, --abam <ALIGNED_ASSEMBLY> Input assembly aligned to GRCh38
-p, --hap1 <HAP1_FA> Input hap1 assembly fa(.gz)
-m, --hap2 <HAP2_FA> Input hap2 assembly fa(.gz) (optional)
-o, --outdir <OUTDIR> Output directory
-o, --out_prefix <OUT_PREFIX> Output prefix
--outdir <OUTDIR> Output directory [deprecated]
-l, --loci [<LOCI>...] Input comma-sep loci to extract [default: all]
-s, --min_length <MINLENGTH> Minimum length of extracted targets [default: 1000]
-f, --full_length Full length IMGT records only
-x, --max_matches <MATCHES> Maximum equivalent matches per query in report [default: 10]
-j, --threads <THREADS> Analysis threads [default: 1]
-v, --verbose... Enable verbose output
--log-level <LOG_LEVEL> Alternative to repeated -v/--verbose: set log level via key.
Equivalence to -v/--verbose:
=> "Warn"
-v => "Info"
-vv => "Debug"
-vvv => "Trace" [default: Warn]
-h, --help Print help
-V, --version Print version
```
@@ -59,9 +55,9 @@ hifihla call-contigs \
--abam HG00733.asm.GRCh38_no_alts.bam \
--hap1 HG00733.paternal.f1_assembly_v2.fa.gz \
--hap2 HG00733.maternal.f1_assembly_v2.fa.gz \
--outdir my_output_dir
--out_prefix out_dir/my_sample
head -7 my_output_dir/hifihla_summary.tsv | column -t
head -7 out_dir/my_sample_hifihla_summary.tsv | column -t
queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
HG00733#1#h1tg000070l_29911131_29915604 4474 1 HLA-A*24:02:01:01 100.0 100.0 0 HLA-A*24:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*24:02:01:01
@@ -77,9 +73,9 @@ hifihla call-contigs \
--abam HG00733.asm.GRCh38_no_alts.bam \
--hap1 HG00733.paternal.f1_assembly_v2.fa.gz \
--loci HLA-DQA1,HLA-DPA1,HLA-DRB1 \
--outdir my_output_dir
-o out_dir/my_sample
column -t my_output_dir/hifihla_summary.tsv
column -t out_dir/my_sample_hifihla_summary.tsv
queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
HG00733#1#h1tg000070l_33003286_33014048 10763 1 HLA-DPA1*01:03:01:02 100.0 100.0 0 HLA-DPA1*01:03:01 1,2,3,4 0 1 N/A HLA-DPA1*01:03:01:02
28 changes: 15 additions & 13 deletions docs/usage_call-reads.md
@@ -3,7 +3,7 @@
```
Call HLA loci from an aligned BAM of HiFi reads
Usage: hifihla call-reads [OPTIONS] --abam <ALIGNED_READS> --outdir <OUTDIR>
Usage: hifihla call-reads [OPTIONS] --abam <ALIGNED_READS>
Options:
-j, --threads <THREADS> Analysis threads [default: 1]
@@ -13,33 +13,35 @@ Options:
-V, --version Print version
Input Options:
-a, --abam <ALIGNED_READS> Input assembly aligned to GRCh38
-a, --abam <ALIGNED_READS> Input assembly aligned to GRCh38 (no alts)
-l, --loci [<LOCI>...] Input comma-sep loci to extract [default: all]
-d, --max_depth <MAX_DEPTH> Maximum reads per locus [default: 50]
-p, --partial Include partially-spanning reads
-t, --haplotypes <HAPLOTYPES> Haplotypes in sample [default: 2] [possible values: 1, 2]
-s, --seed <SEED> Random number seed for downsampling to max_depth [default: 42]
Output Options:
-o, --outdir <OUTDIR> Output directory
-f, --full_length Full length IMGT records only (exclude exon-only records)
-x, --max_matches <MATCHES> Maximum matches in output report [default: 10]
-m, --min_allele_freq <MAF> Minimum allele frequency [default: 0.1]
-b, --min_cdf <MINCDF> Minimum binomial CDF to call het/hom [default: 0.001]
-o, --out_prefix <OUT_PREFIX> Output prefix
--outdir <OUTDIR> Output directory [deprecated]
-f, --full_length Full length IMGT records only (exclude exon-only records)
-x, --max_matches <MATCHES> Maximum matches in output report [default: 10]
-m, --min_allele_freq <MAF> Minimum allele frequency [default: 0.1]
-b, --min_cdf <MINCDF> Minimum binomial CDF to call het/hom [default: 0.001]
Presets:
--preset <PRESET> Sequence type presets [possible values: te, wgs]
```
#### Input Options Description
* `--abam` HiFi reads aligned to GRCh38 (no alts).
* `--abam` HiFi reads aligned to [GRCh38 no alts](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz).
* `--loci` HLA loci to call. Currently limited to HLA-A,HLA-B,HLA-C.
* `--max_depth` Maximum reads to use per locus. Reads are randomly downsampled if coverage > d.
* `--partial` Include HiFi reads that do not fully span locus, but still span exon 2 (minimum requirement).
* `--haplotypes` Expected number of haplotypes in sample.
* `--seed` Random number seed for downsampling and clustering.
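
The interaction between `--max_depth` and `--seed` can be pictured with a short Python sketch: when a locus has more reads than the depth limit, a fixed seed makes the random subset reproducible. This is an illustration only, not HiFiHLA's implementation; the function name `downsample_reads` is hypothetical, and the defaults simply mirror the documented defaults of 50 and 42.

```python
import random
from typing import List, Sequence


def downsample_reads(
    read_names: Sequence[str], max_depth: int = 50, seed: int = 42
) -> List[str]:
    """Return at most max_depth reads, chosen reproducibly for a given seed."""
    reads = list(read_names)
    if len(reads) <= max_depth:
        # Coverage is within the limit, so every read is kept.
        return reads
    # A dedicated generator keeps the selection independent of any
    # other random state in the program.
    rng = random.Random(seed)
    return rng.sample(reads, max_depth)
```

Running the function twice with the same seed returns the same subset, while changing the seed will generally select different reads.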

### Output Options Description
* `--outdir` Output directory.
* `--out_prefix` Output prefix, accepts a directory or a directory + prefix.
* `--outdir` Output directory \[deprecated\].
* `--full_length` Restrict allele matches to full length IMGT records (exclude exon-only accessions).
* `--max_matches` Maximum number of equivalent matches to list per query sequence.
* `--min_allele_freq` Minimum fraction of reads for minor allele. Clusters with lower frequency will be ignored.
@@ -55,9 +57,9 @@ hifihla call-reads \
--preset wgs \
-j 8 \
-a HG00733.GRCh38_no_alts.bam \
-o my_output_dir
-o out_dir/my_sample
column -t my_output_dir/hifihla_summary.tsv
column -t out_dir/my_sample_hifihla_summary.tsv
queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
HLA-A_1 3502 1 HLA-A*24:02:01:01 100.0 100.0 0 HLA-A*24:02:01 1,2,3,4,5,6,7,8 0 9 0.00346 HLA-A*24:02:01:01
@@ -93,9 +95,9 @@ hifihla call-reads \
-f \
-l HLA-A \
-a NA12889.GRCH38.haplotagged.bam \
-o my_output_dir
-o out_dir/my_sample
cat hifihla_report.json
cat out_dir/my_sample_hifihla_report.json
{
"sample_id": "NA12889.GRCh38.haplotagged",
"version": "hifihla 0.3.0;IPD-IMGT/HLA 3.55 (2024-01)",