Releases: HKU-BAL/Clair3
v1.0.0
- Added Clair3 version number to the VCF header (#141).
- Fixed the numpy.int issue when using newer NumPy versions (#165, PR contributor @Aaron Tyler).
- The new version converts all IUPAC bases to 'N' in both VCF and GVCF output; use `--keep_iupac_bases` to keep the IUPAC bases (#153).
- Added options `--use_longphase_for_intermediate_phasing`, `--use_whatshap_for_final_output_phasing`, `--use_longphase_for_final_output_phasing`, `--use_whatshap_for_final_output_haplotagging` to disambiguate intermediate phasing and final VCF phasing either using WhatsHap or LongPhase; old options are still usable (#164).
- Fixed "shell script interpreter selection problem" when using Clair3 as a host user within a Docker container (#175).
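The IUPAC conversion noted above can be pictured with a minimal sketch. This is not Clair3's code: the function name `mask_iupac_bases`, its `keep_iupac_bases` parameter (named after the command-line option), and the exact set of symbols treated as ambiguity codes are assumptions made for illustration.

```python
# Hypothetical illustration; Clair3's own implementation may differ.
# Standard IUPAC nucleotide ambiguity codes (everything except A, C, G, T, N).
IUPAC_AMBIGUITY_CODES = set("RYSWKMBDHV")


def mask_iupac_bases(sequence: str, keep_iupac_bases: bool = False) -> str:
    """Replace IUPAC ambiguity codes with 'N' unless asked to keep them."""
    if keep_iupac_bases:
        return sequence
    return "".join(
        "N" if base.upper() in IUPAC_AMBIGUITY_CODES else base
        for base in sequence
    )
```

For example, `mask_iupac_bases("ACGTRY")` returns `"ACGTNN"`, while passing `keep_iupac_bases=True` returns the input unchanged.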
v0.1-r12
- CRAM input is supported (#117).
- Bumped up dependencies' version to "Python 3.9" (#96), "TensorFlow 2.8", "Samtools 1.15.1", "WhatsHap 1.4".
- VCF DP tag now shows raw coverage for both pileup and full-alignment calls (before r12, sub-sampled coverage was shown for pileup calls if average DP > 144) (#128).
- Fixed Illumina representation unification out-of-range error in training (#110).
v0.1-r11.1
Users, please ignore this pre-release. This pre-release is for Zenodo to pull and archive Clair3 for the first time.
v0.1-r11
- Variant calling ~2.5x faster than `v0.1-r10` tested with ONT Q20 data, with feature generation in both pileup and full-alignment now implemented in C (co-contributors @cjw85, @ftostevin-ont, @EpiSlim).
- Added the lightning-fast longphase as an option for phasing. Enable using `longphase` with option `--longphase_for_phasing`. The new option is disabled by default to align with the default behavior of the previous versions, but we recommend enabling it when calling human variants with ≥20x long reads.
- Added `--min_coverage` and `--min_mq` options (#83).
- Added `--min_contig_size` option to skip calling variants in short contigs when using genome assembly as input.
- Read haplotagging, previously a separate step between phasing and full-alignment calling, is now integrated into full-alignment calling to avoid generating an intermediate BAM file.
- Supported `.csi` BAM index for large references (#90).

For more speedup details, please check Notes on r11.
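As a rough picture of what skipping short contigs means, the sketch below filters contig names by length using the lines of a samtools-style `.fai` index (tab-separated, name in the first column and length in the second). The helper name `contigs_to_call` is hypothetical, and whether Clair3 treats the threshold as inclusive is an assumption here.

```python
# Hypothetical illustration; not taken from the Clair3 source.
def contigs_to_call(fai_lines, min_contig_size=0):
    """Return names of contigs whose length is at least min_contig_size."""
    kept = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        # Assumption: a contig exactly at the threshold is still called.
        if length >= min_contig_size:
            kept.append(name)
    return kept
```

With an index listing `chr1` at 248956422 bp and a 1200 bp scaffold, a `min_contig_size` of 5000 would keep only `chr1`.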
v0.1-r11 minor 2 patches are included in all installation options
v0.1-r10
- Added a new ONT Guppy5 model (`r941_prom_sup_g5014`). Click here for some benchmarking results. This `sup` model is also applicable to reads called using the `hac` and `fast` modes. The old `r941_prom_sup_g506` model that was fine-tuned from the Guppy3,4 model is obsoleted.
- Added `--var_pct_phasing` option to control the percentage of top ranked heterozygous pile-up variants used for WhatsHap phasing.
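The idea of keeping only the top-ranked heterozygous pileup variants for phasing can be sketched as follows. This is an illustration under stated assumptions, not Clair3's implementation: the function name, the `(position, quality, is_heterozygous)` tuple layout, ranking by quality score, treating the value as a fraction between 0 and 1, and rounding down are all assumed.

```python
# Hypothetical illustration; not taken from the Clair3 source.
def select_variants_for_phasing(variants, var_pct_phasing):
    """Keep the top fraction of heterozygous variants, ranked by quality.

    variants: iterable of (position, quality, is_heterozygous) tuples.
    var_pct_phasing: fraction in [0, 1] of heterozygous variants to keep.
    """
    het = [v for v in variants if v[2]]
    # Rank from highest to lowest quality.
    ranked = sorted(het, key=lambda v: v[1], reverse=True)
    # Assumption: the number kept is rounded down.
    n_keep = int(len(ranked) * var_pct_phasing)
    # Return the kept variants in genomic order.
    return sorted(ranked[:n_keep], key=lambda v: v[0])
```

A lower fraction hands the phasing tool fewer but more confident heterozygous sites; a fraction of 1 passes all of them.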
v0.1-r9
v0.1-r8
v0.1-r7
- Increased `var_pct_full` in ONT mode from 0.3 to 0.7. Indel F1-score increased ~0.2%, but took ~30 minutes longer to finish calling a ~50x ONT dataset.
- Expanded fall-through to the next most likely variant if the network prediction has insufficient read coverage (#53 commit 09a7d18, contributor @ftostevin-ont); accuracy improved on complex Indels.
- Streamlined pileup and full-alignment training workflows. Reduced disk space demand in model training (#55 commit 09a7d18, contributor @ftostevin-ont).
- Added `mini_epochs` option in Train.py; performance slightly improved in training a model for ONT Q20 data using mini-epochs (#60, contributor @ftostevin-ont).
- Massively reduced disk space demand when outputting GVCF. Now compressing GVCF intermediate files with lz4, five times smaller with little speed penalty.
- Added `--remove_intermediate_dir` to remove intermediate files as soon as no longer needed (#48).
- Renamed ONT pre-trained models with Medaka's naming convention.
- Fixed training data spilling over to validation data (#57).
v0.1-r6
v0.1-r5
- Modified data generator in model training to avoid memory exhaustion and unexpected segmentation fault by TensorFlow (contributor @ftostevin-ont).
- Simplified Dockerfile workflow to reuse container caching (contributor @amblina).
- Fixed ALT output for reference calls (contributor @wdecoster).
- Fixed a bug in multi-allelic AF computation (AF of [ACGT]Del variants was wrong before r5).
- Added AD tag to the GVCF output.
- Added the `--call_snp_only` option to call SNPs only (#40).
- Added pileup and full-alignment output validity check to avoid workflow crashing (#32, #38).