This release includes the following changes.
Slurm workload manager is supported for all handlers.
GATK v4.1.2 on the Slurm queueing system is supported for the following handlers:
- Haplotype_Caller
- Genomics_DB_Import (new handler; combines GVCF files before the Genotype_GVCFs handler runs)
- Genotype_GVCFs
- Create_HC_Subset (preparation steps for GATK Variant Recalibrator)
- Variant_Recalibrator
GATK v4.1.2 on non-PBS queueing systems is supported for the following handlers:
- Haplotype_Caller
- Genotype_GVCFs
- Variant_Filtering
Additional changes:
- VCF annotation visualization to assist filtering has also been added.
- A Jupyter Notebook template for exploring VCF files prior to the variant recalibration/filtering steps is now available in the `HelperScripts` directory.
- The Realigner_Target_Creator and Indel_Realigner handlers have been separated from the main pipeline because the functionality is only available in GATK 3 or earlier and we still need indel realignment for other downstream tools. Please fill out `Config_Indel_Realign` for the indel realignment steps.
- The main `Config` file has been updated to match the handler changes, and a few new variables have been added.
- Haplotype_Caller, Genomics_DB_Import, and Genotype_GVCFs now parallelize across regions using job arrays.
- This version lets you re-run specific job array indices with an optional `-t custom_array_indices` argument on the command line, instead of re-creating your sample list for failed or aborted jobs. For example:
  `./sequence_handling SAM_Processing /path/to/config -t 1-5,10,12`
  Without the `-t` flag, all samples in your list are run by default, so you can still run sequence_handling like this:
  `./sequence_handling SAM_Processing /path/to/config`
  This works for any handler that uses job arrays.
- Create_HC_Subset can now handle very large VCF files (>1 TB) in a reasonable manner.
- Variant_Recalibrator now has additional features:
  - A recalibration "mode" can be specified to recalibrate both indels and SNPs, indels only, or SNPs only
  - A custom set of annotations can be specified in the config file
  - Additional options/flags can be specified for inclusion
  - Resource datasets can be set as known, training, or truth sets with more control
  - The raw VCF file and resource files are indexed automatically if they are not already indexed
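
To make the Variant_Recalibrator options above more concrete, here is a dry-run sketch that only prints the kind of `gatk VariantRecalibrator` command such settings could translate to. It is not the handler's actual code: the helper name `build_vqsr_cmd`, the file names, the resource label `truth_set`, and the annotation list are all placeholders, and the flags reflect our understanding of the GATK 4 command line (check them against `gatk VariantRecalibrator --help` for your version). Recalibrating both variant types is sketched as two passes, which is an assumption about how "both" might be handled.

```bash
#!/bin/bash
# Dry-run sketch: print one VariantRecalibrator command per requested mode.
# Nothing is executed; file names, the resource line, and the annotation
# list are placeholders, not values from sequence_handling.
build_vqsr_cmd() (
    mode="$1"          # SNP or INDEL
    annotations="$2"   # space-separated annotation names, e.g. "QD FS MQ"

    cmd="gatk VariantRecalibrator -R ref.fasta -V raw_variants.vcf -mode ${mode}"
    # One resource marked as both a training and a truth set (placeholder values).
    cmd="${cmd} --resource:truth_set,known=false,training=true,truth=true,prior=15.0 truth_set.vcf"
    # One -an flag per requested annotation.
    for an in $annotations; do
        cmd="${cmd} -an ${an}"
    done
    echo "${cmd} -O ${mode}.recal --tranches-file ${mode}.tranches"
)

# Assumption: "both" is handled as one pass per variant type.
for mode in SNP INDEL; do
    build_vqsr_cmd "$mode" "QD FS MQ"
done
```

Printing the command instead of running it makes it easy to review the mode, annotations, and resource flags before submitting a job.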
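
For the `-t` argument described earlier in these notes, it can help to see which array indices a specification such as `1-5,10,12` covers. The helper below, `expand_indices`, is a hypothetical illustration and is not taken from sequence_handling, which may simply pass the string through to the scheduler (Slurm's `sbatch --array` accepts the same range syntax). It assumes `seq` is available.

```bash
#!/bin/bash
# Hypothetical helper (not part of sequence_handling): expand a spec such as
# "1-5,10,12" into one array index per line.
expand_indices() (
    # Split the spec on commas; the subshell body keeps the IFS change local.
    IFS=','
    for part in $1; do
        case "$part" in
            *-*)
                # A range "start-end": emit every index from start to end.
                seq "${part%-*}" "${part#*-}"
                ;;
            *)
                # A single index.
                echo "$part"
                ;;
        esac
    done
)

# Prints 1 2 3 4 5 10 12, one index per line.
expand_indices "1-5,10,12"
```

Checking the expansion before resubmitting is a cheap way to confirm that only the failed or aborted array tasks will be re-run.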