Merge pull request #1 from gatk-workflows/wdlupdate

Wdlupdate: added workflow and JSON
# gatk4-cnn-variant-filter
### Purpose :
This repo provides workflows that take advantage of GATK's CNN tool, a deep-learning
approach that filters variants using convolutional neural networks.

Please read the following discussion to learn more about the CNN tool: [Deep Learning in GATK4](https://gatkforums.broadinstitute.org/gatk/discussion/10996/deep-learning-in-gatk4).
### cram2filtered.wdl
This workflow takes an input CRAM/BAM, calls variants with HaplotypeCaller,
and then filters the calls with the CNNVariant neural net tool using the specified filtering model.

The site-level scores are added to the `INFO` field of the VCF. The architecture, `info_key`,
and `tensor_type` arguments MUST be in agreement (e.g. 2D models must have a
`tensor_type` of `read_tensor` and an `info_key` of `CNN_2D`; 1D models have a `tensor_type` of
`reference` and an `info_key` of `CNN_1D`). The `INFO` field key will be `CNN_1D` or `CNN_2D`
depending on the neural net architecture used for inference. The architecture arguments
specify pre-trained networks. New networks can be trained with the GATK tools CNNVariantWriteTensors
and CNNVariantTrain. The CRAM could be generated by the [single-sample pipeline](https://github.com/gatk-workflows/gatk4-data-processing/blob/master/processing-for-variant-discovery-gatk4.wdl).
If you would like to test the workflow on a more representative example file, use the following
CRAM file as input and change the scatter count from 4 to 200: gs://gatk-best-practices/cnn-h38/NA12878_NA12878_IntraRun_1_SM-G947Y_v1.cram.
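To make the agreement rule concrete, here is a minimal sketch of what an inputs JSON for a 2D run could look like. The workflow and parameter names (`Cram2FilteredVcf.*`) and the paths are illustrative assumptions; the JSON template included in this repo has the authoritative keys.

```json
{
  "_comment": "Illustrative sketch only; parameter names are assumed, see the repo's JSON template for the real keys.",
  "Cram2FilteredVcf.input_cram": "gs://my-bucket/NA12878.cram",
  "Cram2FilteredVcf.reference_fasta": "gs://my-bucket/Homo_sapiens_assembly38.fasta",
  "Cram2FilteredVcf.tensor_type": "read_tensor",
  "Cram2FilteredVcf.info_key": "CNN_2D",
  "Cram2FilteredVcf.scatter_count": 4
}
```

A 1D run would instead pair a `tensor_type` of `reference` with an `info_key` of `CNN_1D`.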
#### Requirements/expectations :
- CRAM/BAM
- BAM Index (if input is BAM)

#### Output :
- Filtered VCF and its index.
### cram2model.wdl
This optional workflow is for advanced users who would like to train a CNN model for filtering variants
(an illustrative inputs sketch follows the lists below).

#### Requirements/expectations :
- CRAM
- Truth VCF and its index
- Truth Confidence Interval Bed

#### Output :
- Model HD5
- Model JSON
- Model Plots PNG
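For orientation only, a hypothetical inputs JSON for a training run might pair the CRAM with the truth resources like this; the workflow name (`Cram2Model`) and keys are assumptions, so consult the JSON file shipped alongside the WDL for the real input names.

```json
{
  "_comment": "Hypothetical keys for illustration; the repo's JSON file has the actual input names.",
  "Cram2Model.input_cram": "gs://my-bucket/NA12878.cram",
  "Cram2Model.truth_vcf": "gs://my-bucket/NA12878_truth.vcf.gz",
  "Cram2Model.truth_vcf_index": "gs://my-bucket/NA12878_truth.vcf.gz.tbi",
  "Cram2Model.truth_bed": "gs://my-bucket/NA12878_confident_regions.bed"
}
```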
### run_happy.wdl
This optional evaluation and plotting workflow runs a filtering model against truth data (e.g. [NIST Genomes in a Bottle](https://github.com/genome-in-a-bottle/giab_latest_release), [Synthetic Diploid Truth Set](https://github.com/lh3/CHM-eval/releases)) and plots the accuracy (see the inputs sketch after the lists below).

#### Requirements/expectations :
- File of VCF Files
- Truth VCF and its index
- Truth Confidence Interval Bed

#### Output :
- Evaluation summary
- Plots
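Again as a sketch only, an inputs JSON for the evaluation might look like the following, where the workflow name (`RunHappy`) and keys are illustrative assumptions and the VCF list is assumed to be a plain-text file with one VCF path per line.

```json
{
  "_comment": "Illustrative only; check the repo's JSON for the real input names.",
  "RunHappy.vcf_file_list": "gs://my-bucket/vcfs_to_evaluate.txt",
  "RunHappy.truth_vcf": "gs://my-bucket/NA12878_truth.vcf.gz",
  "RunHappy.truth_vcf_index": "gs://my-bucket/NA12878_truth.vcf.gz.tbi",
  "RunHappy.truth_bed": "gs://my-bucket/NA12878_confident_regions.bed"
}
```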
### Important Note :
- Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
- For help running workflows on the Google Cloud Platform or locally, please
view the following tutorial: [(How to) Execute Workflows from the gatk-workflows Git Organization](https://software.broadinstitute.org/gatk/documentation/article?id=12521).
- Please post questions to the [GATK forum](https://gatkforums.broadinstitute.org/gatk/categories/ask-the-team).
- Please visit our [User Guide](https://software.broadinstitute.org/gatk/documentation/) site for further documentation on workflows and tools.
### LICENSING :
This script is released under the WDL source code license (BSD-3) (see LICENSE in
https://github.com/broadinstitute/wdl). Note, however, that the programs it calls may
be subject to different licenses. Users are responsible for checking that they are
authorized to run all programs before running this script. Please see the Docker
page at https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/ for detailed
licensing information pertaining to the included programs.