refine
Documentation In Progress
As described in the phab documentation, a constraint on Truvari `bench` finding matches is that there needs to be some consistency in how the variants are represented. To help automate the process of running Truvari `phab` on a benchmarking result, we present the tool `refine`.
After making a `bench` result:
truvari bench -b base.vcf.gz -c comp.vcf.gz -o result/
Use `refine` on the `result/` directory:
truvari refine -r subset.bed -f ref.fa result/
- `refine.summary.json` - Result of re-evaluating calls within the specified regions. Same structure as `summary.json`
- `refine.counts.txt` - Tab-delimited file with per-region variant counts
- `phab/` - Per-region results from variant re-evaluation
- `phab_bench/` - Bench results on the harmonized variants
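Because `refine.summary.json` shares its structure with `summary.json`, the two files can be compared key by key. The following is a minimal sketch, not part of Truvari: the helper names `load_summary` and `summary_deltas` are hypothetical, and the keys in the toy dictionaries are illustrative values rather than real results.

```python
import json
from pathlib import Path


def load_summary(path):
    """Read a summary JSON file into a dictionary."""
    return json.loads(Path(path).read_text())


def is_number(value):
    """True for ints and floats, but not booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def summary_deltas(before, after):
    """Return after - before for every numeric key present in both summaries."""
    deltas = {}
    for key, old in before.items():
        new = after.get(key)
        if is_number(old) and is_number(new):
            deltas[key] = new - old
    return deltas


# Toy values for illustration only (not real benchmarking results).
before = {"precision": 0.5, "recall": 0.5, "f1": 0.5, "note": "ignored"}
after = {"precision": 0.75, "recall": 0.5, "f1": 0.625}
print(summary_deltas(before, after))
```

With real output, the dictionaries would come from something like `load_summary("result/summary.json")` and `load_summary("result/refine.summary.json")`, assuming both files sit in the `bench` output directory.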
To see an example output, look at test data
Column | Description |
---|---|
chrom | Region's chromosome |
start | Region's start |
end | Region's end |
in_tpbase | Input's True Positive base count |
in_tp | Input's True Positive comparison count |
in_fp | Input's false positive count |
in_fn | Input's false negative count |
refined | Boolean for if region was re-evaluated |
out_tpbase | Output's true positive base count |
out_tp | Output's true positive comparison count |
out_fn | Output's false negative count |
out_fp | Output's false positive count |
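The per-region columns above can be totaled to see how refinement changed overall performance. The sketch below assumes that `refine.counts.txt` has a header row using the column names from the table and that the count columns hold integers. It also assumes precision is computed as `tp / (tp + fp)` and recall as `tpbase / (tpbase + fn)`. The function name `summarize_counts` and the example rows are made up for illustration.

```python
import csv
import io


def summarize_counts(handle):
    """Sum per-region counts, then compute precision and recall for the
    input ("in") and refined output ("out") columns."""
    stats = ("tpbase", "tp", "fp", "fn")
    totals = {f"{p}_{s}": 0 for p in ("in", "out") for s in stats}
    for row in csv.DictReader(handle, delimiter="\t"):
        for key in totals:
            totals[key] += int(row[key])

    summary = {}
    for prefix in ("in", "out"):
        tpbase = totals[f"{prefix}_tpbase"]
        tp = totals[f"{prefix}_tp"]
        fp = totals[f"{prefix}_fp"]
        fn = totals[f"{prefix}_fn"]
        summary[prefix] = {
            # None when the denominator is zero (no calls to evaluate)
            "precision": tp / (tp + fp) if (tp + fp) else None,
            "recall": tpbase / (tpbase + fn) if (tpbase + fn) else None,
        }
    return summary


# Made-up rows for illustration only.
EXAMPLE = (
    "chrom\tstart\tend\tin_tpbase\tin_tp\tin_fp\tin_fn\trefined\t"
    "out_tpbase\tout_tp\tout_fp\tout_fn\n"
    "chr1\t100\t500\t1\t1\t1\t1\tTrue\t2\t2\t0\t0\n"
    "chr2\t200\t900\t3\t3\t0\t0\tFalse\t3\t3\t0\t0\n"
)

print(summarize_counts(io.StringIO(EXAMPLE)))
```

To run this on real output, pass an open file handle instead, for example `open("result/refine.counts.txt")`.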
By default, `refine` will use the base/comparison variants from the `bench` result's `tp-base.vcf.gz`, `fn.vcf.gz`, `tp-comp.vcf.gz`, and `fp.vcf.gz` as input for `phab`. However, these files typically contain only a filtered subset of the variants originally provided to `bench`, because `bench` applies filters such as `--sizemin` and `--passonly`. With the `--use-original` parameter, all of the original calls from the input VCFs are fetched.
The `--regions` parameter specifies which regions to re-evaluate. If it is not provided, the original `bench` result's `--includebed` is used. If both `--regions` and `--includebed` are provided, the `--regions` are subset to only those intersecting `--includebed`.
Note that the larger these regions are, the slower MAFFT (used by `phab`) will run. Also, when performing the intersection as described above, there may be edge effects in the reported `refine.summary.json`. For example, if a `--regions` entry only partially overlaps an `--includebed` region, the calls analyzed may not be a strict subset of those examined during the original `bench` run. Therefore, the `*summary.json` files should be compared with caution.
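The subsetting and the edge effect described above can be sketched as follows. This is an illustration of the described behavior, not Truvari's implementation. It assumes BED-style half-open `(chrom, start, end)` intervals, that sharing at least one base counts as intersecting, and that the includebed intervals do not overlap each other. All function names are hypothetical.

```python
def overlap_len(a, b):
    """Number of bases shared by two half-open (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def subset_regions(regions, includebed):
    """Keep only the regions that share at least one base with includebed."""
    return [
        r for r in regions
        if any(overlap_len(r, inc) > 0 for inc in includebed)
    ]


def bases_outside(region, includebed):
    """Count bases of a region that fall outside every includebed interval.
    Assumes includebed intervals do not overlap each other."""
    covered = sum(overlap_len(region, inc) for inc in includebed)
    return (region[2] - region[1]) - covered


includebed = [("chr1", 1000, 2000)]
regions = [("chr1", 1800, 2500), ("chr1", 5000, 6000), ("chr2", 1000, 2000)]

kept = subset_regions(regions, includebed)
print(kept)                                # [('chr1', 1800, 2500)]
print(bases_outside(kept[0], includebed))  # 500
```

Here the kept region extends 500 bases past the includebed boundary, so calls in that stretch may be re-evaluated even though the original `bench` run did not consider them.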
By default, the reference is pulled from the original `bench` result's `params.json`. If a reference wasn't used with `bench`, it must be specified with `refine`, as it is used to realign variants inside `phab`.
When intersecting includebed with regions, use includebed coordinates. This is helpful when the original `bench` result's `--includebed` boundaries should be used instead of the `--regions`.
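The difference between the two coordinate choices can be sketched as below. This is a hedged illustration of the behavior described, not Truvari's code: the function `intersect_regions` and its `use_includebed_coords` argument are hypothetical names, and intervals are assumed to be half-open `(chrom, start, end)` tuples.

```python
def overlaps(a, b):
    """True when two half-open (chrom, start, end) intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_regions(regions, includebed, use_includebed_coords=False):
    """Return the intervals to re-evaluate.

    Default: the regions that intersect includebed, with region coordinates.
    With use_includebed_coords: the includebed intervals that intersect a
    region, with includebed coordinates.
    """
    if use_includebed_coords:
        return [
            inc for inc in includebed
            if any(overlaps(inc, r) for r in regions)
        ]
    return [
        r for r in regions
        if any(overlaps(r, inc) for inc in includebed)
    ]


includebed = [("chr1", 1000, 2000), ("chr1", 8000, 9000)]
regions = [("chr1", 1800, 2500)]

print(intersect_regions(regions, includebed))
# [('chr1', 1800, 2500)]
print(intersect_regions(regions, includebed, use_includebed_coords=True))
# [('chr1', 1000, 2000)]
```

Using the includebed coordinates keeps the re-evaluated interval within the boundaries that the original `bench` run analyzed.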