Adam English edited this page Jan 8, 2023 · 25 revisions

Documentation In Progress

As described in the phab documentation, Truvari bench can only find matches when the variants are represented consistently. To help automate running Truvari phab on a benchmarking result, we present the tool refine.

Quick Start

After making a bench result:

truvari bench -b base.vcf.gz -c comp.vcf.gz -o result/

Run refine on the result/ directory:

truvari refine -r subset.bed -f ref.fa result/


refine produces the following outputs:

  • refine.summary.json - result of re-evaluating calls within the specified regions. Same structure as summary.json
  • refine.counts.txt - Tab-delimited file with per-region variant counts
  • phab/ - Per-region results from variant re-evaluation
  • phab_bench/ - Bench results on the harmonized variants

To see an example output, look at test data
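Because refine.summary.json shares its structure with summary.json, the two can be compared key by key. The following is a minimal Python sketch of that comparison, not part of Truvari. The helper name `summary_deltas` and the sample keys and values are hypothetical; the only assumption about the files is that they are JSON objects with some numeric top-level entries in common.

```python
import json


def summary_deltas(before, after):
    """Return {key: after - before} for numeric keys present in both summaries."""
    deltas = {}
    for key, old in before.items():
        new = after.get(key)
        # bool is a subclass of int, so exclude it explicitly;
        # missing keys (None) and nested objects are skipped as well.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in (old, new)
        )
        if numeric:
            deltas[key] = new - old
    return deltas


# Hypothetical contents standing in for result/summary.json and
# result/refine.summary.json; keys and values are illustrative only.
before = json.loads('{"precision": 0.5, "recall": 0.5, "FP": 4, "nested": {}}')
after = json.loads('{"precision": 0.75, "recall": 0.5, "FP": 2, "nested": {}}')

for key, delta in summary_deltas(before, after).items():
    print(f"{key}\t{delta:+g}")
```

With real results, the two dictionaries would instead be loaded from the files with `json.load`.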

Using refine.counts.txt

Column Description
chrom Region's chromosome
start Region's start
end Region's end
in_tpbase Input's true positive base count
in_tp Input's true positive comparison count
in_fp Input's false positive count
in_fn Input's false negative count
refined Boolean indicating whether the region was re-evaluated
out_tpbase Output's true positive base count
out_tp Output's true positive comparison count
out_fn Output's false negative count
out_fp Output's false positive count
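The per-region counts can be summed to compare performance before and after refinement. The sketch below is one possible way to do this in Python and makes several assumptions that should be checked against a real file: that the first line is a header naming the columns exactly as in the table, that the count columns hold integers, and that precision is computed from comparison-side counts (tp / (tp + fp)) while recall is computed from base-side counts (tpbase / (tpbase + fn)). The sample rows are made up for illustration.

```python
import csv
import io

COUNT_COLUMNS = [
    "in_tpbase", "in_tp", "in_fp", "in_fn",
    "out_tpbase", "out_tp", "out_fn", "out_fp",
]

# Hypothetical file contents; the numbers are not real results.
SAMPLE = (
    "chrom\tstart\tend\tin_tpbase\tin_tp\tin_fp\tin_fn\trefined\t"
    "out_tpbase\tout_tp\tout_fn\tout_fp\n"
    "chr1\t100\t500\t1\t1\t1\t1\tTrue\t2\t2\t0\t0\n"
    "chr2\t900\t1500\t3\t3\t0\t1\tFalse\t3\t3\t1\t0\n"
)


def total_counts(handle):
    """Sum every count column over all regions in a tab-delimited handle."""
    totals = dict.fromkeys(COUNT_COLUMNS, 0)
    for row in csv.DictReader(handle, delimiter="\t"):
        for col in COUNT_COLUMNS:
            totals[col] += int(row[col])
    return totals


def precision_recall(tp_base, tp_comp, fp, fn):
    """Precision from comparison-side counts, recall from base-side counts."""
    precision = tp_comp / (tp_comp + fp) if tp_comp + fp else None
    recall = tp_base / (tp_base + fn) if tp_base + fn else None
    return precision, recall


totals = total_counts(io.StringIO(SAMPLE))
before = precision_recall(
    totals["in_tpbase"], totals["in_tp"], totals["in_fp"], totals["in_fn"]
)
after = precision_recall(
    totals["out_tpbase"], totals["out_tp"], totals["out_fp"], totals["out_fn"]
)
print("before refine (precision, recall):", before)
print("after refine  (precision, recall):", after)
```

To run it on an actual result, pass an open file handle for refine.counts.txt to `total_counts` in place of the in-memory sample.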


By default, refine uses the base/comparison variants from the bench result files tp-base.vcf.gz, fn.vcf.gz, tp-comp.vcf.gz, and fp.vcf.gz as input for phab. However, these files typically contain only a filtered subset of the variants originally provided to bench, because bench applies filters such as --sizemin and --passonly. With the --use-original parameter, all of the original calls from the input VCFs are fetched instead.


The --regions parameter specifies which regions to re-evaluate. If it is not provided, the original bench result's --includebed is used. If both --regions and --includebed are provided, the --regions are subset to only those intersecting --includebed.

Note that the larger these regions are, the slower MAFFT (used by phab) will run. Also, when performing the intersection described above, there may be edge effects in the reported refine.summary.json. For example, if a --regions entry only partially overlaps an --includebed region, the calls analyzed may not be a strict subset of the calls examined during the original bench run. Therefore, summary.json and refine.summary.json should be compared with caution.
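The subsetting and its edge effect can be illustrated with a small interval example. This is a sketch of the idea only, not Truvari's implementation; the function names are hypothetical, and intervals are assumed to be half-open (chrom, start, end) tuples as in BED files.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def subset_regions(regions, includebed):
    """Keep each region that intersects at least one includebed interval."""
    return [r for r in regions if any(overlaps(r, inc) for inc in includebed)]


includebed = [("chr1", 1000, 2000)]
regions = [
    ("chr1", 1200, 1800),  # fully inside includebed: kept
    ("chr1", 1900, 2500),  # partial overlap: kept, extends past includebed
    ("chr1", 3000, 4000),  # no overlap: dropped
]

kept = subset_regions(regions, includebed)
print(kept)
```

The second region is kept with its own coordinates, so it spans positions 2000 to 2500 that lie outside the includebed interval. Calls in that stretch would be analyzed here even though they fall outside the area the original bench run was restricted to.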


By default, the reference is pulled from the original bench result's params.json. If a reference wasn't provided to bench, it must be given to refine (-f), because phab uses it to realign variants.
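The fallback can be pictured as a small lookup. The sketch below is hypothetical and not taken from Truvari's source: it assumes params.json is a JSON object that stores the bench reference under a "reference" key, and that a reference passed to refine takes precedence over the stored one.

```python
import json


def resolve_reference(params_text, cli_reference=None):
    """Pick the reference for refine: explicit value first, then params.json."""
    params = json.loads(params_text)
    # Assumed key name; check an actual params.json before relying on it.
    reference = cli_reference or params.get("reference")
    if reference is None:
        raise ValueError("bench was run without a reference; pass one to refine")
    return reference


# Hypothetical params.json contents
print(resolve_reference('{"reference": "ref.fa"}'))
print(resolve_reference('{"reference": null}', cli_reference="other.fa"))
```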


With --use-includebed, the intersection of --includebed and --regions uses the --includebed coordinates. This is helpful when the original bench result's --includebed boundaries should be used instead of the --regions boundaries.
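The difference in which coordinates survive the intersection can be shown with a toy example. This is only an illustration of the behavior described above, under the assumption that the option reports the includebed intervals hit by a region rather than the regions themselves; the function name is hypothetical and the code is not Truvari's.

```python
def includebed_hits(regions, includebed):
    """Return the includebed intervals that intersect at least one region."""
    hits = []
    for chrom, start, end in includebed:
        # Half-open (chrom, start, end) intervals, as in BED files
        if any(c == chrom and s < end and start < e for c, s, e in regions):
            hits.append((chrom, start, end))
    return hits


includebed = [("chr1", 1000, 2000), ("chr1", 5000, 6000)]
regions = [("chr1", 1900, 2500)]

# The region's own boundaries (1900-2500) are replaced by the
# boundaries of the includebed interval it touches.
hits = includebed_hits(regions, includebed)
print(hits)
```

Here the re-evaluated interval stays within the area covered by the original bench run, at the cost of no longer matching the boundaries given in the regions file.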
