# Understanding pVACtools outputs
```{r, include = FALSE}
ottrpal::set_knitr_image_path()
```
## Learning Objectives
This chapter will cover:
- Understanding the output files produced by pVACtools
- Interpreting the .filtered.tsv file
- Interpreting the .aggregated.tsv file
## pVACtools Output Files
Both pVACseq and pVACfuse produce three main output files:
- The `all_epitopes.tsv` file is a TSV file with all predicted neoantigens and
all information obtained during the run.
- The `filtered.tsv` file has the same structure as the `all_epitopes.tsv` file,
but the entries have been filtered down according to the thresholds set by
the user during the run. The filters are explained further in
subsequent sections.
- The `aggregated.tsv` file is a condensed output file that contains only the
information most pertinent to interpreting the results. It contains only
the best neoantigen candidate for each variant. Our heuristic for
determining the best neoantigen is described in subsequent sections of this
course.
There are also a number of secondary output files produced by pVACseq and
pVACfuse. The most important are:
- `aggregated.metrics.json`: This file is only produced by pVACseq. It contains
metadata needed for visualizing your results in pVACview.
- `aggregated.tsv.reference_matches`: This file is created when the
reference proteome match feature is enabled during a run. It contains
detailed information about the reference matches found, if there are any.
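Because the main reports are plain tab-separated files, they can be inspected with general-purpose tools. The following is a minimal sketch using Python's standard `csv` module. The helper name `read_report`, the sample rows, and the three column names shown are illustrative assumptions; real reports contain many more columns.

```python
import csv
import io


def read_report(handle):
    """Parse a tab-separated report into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical two-row excerpt; real reports contain many more columns.
example = io.StringIO(
    "Gene Name\tMT Epitope Seq\tMedian MT IC50 Score\n"
    "GENE1\tAAAAAAAAA\t120.5\n"
    "GENE2\tCCCCCCCCC\t830.0\n"
)
rows = read_report(example)

# Values are read as strings, so convert before comparing numerically.
strong = [r for r in rows if float(r["Median MT IC50 Score"]) <= 500]
print(len(rows), [r["Gene Name"] for r in strong])
```

To read a report from disk, the same helper can be given a file handle opened with `open(path, newline="")`, where `path` points at one of your own output files.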
## Interpreting the filtered.tsv File
The filtered.tsv file takes all the predicted neoantigens from the
all_epitopes.tsv file and applies a number of filters to them. Filters are
applied consecutively, meaning that only the entries passing the first filter
will be passed along to the second filter, and so on. Only neoantigens
passing all filters will be reported in this file.
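The consecutive application of filters can be pictured as a simple pipeline. This is only a conceptual sketch, not pVACtools code: the function `apply_filters`, the candidate fields, and the two example predicates are hypothetical.

```python
def apply_filters(candidates, filters):
    """Apply each filter in order; later filters only see the survivors."""
    remaining = list(candidates)
    for keep in filters:
        remaining = [c for c in remaining if keep(c)]
    return remaining


# Hypothetical candidates and predicates, for illustration only.
candidates = [
    {"id": "a", "ic50": 100.0, "tdna_depth": 50},
    {"id": "b", "ic50": 900.0, "tdna_depth": 50},
    {"id": "c", "ic50": 100.0, "tdna_depth": 3},
]
filters = [
    lambda c: c["ic50"] <= 500,       # binding-style filter
    lambda c: c["tdna_depth"] >= 10,  # coverage-style filter
]
passing = apply_filters(candidates, filters)
print([c["id"] for c in passing])
```

Candidate `b` is dropped by the first predicate and never reaches the second, while `c` passes the first and is dropped by the second.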
### Binding Filter
The binding filter's primary function is to filter neoantigen candidates on
their IC50 binding affinity to an HLA allele. Because pVACtools allows users
to run more than one prediction algorithm, we then apply two summarization
methods on the calls for each neoantigen candidate and HLA allele combination:
(1) pVACtools calculates the median IC50 binding affinity for all selected prediction
algorithms (reported in the `Median [MT] IC50 Score` column), and (2) pVACtools selects
the IC50 binding affinity prediction with the lowest value (reported in the
`Best [MT] IC50 Score` column). By default,
the binding filter is applied to the median IC50 score unless
users set the `--top-score-metric` parameter to `lowest`.
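To make the two summarization methods concrete, here is a small sketch for a single peptide and HLA allele combination. The algorithm names and IC50 values are made up, and the function names are hypothetical.

```python
from statistics import median


def summarize_ic50(predictions):
    """Return (median, lowest) IC50 across all algorithms for one peptide/allele pair."""
    values = list(predictions.values())
    return median(values), min(values)


def score_for_filtering(predictions, top_score_metric="median"):
    """Pick the summary value that the binding filter would be applied to."""
    median_ic50, lowest_ic50 = summarize_ic50(predictions)
    return lowest_ic50 if top_score_metric == "lowest" else median_ic50


# Hypothetical IC50 predictions (nM) keyed by algorithm name.
predictions = {"algorithm_a": 220.0, "algorithm_b": 620.0, "algorithm_c": 1500.0}
print(summarize_ic50(predictions))
```

With these example numbers the median (620) is above a 500 nM threshold while the lowest value (220) is below it, so the choice of `--top-score-metric` can change whether a candidate is kept.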
The binding filter discards candidates where the binding affinity is above the
`--binding-threshold` (default: 500). However, users may set the
`--allele-specific-binding-thresholds` flag in order to use differing binding
thresholds depending on the HLA allele of the prediction, as recommended by
[IEDB](https://help.iedb.org/hc/en-us/articles/114094152371-What-thresholds-cut-offs-should-I-use-for-MHC-class-I-and-II-binding-predictions).
Custom thresholds are available for the most common 76 class I HLA alleles.
For all others, the `--binding-threshold` value is used.
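The threshold lookup described above can be sketched as follows. The function name is hypothetical, and the allele-specific value in the example is a placeholder rather than the actual IEDB recommendation. The sketch also assumes that a value exactly equal to the threshold is kept.

```python
def passes_binding(ic50, allele, binding_threshold=500.0, allele_thresholds=None):
    """Keep a candidate whose IC50 is not above the applicable threshold."""
    threshold = binding_threshold
    if allele_thresholds is not None:
        # Fall back to the general threshold for alleles without a custom value.
        threshold = allele_thresholds.get(allele, binding_threshold)
    return ic50 <= threshold


# Placeholder table; NOT the real IEDB-recommended values.
example_thresholds = {"HLA-A*02:01": 250.0}

print(passes_binding(300.0, "HLA-A*02:01"))
print(passes_binding(300.0, "HLA-A*02:01", allele_thresholds=example_thresholds))
print(passes_binding(300.0, "HLA-B*07:02", allele_thresholds=example_thresholds))
```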
In addition to the binding affinity, other optional parameters can be set to
enable additional filtering on related metrics:
- `--minimum-fold-change`: The fold change is the ratio of the wild-type binding affinity to
the mutant binding affinity (WT IC50 / MT IC50), also called agretopicity. A fold change
greater than 1 means that the mutant is a better binder than the wild type. pVACtools
calculates this ratio for both the median and the lowest values.
Which one is filtered on for this metric depends again on the
`--top-score-metric` setting. When a minimum fold change parameter is set, the binding filter
discards any prediction with an agretopicity below the set cutoff. This
parameter is not available in pVACfuse because there is no matched wild-type
peptide for each neoantigen candidate.
- `--percentile-threshold`: The prediction algorithms supported by pVACtools
also report a percentile score that represents where each neoantigen's predicted
affinity falls in the range of other values for an HLA allele. Similar to
the binding affinity itself, pVACtools reports the median and the lowest
percentile scores across the prediction
algorithms chosen by the user, and which one is used for filtering is again
controlled by the `--top-score-metric` parameter.
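The two optional checks can be sketched together. This sketch assumes the fold change is computed as WT IC50 divided by MT IC50 (so values above 1 favor the mutant peptide) and that lower percentile values indicate stronger binders. The function names are hypothetical, and both checks are skipped when their parameter is not set.

```python
def fold_change(wt_ic50, mt_ic50):
    """Agretopicity, assumed here to be WT IC50 divided by MT IC50."""
    return wt_ic50 / mt_ic50


def passes_optional_filters(mt_ic50, wt_ic50, mt_percentile,
                            minimum_fold_change=None, percentile_threshold=None):
    """Apply the fold-change and percentile checks only when they are set."""
    if minimum_fold_change is not None:
        if fold_change(wt_ic50, mt_ic50) < minimum_fold_change:
            return False
    if percentile_threshold is not None:
        if mt_percentile > percentile_threshold:
            return False
    return True


# A mutant peptide binding ten times more strongly than its wild-type counterpart.
print(fold_change(1000.0, 100.0))
```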
### Coverage Filter
The Coverage Filter is generally used to filter out variants that don't have
enough read support or expression. This ensures that the remaining variants
are not just artifacts and that the genes are actually expressed in the
patient's RNA.
For pVACseq, this generally relies on your VCF being annotated with coverage
and expression data. In our example, the VCF has already been annotated with
this data. For more information about how to add [coverage](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
and [expression data](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html)
to your own VCFs, please see our docs.
Additionally, filtering on the normal DNA depth and variant allele frequency
(VAF) requires your VCF to be a tumor-normal sample VCF and the normal sample
to be identified in your pVACseq run using the `--normal-sample-name`
parameter. If a coverage metric doesn't apply because the underlying data is
not available, `NA` is reported by pVACtools. By default, the filter will skip
evaluating a coverage criterion when a neoantigen's value for it is `NA`.
The following thresholds are applied in pVACseq by this filter:
- `--normal-cov`: Normal coverage cutoff. Minimum number of required reads in the normal DNA (default: 5).
- `--tdna-cov`: Tumor DNA coverage cutoff. Minimum number of required reads in the tumor DNA (default: 10).
- `--trna-cov`: Tumor RNA coverage cutoff. Minimum number of required reads in the tumor RNA (default: 10).
- `--normal-vaf`: Normal VAF cutoff. Only sites BELOW this cutoff in the normal DNA will be considered (default: 0.02).
- `--tdna-vaf`: Tumor DNA VAF cutoff. Only sites above this cutoff will be considered (default: 0.25).
- `--trna-vaf`: Tumor RNA VAF cutoff. Only sites above this cutoff will be considered (default: 0.25).
- `--expn-val`: Gene and Transcript expression cutoff. Only sites above this cutoff will be considered (default: 1.0).
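The thresholds above, together with the rule that `NA` values are skipped, can be sketched as follows. The field names are hypothetical, `None` stands in for `NA`, and the sketch assumes a value exactly at a cutoff is kept.

```python
# Default cutoffs copied from the list above; the field names are hypothetical.
MINIMUMS = {
    "normal_depth": 5, "tdna_depth": 10, "trna_depth": 10,
    "tdna_vaf": 0.25, "trna_vaf": 0.25, "expression": 1.0,
}
MAXIMUMS = {"normal_vaf": 0.02}


def passes_coverage(candidate):
    """Check each cutoff, skipping any metric that is missing (None stands for NA)."""
    for field, cutoff in MINIMUMS.items():
        value = candidate.get(field)
        if value is not None and value < cutoff:
            return False
    for field, cutoff in MAXIMUMS.items():
        value = candidate.get(field)
        if value is not None and value > cutoff:
            return False
    return True


# Tumor-only example: normal and RNA metrics are unavailable and therefore skipped.
tumor_only = {"tdna_depth": 40, "tdna_vaf": 0.4}
print(passes_coverage(tumor_only))
```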
For pVACfuse, this filter evaluates a fusion variant's fusion read support and fusion transcript expression.
Arriba natively outputs a number of read metrics. These are the number of supporting split fragments with an anchor in
gene1 or gene2, respectively, as well as the number of pairs (fragments) of discordant mates supporting the fusion
(a.k.a. spanning reads or bridge reads). The sum of these three values is
reported as Read Support in pVACfuse. The fusion transcript expression is
parsed from the `--starfusion-file`, when provided. This is reported as FFPM
(fusion fragments per million total reads).
The following thresholds are applied in pVACfuse by this filter:
- `--read-support`: Read Support cutoff. Sites above this cutoff will be considered (default: 5).
- `--expn-val`: Expression cutoff. Sites above this cutoff will be considered (default: 0.1).
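As a rough sketch of the pVACfuse version of this filter, the snippet below sums the three read metrics and applies the two cutoffs. The function and parameter names are hypothetical. The sketch assumes the expression check is skipped when no FFPM value is available and that a value exactly at a cutoff is kept.

```python
def fusion_read_support(split_reads1, split_reads2, discordant_mates):
    """Sum the three read metrics into a single Read Support value."""
    return split_reads1 + split_reads2 + discordant_mates


def passes_fusion_coverage(read_support, ffpm=None,
                           min_read_support=5, min_expression=0.1):
    """Check read support, and expression when an FFPM value is available."""
    if read_support < min_read_support:
        return False
    if ffpm is not None and ffpm < min_expression:
        return False
    return True


support = fusion_read_support(3, 1, 2)
print(support, passes_fusion_coverage(support, ffpm=0.5))
```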
### Transcript Support Level Filter
The Transcript Support Level (TSL) Filter removes neoantigen candidates for
transcripts with a high TSL, as defined [by Ensembl](https://grch37.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl).
In Ensembl's scheme, TSL 1 marks the best-supported transcripts, so a higher number means weaker support.
The cutoff for this filter is set by the `--maximum-transcript-support-level`
parameter. Transcripts with a TSL of NA will always be filtered out.
Annotation with TSL values through VEP is only available for GRCh38. For other
species and older builds, a value of "Not Supported" is written to the report
and the TSL filter will skip those variants.
This filter is currently only run by pVACseq.
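The three cases described above (a numeric TSL, `NA`, and "Not Supported") can be sketched as a single check. The function name is hypothetical and the TSL is assumed to arrive as a string read from the report.

```python
def passes_tsl(tsl, maximum_tsl):
    """Apply the TSL cutoff to a TSL value read from the report as a string."""
    if tsl == "Not Supported":
        return True   # the filter is skipped for these variants
    if tsl == "NA":
        return False  # NA is always filtered out
    return int(tsl) <= maximum_tsl


print([passes_tsl(t, 1) for t in ["1", "3", "NA", "Not Supported"]])
```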
### Top Score Filter
The Top Score Filter will attempt to determine the best neoantigen candidate
for each variant.
For pVACseq it works as follows. Given a set of neoantigen candidates for a
variant we first group the transcripts into sets where all transcripts in a set
code for the same set of neoantigen candidates. For each transcript set we then
determine the best neoantigen candidate as follows:
- Pick all neoantigens with a variant transcript that has a protein_coding Biotype.
- Of the remaining candidates, pick the ones with a variant transcript having a
TSL less than or equal to the `--maximum-transcript-support-level`.
- Of the remaining candidates, pick the entries with no Problematic Positions.
- Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in
more detail further below).
- Of the remaining candidates, pick the one with the lowest MT IC50 Score (Median or Best
depending on the `--top-score-metric`), lowest TSL, and longest transcript.
This filter then reports the best neoantigen candidate for each transcript set.
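The pVACseq selection steps for one transcript set can be sketched as a series of narrowing steps followed by a final ranking. This is a conceptual sketch, not the pVACtools implementation. The field names are hypothetical, and the sketch assumes that a step which would remove every candidate is skipped so that a best candidate can always be reported.

```python
def narrow(candidates, predicate):
    """Keep candidates satisfying predicate; keep all if none do (assumed fallback)."""
    kept = [c for c in candidates if predicate(c)]
    return kept or candidates


def best_candidate(candidates, maximum_tsl):
    """Pick the best candidate of a transcript set following the steps above."""
    pool = narrow(candidates, lambda c: c["biotype"] == "protein_coding")
    pool = narrow(pool, lambda c: c["tsl"] <= maximum_tsl)
    pool = narrow(pool, lambda c: not c["problematic_positions"])
    pool = narrow(pool, lambda c: c["passes_anchor"])
    # Lowest IC50 first, then lowest TSL, then longest transcript.
    return min(pool, key=lambda c: (c["ic50"], c["tsl"], -c["transcript_length"]))


# Hypothetical candidates for a single transcript set.
candidates = [
    {"id": "t1", "biotype": "protein_coding", "tsl": 1, "problematic_positions": False,
     "passes_anchor": True, "ic50": 300.0, "transcript_length": 1200},
    {"id": "t2", "biotype": "protein_coding", "tsl": 1, "problematic_positions": False,
     "passes_anchor": True, "ic50": 300.0, "transcript_length": 2000},
    {"id": "t3", "biotype": "retained_intron", "tsl": 1, "problematic_positions": False,
     "passes_anchor": True, "ic50": 50.0, "transcript_length": 900},
]
best = best_candidate(candidates, maximum_tsl=1)
print(best["id"])
```

In this example `t3` has the lowest IC50 but is set aside at the Biotype step, and the tie between `t1` and `t2` is broken by transcript length.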
For pVACfuse, the neoantigen candidates for each fusion are similarly grouped
into sets where all transcript1-transcript2 combinations in a set code for the
same set of neoantigen candidates. From there, the best neoantigen candidate
for each transcript set is determined by picking the candidate with the lowest
MT IC50 Score (Median or Best depending on the `--top-score-metric`) and the
highest fusion transcript expression.
## Interpreting the aggregated.tsv File
The `aggregated.tsv` is a condensed output file that shows the best neoantigen
candidate for each variant and reports only the information most pertinent to
interpreting the results. It also assigns each of the selected neoantigen candidates
a tier based on its suitability for vaccine manufacturing.
Only neoantigen candidates meeting the `--aggregate-inclusion-threshold` (default: 5000) are
included in this report. Depending on the value used for the `--top-score-metric`, a
candidate's Median or Best MT IC50 Score must be below this threshold for it to be
considered when creating the report.
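The inclusion step can be sketched as a filter followed by grouping per variant. The function name and the `variant` and `ic50` field names are hypothetical. Here `ic50` stands for whichever of the Median or Best MT IC50 Score is selected by `--top-score-metric`.

```python
from collections import defaultdict


def candidates_by_variant(rows, inclusion_threshold=5000.0):
    """Group candidates below the inclusion threshold by their variant."""
    grouped = defaultdict(list)
    for row in rows:
        if row["ic50"] < inclusion_threshold:
            grouped[row["variant"]].append(row)
    return dict(grouped)


# Hypothetical candidates for two variants.
rows = [
    {"variant": "v1", "ic50": 100.0},
    {"variant": "v1", "ic50": 6000.0},
    {"variant": "v2", "ic50": 7000.0},
]
included = candidates_by_variant(rows)
print(sorted(included), len(included["v1"]))
```

Variant `v2` has no candidate below the threshold, so it does not appear in the grouped result at all.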
### Determining the Best Transcript and Best Peptide of a Variant
In pVACseq, for each variant, all neoantigen candidates meeting the `--aggregate-inclusion-threshold` are evaluated as follows:
- Pick all entries with a variant transcript that has a protein_coding Biotype.
- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`.
- Of the remaining entries, pick the entries with no Problematic Positions.
- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below).
- Of the remaining entries, pick the one with the lowest MT IC50 Score (Median or Best
depending on the `--top-score-metric`), lowest Transcript Support Level, and longest transcript.
In pVACfuse, the neoantigen candidate with the lowest IC50 binding affinity for each variant is selected.
The value used for the `--top-score-metric` determines whether the lowest or
median binding affinity is used for this comparison.
The chosen entry determines the best neoantigen candidate and the best
transcript coding for it.
### Tier and Tiering Criteria
For the purpose of assigning tiers, each best peptide is evaluated by a set of
criteria. These criteria and the available tiers differ from tool to tool.
#### Tiering in pVACseq
The Tiers available in pVACseq are:
```{r pvacseq_tiers, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
tabl <- "
| Tier | Criteria |
|------|----------|
| Pass | Best Peptide passes the binding, expression, tsl, clonal, and anchor criteria |
| Anchor | Best Peptide fails the anchor criteria but passes the binding, expression, tsl, and clonal criteria |
| Subclonal | Best Peptide fails the clonal criteria but passes the binding, tsl, and anchor criteria |
| LowExpr | Best Peptide meets the Low Expression Criteria and passes the binding, tsl, clonal, and anchor criteria |
| NoExpr | Best Peptide is not expressed (RNA Expr == 0 or RNA VAF == 0) |
| Poor | Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
"
cat(tabl)
```
**Criteria Details**
```{r pvacseq_tier_criteria, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
tabl <- "
| Criteria | Description | Evaluation |
|----------|-------------|------------|
| Binding Criteria | Pass if Best Peptide is a strong binder | IC50 MT < `--binding-threshold` and %ile MT < `--percentile-threshold` (if parameter is set). `--allele-specific-binding-thresholds` flag is respected. |
| Expression Criteria | Pass if Best Transcript is expressed | Allele Expr > `--trna-vaf` * `--expn-val` |
| Low Expression Criteria | Peptide has low expression, or no expression but sufficient RNA VAF and coverage | (0 < Allele Expr < `--trna-vaf` * `--expn-val`) OR (RNA Expr == 0 AND RNA Depth > `--trna-cov` AND RNA VAF > `--trna-vaf`) |
| TSL Criteria | Pass if Best Transcript has good transcript support level | TSL <= `--maximum-transcript-support-level` |
| Clonal Criteria | Best Peptide is likely in the founding clone of the tumor | DNA VAF > `--tumor-purity` / 4 |
| Anchor Criteria | Fail if all mutated amino acids of the Best Peptide (Pos) are at an anchor position and the WT peptide has good binding | All Pos at an anchor position AND IC50 WT < `--binding-threshold`. `--allele-specific-binding-thresholds` flag is respected. |
"
cat(tabl)
```
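The two pVACseq tables above can be read as a decision procedure. The sketch below is one possible reading, not the pVACtools implementation. The function names are hypothetical, the tiers are checked in table order, and the sketch assumes that the Subclonal tier also requires the expression criteria to pass, which the table does not state explicitly.

```python
def clonal_criteria(dna_vaf, tumor_purity):
    """Clonal Criteria from the table: DNA VAF > tumor purity / 4."""
    return dna_vaf > tumor_purity / 4


def expression_criteria(allele_expr, trna_vaf=0.25, expn_val=1.0):
    """Expression Criteria from the table: Allele Expr > trna-vaf * expn-val."""
    return allele_expr > trna_vaf * expn_val


def pvacseq_tier(binding, expression, tsl, clonal, anchor,
                 low_expression=False, not_expressed=False):
    """Assign a tier from already-evaluated pass/fail criteria."""
    if binding and expression and tsl and clonal and anchor:
        return "Pass"
    if binding and expression and tsl and clonal and not anchor:
        return "Anchor"
    # Assumption: expression must also pass for the Subclonal tier.
    if binding and expression and tsl and anchor and not clonal:
        return "Subclonal"
    if binding and low_expression and tsl and clonal and anchor:
        return "LowExpr"
    if not_expressed:
        return "NoExpr"
    return "Poor"


print(pvacseq_tier(True, True, True, True, False))
```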
#### Tiering in pVACfuse
The Tiers available in pVACfuse are:
```{r pvacfuse_tiers, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
tabl <- "
| Tier | Criteria |
|------|---------|
| Pass | Best Peptide passes the binding, read support, and expression criteria |
| LowReadSupport | Best Peptide fails the read support criteria but passes the binding and expression criteria |
| LowExpr | Best Peptide fails the expression criteria but passes the binding and read support criteria |
| Poor | Best Peptide doesn’t fit any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
"
cat(tabl)
```
**Criteria Details**
```{r pvacfuse_tier_criteria, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
tabl <- "
| Criteria | Description | Evaluation |
|----------|-------------|------------|
| Binding Criteria | Pass if Best Peptide is strong binder | IC50 MT < `--binding-threshold` and %ile MT < `--percentile-threshold` (if parameter is set). `--allele-specific-binding-thresholds` flag is respected. |
| Read Support Criteria | Pass if the variant has read support | Read Support >= `--read-support` |
| Expression Criteria | Pass if Best Transcript is expressed | Expr >= `--expn-val` |
"
cat(tabl)
```
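The pVACfuse tiers can likewise be read as a short decision procedure over three pass/fail criteria. This is a sketch of the tier table only, with a hypothetical function name, and it takes the criteria as already-evaluated booleans.

```python
def pvacfuse_tier(binding, read_support, expression):
    """Assign a pVACfuse tier from already-evaluated pass/fail criteria."""
    if binding and read_support and expression:
        return "Pass"
    if binding and expression and not read_support:
        return "LowReadSupport"
    if binding and read_support and not expression:
        return "LowExpr"
    return "Poor"


print(pvacfuse_tier(binding=True, read_support=False, expression=True))
```

A candidate failing the binding criteria, or failing both read support and expression, falls through to `Poor`.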