Strains with ambiguous dates are assigned two different dates in auspice output #48

huddlej · 2020-01-10T00:20:37Z

Fewer than 1% of strains sampled between 2017 and 2019 have ambiguous dates, but when these strains are included in a seasonal flu build, the pipeline assigns them two different effective dates. The first date is inferred by TreeTime. The second date is read from the metadata and converted to a numerical value prior to frequency estimation.

These differences lead to examples like A/Malaysia/RP0118/2019, which has an ambiguous date of 2019-XX-XX, a TreeTime-inferred date of January 1, 2019, and a brute-force guess from metadata of June 2019. For this example, the user interface confusingly displays two different dates as shown below. The problem also manifests in the forecasting pipeline, where tips with non-zero frequencies are used to project the population one year into the future. In the example below, the strain with an ambiguous date is included in the forecast because of its frequency estimation date, even though it was most likely circulating 6 months earlier.

We could fix this issue in a couple ways:

1. Exclude all strains with ambiguous dates
2. Use TreeTime inferred dates for frequency estimation

The first solution seems to be the simplest, and since fewer than 1% of recent sequences have ambiguous dates, this filter shouldn’t adversely affect the quality of the flu builds. This solution only requires a change to the seasonal flu build.
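As a rough illustration of the first option, here is a minimal Python sketch of a build-side filter. It assumes metadata records are dicts with `strain` and `date` fields and that ambiguous dates are masked with `X` characters, as in 2019-XX-XX. The helper names `has_ambiguous_date` and `drop_ambiguous` and the second record are hypothetical, not part of augur or the flu build.

```python
import re

# A fully resolved date is assumed to look like YYYY-MM-DD with digits only.
COMPLETE_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def has_ambiguous_date(date_string):
    """Return True when the date is missing or any part is masked (e.g. 2019-XX-XX)."""
    return COMPLETE_DATE.fullmatch(date_string or "") is None


def drop_ambiguous(records):
    """Keep only the records whose 'date' field is fully resolved."""
    return [r for r in records if not has_ambiguous_date(r.get("date"))]


records = [
    {"strain": "A/Malaysia/RP0118/2019", "date": "2019-XX-XX"},
    {"strain": "hypothetical/strain/2018", "date": "2018-03-14"},  # made-up record
]
kept = drop_ambiguous(records)
```

With these two records, only the fully dated strain would remain, so the ambiguous tip never reaches tree building or frequency estimation.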

The second solution requires a change to a core augur interface. It might be nice in the long run to estimate frequencies from the most accurate information available, though. Tip frequency estimation already requires the Newick tree from augur refine, so adding the branch_lengths.json as an input to frequencies wouldn’t require any major rewiring of existing pipelines.
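To make the second option concrete, the sketch below pulls TreeTime-inferred numeric dates for tips out of already-parsed node data. It assumes branch_lengths.json has a top-level `nodes` object keyed by node name and that each entry has a `numdate` field. Treat that layout, the `tip_numeric_dates` helper, and the values shown as assumptions to check against real augur refine output.

```python
def tip_numeric_dates(node_data, strains):
    """Map each requested strain to its inferred numeric date, if one is present.

    `node_data` is the parsed JSON (e.g. json.load on branch_lengths.json);
    the "nodes" / "numdate" keys are assumed, not verified here.
    """
    nodes = node_data.get("nodes", {})
    dates = {}
    for strain in strains:
        numdate = nodes.get(strain, {}).get("numdate")
        if numdate is not None:
            dates[strain] = float(numdate)
    return dates


# Illustrative values only; internal nodes are skipped because they are not requested.
node_data = {
    "nodes": {
        "A/Malaysia/RP0118/2019": {"numdate": 2019.0, "date": "2019-01-01"},
        "NODE_0000001": {"numdate": 2018.5},
    }
}
inferred = tip_numeric_dates(node_data, ["A/Malaysia/RP0118/2019"])
```

Frequency estimation could then prefer these inferred dates over the metadata-derived guess, so each tip carries a single effective date.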

My preference is for the first solution.

huddlej added the bug label Jan 10, 2020

huddlej mentioned this issue Aug 12, 2020

filter: support filtering of ambiguous dates nextstrain/augur#602

Closed

huddlej self-assigned this Nov 13, 2020
