Skip to content

TreeSlicer: Example 3

Louis edited this page Aug 27, 2019 · 12 revisions

Specifying change-point times as dates

In this example we set the change-point times to specific dates (1950, 1960, 1970 and 1980). These times remain fixed throughout the analysis.

Change-point times at specific dates

Why do this?

There are situations where we know the date of specific events (such as the implementation of harm-reduction policies, increases in sampling efforts or a geographic event) which are known to have an effect on the rates of the birth-death skyline model. In such cases where the change-point times are known a priori it makes sense to fix the change-point times these dates. (In such a scenario simply using equidistantly spaced change-point times will result in the model calculating an average rate across the true shift-time. Similarly, estimating the change-point date will increase uncertainty in the estimates of all other parameters).

Without TreeDateSlicer

Example XML: dengue4_bdsky_fixeddates.xml

It is possible to achieve this result without using TreeDateSlicer, by converting the dates to heights on the tree (time between the dates and the most recent sample). Thus, if the most recent sample was in 1994, and we want to introduce change-point times for the reproductive number at 1950, 1960, 1970 and 1980, change the birth-death skyline tree-prior to:

<distribution id="BirthDeathSkySerial" 
              spec="beast.evolution.speciation.BirthDeathSkylineModel" 
              tree="@Tree"
              reproductiveNumber="@reproductiveNumber" 
              becomeUninfectiousRate="@becomeUninfectiousRate" 
              samplingProportion="@samplingProportion" 
              origin="@origin">
    <birthRateChangeTimes spec="RealParameter" value="0.0 14.0 24.0 34.0 44.0"/>
    <reverseTimeArrays spec="BooleanParameter" value="true false false false false false"/>
</distribution>

We also need to indicate to BDSKY that the reproductive number (or birth-rate) change-point times should be reversed, meaning that change-point times are measured as time between the present and the change-point time (heights on the tree), instead of between the origin and the change-point time (default behaviour for BDSKY). Measuring the change-point time from the origin would mean that the change-point times won't be fixed at specific dates because the origin time is also estimated (the time between the origin and any given date will change as the MCMC chain samples the origin parameter). We do this using the reverseTimeArrays array. Reproductive number (or birth-rate) change-point times are the first in the array (see BDSKY code/documentation), which is why the first element is set to true.

With TreeDateSlicer

Example XML: dengue4_bdsky_fixeddates_treeslicer.xml

With TreeDateSlicer we change birthRateChangeTimes to a TreeDateSlicer object, and directly enter the change-point times as dates:

<distribution id="BirthDeathSkySerial" 
              spec="beast.evolution.speciation.BirthDeathSkylineModel" 
              tree="@Tree"
              reproductiveNumber="@reproductiveNumber" 
              becomeUninfectiousRate="@becomeUninfectiousRate" 
              samplingProportion="@samplingProportion" 
              origin="@origin">
    <birthRateChangeTimes spec="TreeDateSlicer" id="ReTreeSlice" tree="@Tree" 
                          dates="1950 1960 1970 1980"/>
    <reverseTimeArrays spec="BooleanParameter" value="true false false false false false"/>
</distribution>

Note: We still have to set reverseTimeArrays.

Note: In the specification above we only entered the change-point times (one less than the dimension of the reproductive number parameter). Normally with BDSKY the change-point times vector needs to be (n+1) elements long for n change-point times (i.e. the same length as the parameter). If we had entered the change-point dates as,

<birthRateChangeTimes spec="TreeDateSlicer" id="ReTreeSlice" tree="@Tree"
                          dates="1950 1960 1970 1980 1994"/>

with an extra change-point date at the date of the most recent sample the result would be identical. (TreeDateSlicer is clever enough to figure out that the last change-point date is the same as the most recent sampling date).

Example XML: dengue4_bdsky_datestrings_treeslicer.xml

In the above example the change-point dates are set as years. It is straightforward enough when the scale of the epidemic is on years or seasons to set the change-point dates as for examples 2015, 2015.25, 2015.5 etc. However, for many acute epidemics events happen on the scale of days and the exact date needs to be set. It can be complicated and tedious to convert a date in a year to a fraction (you have to keep track of leap years and do the conversion in exactly the same way as BEAST2). It's easy enough to see that you are likely to introduce some errors if you have to do this for many different analyses.

This can be avoided by replacing the dates attribute of the TreeDateSlicer object with a DateParser,

<birthRateChangeTimes spec="TreeDateSlicer" id="ReTreeSlice" tree="@Tree">
    <dates spec="DateParser" format="dd-MMM-yyyy"
           dates="01-Jan-1950 30-Jun-1960 23-Apr-1970 31-Dec-1980"/>
</birthRateChangeTimes>

A DateParser object converts strings of dates to decimal dates, according to the provided format. In this case the date format is "dd-MMM-yyyy" which represents a 2-digit day, a 3-letter month abbreviation and a 4-digit year. The full documentation of all the placeholders that can be used in the format string is available at https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. When using a DateParser the TreeDateSlicer object can still be logged as before.

So why use TreeDateSlicer?

  • Security: With TreeDateSlicer, the change-point times are part of the model only, and are completely independent of the data. That means the sequencing data in the XML file can be changed, without needing to change the model specification, which makes the code more secure and general. Suppose we add a new sequence sampled at 1996. Without TreeDateSlicer, we would need to recompute the times between the most recent sample and the change-point dates (or else all the change-point dates will be shifted). With TreeDateSlicer, no changes need to be made, which means there are less opportunities to introduce errors.

  • Ease of use: With TreeDateSlicer there is no need to calculate the time between the most recent sample and the change-point dates. It's easy enough in this example (1994 - 1980 = 14 etc.). But suppose the most recent sample was from 24 March 2013, and the change-point date is 29 February 2000. To calculate the time between them you first have to convert the dates to years (with a fraction) and then subtract. In doing so you also have to account for 2000 being a leap-year and ensure that you are doing the calculation in exactly the same way as BEAST2. This can easily become very tedious when it has to be done for several datasets and it is very easy to introduce errors. (In fact it is nearly impossible not to introduce some round-off errors).