-
Notifications
You must be signed in to change notification settings - Fork 7
TreeSlicer: Example 1
In this example we set the sampling proportion (or sampling rate) to 0 during the period from the origin until the first sample is collected. Thus, the sampling proportion is 0 between t0 and t1 (x1) and constant between t1 and t2 (x2).
Usually no sampling effort was made until (or shortly before) the first sample was collected. For example, in infectious disease outbreaks, there is often a period of cryptic transmission before the outbreak is detected. By default, BDSKY assumes a constant (or at least non-zero) sampling proportion from the origin to the present. During periods where no sampling effort was made, setting a non-zero sampling rate in the model will result in biasing estimates of other model parameters.
With a small modification the examples below can be adapted to set the sampling proportion to 0 during other periods when no sampling effort was made (e.g. seasonal influenza where samples are only collected during winter months).
It is possible to achieve this result without using TreeSlicer. Change the birth-death skyline tree-prior to:
<distribution id="BirthDeathSkySerial"
spec="beast.evolution.speciation.BirthDeathSkylineModel"
tree="@Tree"
reproductiveNumber="@reproductiveNumber"
becomeUninfectiousRate="@becomeUninfectiousRate"
samplingProportion="@samplingProportion"
origin="@origin">
<samplingRateChangeTimes spec="RealParameter" value="0 38.0000001"/>
<reverseTimeArrays spec="BooleanParameter" value="false false true false false false"/>
</distribution>
<samplingRateChangeTimes>
is a RealParameter
(vector of doubles), which is used to set the change-point times for the sampling proportion. Here we add a single change-point time at 38.0000001, which is just a little bit more than the time between the most recent sample (1994) and the oldest sample (1956). We also need to add a 0 for a reason I don't quite understand (anything else either throws an error or crashes BEAST2).
We also need to indicate to BDSKY that the sampling proportion change-times should be reversed, meaning that change-point times are measured as time between the present and the change-time (heights on the tree), instead of between the origin and the change-point time (default behaviour for BDSKY). We do this using the <reverseTimeArrays>
array. Sampling proportion change-times are the third in the array (see BDSKY code/documentation), which is why the third element is set to true
. (Note: since we are also inferring the origin of the tree, it is not possible to fix the change-point time to the time of the oldest sample without reversing the time array).
To actually set the first element of the sampling proportion to 0, we set its initial value to 0, where the sampling proportion parameter is declared (since we only operate on the sampling proportion with scale operators, 0 elements will always remain at 0):
<parameter id="samplingProportion" name="stateNode" lower="0.0" upper="1.0">0 0.01</parameter>
There is one problem left. The prior probability is evaluated at the value of every element in the sampling proportion parameter vector. This is usually not an issue, but many distributions are not defined at 0. To get around this we can use beast.core.util.Slice
or ExcludablePrior
(can be found in multitypetree.distributions
or beast.math.distributions
). I prefer to use Slice
, since it is part of the BEAST2 core and it involves less typing. First declare the slice somewhere in the XML file (not inside the MCMC object):
<function spec="beast.core.util.Slice" id="samplingProportionSlice" arg="@samplingProportion" index="1" count="1"/>
This tells BEAST2 to select 1 element from samplingProportion
, starting at element 1 (Slice
starts numbering at 0). Then, when defining the prior, simply make it point to the slice and not the complete sampling proportion:
<prior id="samplingProportionPrior" name="distribution" x="@samplingProportionSlice">
With TreeSlicer, we set <samplingRateChangeTimes>
to a TreeSlicer object, and explicitly tell it to add one change-point time at the time of the oldest sample:
<distribution id="BirthDeathSkySerial"
spec="beast.evolution.speciation.BirthDeathSkylineModel"
tree="@Tree"
reproductiveNumber="@reproductiveNumber"
becomeUninfectiousRate="@becomeUninfectiousRate"
samplingProportion="@samplingProportion"
origin="@origin">
<samplingRateChangeTimes spec="TreeSlicer" id="SamplingTreeSlice" tree="@Tree"
dimension="2" stop="oldestsample" includeLast="true"/>
<reverseTimeArrays spec="BooleanParameter" value="false false true false false false"/>
</distribution>
Note: We still have to set reverseTimeArrays
, set the first element of the sampling proportion to 0 and use Slice
or ExcludablePrior
to ensure the prior probability is not evaluated at 0.
Note: The dimension of TreeSlicer is the number of slices the tree is divided into, which is always equal to the dimension of the parameter (in this case the sampling proportion), or one more than the number of shifts.
-
Security: With TreeSlicer, the time of the oldest sample is part of the data only, not the model. That means the sequencing data in the XML file can be changed, without needing to change the model specification, which makes the code more secure and general. Suppose we add a new sequence sampled at 1996. Without TreeSlicer, we would need to change the sampling proportion change-point time to 40.0000001. Now suppose we discover an older sample, from 1950. We have to change the change-point time again! With TreeSlicer, no changes need to be made, which means there are less opportunities to introduce errors.
-
Ease of use: With TreeSlicer there is no need to calculate the time between the most recent and the oldest sample. It's easy enough in this example (1994 - 1956 = 38). But suppose the most recent sample was from 24 March 2013, and the oldest sample from 29 February 2000. To calculate the time between them you first have to convert the dates to years (with a fraction) and then subtract. In doing so you also have to account for 2000 being a leap-year and ensure that you are doing the calculation in exactly the same way as BEAST2. This can easily become very tedious when it has to be done for several datasets and it is very easy to introduce errors. (In fact it is nearly impossible not to introduce some round-off errors).
-
Add more change-point times: With TreeSlicer it is a simple matter to add multiple change-point times between the time of the oldest sample and the most recent sample. Simply change the model to
<samplingRateChangeTimes spec="TreeSlicer" id="SamplingTreeSlice" tree="@Tree" dimension="5" stop="oldestsample" includeLast="true"/>
to add 3 equidistant shifts (shown below). Without TreeSlicer, you would need to calculate 38/4 = 9.5 and then change the model to:<samplingRateChangeTimes spec="RealParameter" value="0 9.5 19 28.5 38.0000001"/>
. Again, it is easy to see how this becomes tedious and makes it easy to introduce errors when the month and day of sampling times are known.
Louis du Plessis, 2018