clarify example 17.4 #244

petrelharp · 2021-11-22T18:15:27Z

This example is confusing because if the trees aren't coalesced, TMRCA means something different.

Consider what happens if, say, on the left half of the chromosome, all the trees have coalesced, with a mean TMRCA of around 800Kya; while on the right half the trees have not coalesced, and the mean height of the roots of the trees there are lower, only 600Kya. (This might be expected, because the "TMRCAs" on the right half are not ancestors to the entire population, just some of it, and so possibly less old.) Then, recapitation does nothing to the left half of the genome, but simulates a bunch more history on the right half, pushing the TMRCA back on the right half. So, in the original simulation TMRCA looks lower in the right half, but in reality (ie after recap) it's higher. This is not abstract, this has actually happened to someone on the SLiM mailing list.

To fix this, I think the code should error if there's more than one root:

# Measure the tree height at each base position
height_for_pos = np.zeros(int(ts.sequence_length))
for tree in ts.trees():
   assert tree.num_roots == 1
   left, right = map(int, tree.interval)
   height_for_pos[left: right] = tree.time(tree.root)

The text was updated successfully, but these errors were encountered:

bhaller · 2021-11-22T18:20:41Z

As we chatted about on Slack, we will need to zoom about this so I understand all this better. For now, I will just ask: why does recapitation enter into this? Recipe 17.4 does not recapitate. So, what happened to the person on the mailing list seems to me to be independent on recipe 17.4; recapitation is not involved here, we're just looking at mean tree height post-simplification. Which is not to say that there might not be a problem here, but I need to better understand why. That multiple-root stuff was put in after discussion with, IIRC, both you and Jerome, so let's make sure we truly understand everything going on here before we mess with it. :->

petrelharp · 2021-11-22T18:44:36Z

Presumably, the original recipe runs for long enough that recapitation is not necessary. Otherwise, looking at TMRCAs doesn't make sense. The mailing list issue occurred, I think, when someone increased the pop size (and maybe other changes) that made the tree seuqence much less coalesced.

bhaller · 2021-11-22T18:51:54Z

Hmm, not sure, would have to check. It does run for a long time. But we had to add that code for multiple roots because there were multiple roots sometimes, IIRC. And it seemed to make sense to all of us at that time. :->

petrelharp · 2021-11-22T19:13:44Z

Hmm, not sure, would have to check. It does run for a long time. But we had to add that code for multiple roots because there were multiple roots sometimes, IIRC. And it seemed to make sense to all of us at that time. :->

Yes, well, that sounds plausible. And it does indeed seem to make sense in that example. But, as we've seen, with a larger population size it starts to not make sense.

I think the options are:

run if for longer so there's always only one root
recapitate
use diversity instead of TMRCA (and explain we're assuming no diversity in the initial population)

bhaller · 2023-09-17T23:35:48Z

OK, discussed this in person. Re: (1), tried running it longer, but after 30N instead of 10N it still wasn't fully coalesced. Running it even longer would eventually get there, but ugh. Kind of teaches the wrong lesson, too, since really this ought to be a recapitation recipe. Re: (2), yes, it ought to recapitate, but at present recapitation is treated as an advanced topic at the end of the chapter. We agreed that that probably ought to change – recapitation should be moved early in the chapter and used more widely. But that's a much bigger change, belonging in #351 perhaps. Not going to happen for now. (3) doesn't really achieve the goals of the recipe, which is to show how to play around with trees, illustrate the idea of tree height as a proxy of diversity, etc.; it would work fine but just not show what I'm trying to show. So in the end we agreed: (1) don't change the recipe code at all, and (2) change the discussion to talk about how recapitation would be a better approach, or you could run the recipe for so long that everything coalesces, but short of those alternatives, we allow multiple tree roots to still occur and handle it in the Python code. This is also pedagogically good because it gives an opportunity to discuss multiple tree roots, why they happen, and what it means (a lesson that shifting this recipe to recapitation, or running it all the way to coalescence, would not teach, so there are actually advantages to the way it does things).

So here's my rewritten discussion snippet:

Note that the simplify() of the loaded tree sequence here is essential; without it, every tree would have a root in the first generation, in one or another original ancestor, and all the tree heights would be the same. The simplify() strips away the original ancestors, giving us trees with roots representing the most recent common ancestors for each tree. It should be emphasized that this is a very unusual approach; most of the time, calling simplify() on a tree sequence from SLiM is undesirable, and you should think very carefully before doing it since it throws away information and makes correct recapitation impossible.

and

Instead of calling simplify() to strip away ancestry above the most recent common ancestors, an alternative approach would use tskit’s diversity() method to compute diversity on the tree sequence without simplifying. Examples of using diversity() can be found in tskit’s documentation. Similarly, instead of allowing multiple tree roots to occur, we could either (a) run for long enough to coalesce at every site, or (b) recapitate the tree sequence to ensure coalescence, as we will see section 17.10. Recapitation would perhaps be the best approach, but we haven’t seen it yet, and seeing how to handle multiple tree roots is also a useful exercise.

I think that is pretty good, and I actually like that it gives us a chance to discuss multiple tree roots. Closing this issue, finally.

bhaller added the trees related to tree-seq, tskit, etc. label Nov 22, 2021

bhaller mentioned this issue Sep 9, 2022

Tree-seq docs in the SLiM manual need a review #351

Open

bhaller closed this as completed Sep 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clarify example 17.4 #244

clarify example 17.4 #244

petrelharp commented Nov 22, 2021

bhaller commented Nov 22, 2021

petrelharp commented Nov 22, 2021

bhaller commented Nov 22, 2021

petrelharp commented Nov 22, 2021

bhaller commented Sep 17, 2023

clarify example 17.4 #244

clarify example 17.4 #244

Comments

petrelharp commented Nov 22, 2021

bhaller commented Nov 22, 2021

petrelharp commented Nov 22, 2021

bhaller commented Nov 22, 2021

petrelharp commented Nov 22, 2021

bhaller commented Sep 17, 2023