remove edge table sorting if possible #401

bhaller · 2023-10-23T18:17:22Z

@molpopgen has written here https://tskit-dev.slack.com/archives/C01JKJC5Y9G/p1698082329001319?thread_ts=1697956055.069829&cid=C01JKJC5Y9G that:

Edge sorting is totally unnecessary and is a big performance hit (even using the "bookmark"). fwdpyh11 has used this approach for several years now.

This would be good to merge into SLiM as well. I gather some work in tskit is needed to enable it (unless we hack SLiM's copy of tskit). I don't know how much work all of this is, as I don't really know what exactly Kevin is doing. But @petrelharp this would be a good thing for us to work on together at some point, I think. Perhaps soon, if you think it would be easy. Getting rid of the edge table sort in SLiM entirely would be very nice – much better than trying to parallelize that sort (which does not have very good performance).

petrelharp · 2023-10-25T18:09:56Z

Edges need to go into the edge table ordered by parent's birth time (and with edges contiguous by parent). So, to do this we need to do something like:

buffer each parents' edges together somewhere so then can be written to the table all at once after they die
only write these for a given parent p to the edge table at a point when there are no living individuals who were born earlier than p (since if there are still individuals older than p alive, these could still produce edges that would come earlier than p's edges)
do something special when reading in a tree sequence, since newly produced edges aren't guaranteed to occur later than edges already in the tables
somehow arrange that simplify doesn't remove any individuals (or nodes) whose edges haven't been written to the table yet (since a no-longer-alive parent whose edges aren't yet written won't have any recorded offspring in the table, and so would be simplified away)

This is do-able, although the details of the last point aren't clear to me. However, it would totally break if there's something like a "dummy individual" that doesn't ever reproduce or die - at least, it would prevent simplification from ever happening. (And, I think such individuals occur in some simulations?) For the same reason, the interaction between long-lived individuals and simplification interval aren't clear.

petrelharp · 2023-10-25T18:10:48Z

IIRC, Kevin implemented a buffering scheme, but did not have to deal with (a) reading in a tree sequence and having already-alive individuals, or (b) particularly long-lived individuals.

petrelharp · 2023-10-25T18:55:06Z

Kevin points out that the sorting of edges (by child ID and left endpoint) within each contiguous block of edges-from-the-same-parent isn't used by simplify and so might be removed; however, this is orthogonal to the points above (which have to do with sorting-by-parent).

bhaller · 2023-10-25T20:36:27Z

Great, thanks. Not sure what I think about long-lived individuals; yes, they do occur in some designs (Daphnia resting eggs, etc.). Anyhow, this is not something that I want to attack right now, but perhaps the next time you and I are in the same spacetime vicinity, we might try to do it? Seemed good to have a placeholder to keep it in our thoughts. Marked "long-term" for now.

molpopgen · 2023-10-25T20:48:04Z

IIRC, Kevin implemented a buffering scheme, but did not have to deal with (a) reading in a tree sequence and having already-alive individuals, or (b) particularly long-lived individuals.

This isn't quite right -- overlapping generations is easy to handle, as I've mentioned. There are simple O(N) steps required. The links in the various tskit issues contain the details.

Edit: reading in a tree sequence also doesn't change things -- the setup steps related to overlapping generations can be handled right after reading in, etc..

petrelharp · 2023-10-25T23:49:09Z

Ah, yes - as I think you've noticed, I wasn't saying you didn't have to deal with "having already-alive individuals", I was saying you didn't have to deal with "reading in a tree sequence that referred to already-alive individuals". I guess the "setup steps" you refer to would be taking some of the edges out of the tree sequence and putting them in the buffer; that would require bookeeping but not be too bad.

For future reference, the "various tskit issues" are referenced here tskit-dev/tskit#2751 (right?)

molpopgen · 2023-10-26T03:09:48Z

yes, that's the relevant tskit issue. There's another that I need to open re: the wrong edge criteria are being used for validation (full sort output rather than what simplification actually requires).

bhaller added enhancement trees related to tree-seq, tskit, etc. long-term labels Oct 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remove edge table sorting if possible #401

remove edge table sorting if possible #401

bhaller commented Oct 23, 2023

petrelharp commented Oct 25, 2023

petrelharp commented Oct 25, 2023

petrelharp commented Oct 25, 2023

bhaller commented Oct 25, 2023

molpopgen commented Oct 25, 2023 •

edited

Loading

petrelharp commented Oct 25, 2023

molpopgen commented Oct 26, 2023

remove edge table sorting if possible #401

remove edge table sorting if possible #401

Comments

bhaller commented Oct 23, 2023

petrelharp commented Oct 25, 2023

petrelharp commented Oct 25, 2023

petrelharp commented Oct 25, 2023

bhaller commented Oct 25, 2023

molpopgen commented Oct 25, 2023 • edited Loading

petrelharp commented Oct 25, 2023

molpopgen commented Oct 26, 2023

molpopgen commented Oct 25, 2023 •

edited

Loading