remove edge table sorting if possible #401
Edges need to go into the edge table ordered by parent's birth time (and with edges contiguous by parent). So, to do this we need to do something like:
This is doable, although the details of the last point aren't clear to me. However, it would totally break if there's something like a "dummy individual" that doesn't ever reproduce or die; at the least, it would prevent simplification from ever happening. (And, I think such individuals occur in some simulations?) For the same reason, the interaction between long-lived individuals and the simplification interval isn't clear.
IIRC, Kevin implemented a buffering scheme, but did not have to deal with (a) reading in a tree sequence and having already-alive individuals, or (b) particularly long-lived individuals.
Kevin points out that the sorting of edges (by child ID and left endpoint) within each contiguous block of edges-from-the-same-parent isn't used by simplify and so might be removed; however, this is orthogonal to the points above (which have to do with sorting-by-parent).
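To make that distinction concrete, here is a minimal Python sketch (not SLiM's or tskit's actual code) of a check for only the parent-level ordering: blocks ordered by parent time, and each parent's edges contiguous, with no requirement on child ID or left endpoint inside a block. The `Edge` tuple, `is_parent_ordered`, and the `node_time` mapping are hypothetical names for illustration. It assumes parent times should be non-decreasing along the table, as with tskit's time-ago convention; flip the comparison if your times run forward.

```python
from typing import Mapping, NamedTuple, Sequence


class Edge(NamedTuple):
    left: float
    right: float
    parent: int
    child: int


def is_parent_ordered(edges: Sequence[Edge], node_time: Mapping[int, float]) -> bool:
    """Check parent-level ordering only; ignore order within a parent's block."""
    seen_parents = set()   # parents whose block has already started
    prev_parent = None
    prev_time = None
    for edge in edges:
        if edge.parent == prev_parent:
            continue  # still inside the same block; no within-block check
        if edge.parent in seen_parents:
            return False  # this parent's edges are not contiguous
        time = node_time[edge.parent]
        if prev_time is not None and time < prev_time:
            return False  # blocks are not ordered by parent time
        seen_parents.add(edge.parent)
        prev_parent = edge.parent
        prev_time = time
    return True
```

A table whose blocks are in order passes this check even if the edges inside a block are not sorted by child and left, which is the weaker condition being discussed here.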
Great, thanks. Not sure what I think about long-lived individuals; yes, they do occur in some designs (Daphnia resting eggs, etc.). Anyhow, this is not something that I want to attack right now, but perhaps the next time you and I are in the same spacetime vicinity, we might try to do it? Seemed good to have a placeholder to keep it in our thoughts. Marked "long-term" for now.
This isn't quite right -- overlapping generations are easy to handle, as I've mentioned. There are simple O(N) steps required. The links in the various tskit issues contain the details. Edit: reading in a tree sequence also doesn't change things -- the setup steps related to overlapping generations can be handled right after reading in, etc.
Ah, yes - as I think you've noticed, I wasn't saying you didn't have to deal with "having already-alive individuals", I was saying you didn't have to deal with "reading in a tree sequence that referred to already-alive individuals". I guess the "setup steps" you refer to would be taking some of the edges out of the tree sequence and putting them in the buffer; that would require bookkeeping but not be too bad. For future reference, the "various tskit issues" are referenced here: tskit-dev/tskit#2751 (right?)
Yes, that's the relevant tskit issue. There's another that I need to open about the wrong edge criteria being used for validation (full sort output rather than what simplification actually requires).
@molpopgen has written here https://tskit-dev.slack.com/archives/C01JKJC5Y9G/p1698082329001319?thread_ts=1697956055.069829&cid=C01JKJC5Y9G that:
This would be good to merge into SLiM as well. I gather some work in tskit is needed to enable it (unless we hack SLiM's copy of tskit). I don't know how much work all of this is, as I don't really know what exactly Kevin is doing. But @petrelharp this would be a good thing for us to work on together at some point, I think. Perhaps soon, if you think it would be easy. Getting rid of the edge table sort in SLiM entirely would be very nice – much better than trying to parallelize that sort (which does not have very good performance).