-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract Functioning #190
Extract Functioning #190
Conversation
…away spans and ids would be nice, but if we're using ids, we might as well use spans
Wahoo!!!! This looks great! Nice work figuring out this tricky bit of the odgi approach!!! And I suppose we should merge #188 first, since this PR contains those commits; and then it will be easier to see these changes in insolation. I think you're totally right to make mygfa sort the graph on emission. As you say, this is not supposed to be fast, so no big deal about paying the cost. I suppose one option would be to only do this sorting when we do One random idea: we could consider also building ourselves a About testing:
While this is definitely a good idea, TBH, I think this need is starting to highlight the limitations of our snapshot-based approach to testing. Running many different tests using the same input file is… pretty dang awkward in Turnt. So we may want to start thinking about an alternative strategy here… I unfortunately don't have a brilliant idea at the moment, but maybe we can come up with one? (And in any case, unless there is some low-hanging fruit we can thinking of, maybe it's wisest to merge this version—with the basic tests—and then address the more complex testing approach in a separate PR?) |
Possibly-useless thought from the commute today: for more complicated testing, we could imagine trying to use FlatGFA's Python bindings, extended to expose ops like |
Our extract function is passing the tests! To see it in action, try
fgfa -I tests/note5.gfa extract -n 2 -c 1
, or sub in your own values - the implementation now accepts values of-c
greater than 1.Another change introduced in this PR is that
mygfa.Graph.emit
and, by extension,slow_odgi norm
, behave a bit differently - previously, they were not standardizing the resultinggfa
file (as I had assumed they were);Segments
,Paths
, andLinks
were grouped together, but the order of their individual elements depended on the order in which they were added; equivalentLinks
(such asA+ B-
andB+ A-
) were not recognized. Perhaps I should not be modifyingmygfa.Graph.emit
directly in order to obtain a standardized graph, since sorting everything takes time, but I'm not sure that's really a concern given that our python implementations were never intended to be fast.Before merging, it would be nice to figure out an elegant way to automate our testing framework so that we can test a few different values of
-c
(and, ideally, a few different values of-n
).