Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract Functioning #190

Merged
merged 26 commits into from
Sep 12, 2024
Merged

Extract Functioning #190

merged 26 commits into from
Sep 12, 2024

Conversation

susan-garry
Copy link
Contributor

Our extract function is passing the tests! To see it in action, try fgfa -I tests/note5.gfa extract -n 2 -c 1, or sub in your own values - the implementation now accepts values of -c greater than 1.

Another change introduced in this PR is that mygfa.Graph.emit and, by extension, slow_odgi norm, behave a bit differently - previously, they were not standardizing the resulting gfa file (as I had assumed they were); Segments, Paths, and Links were grouped together, but the order of their individual elements depended on the order in which they were added; equivalent Links (such as A+ B- and B+ A-) were not recognized. Perhaps I should not be modifying mygfa.Graph.emit directly in order to obtain a standardized graph, since sorting everything takes time, but I'm not sure that's really a concern given that our python implementations were never intended to be fast.

Before merging, it would be nice to figure out an elegant way to automate our testing framework so that we can test a few different values of -c (and, ideally, a few different values of -n).

@sampsyo
Copy link
Collaborator

sampsyo commented Jul 12, 2024

Wahoo!!!! This looks great! Nice work figuring out this tricky bit of the odgi approach!!! And -c > 1 works too—even better!

I suppose we should merge #188 first, since this PR contains those commits; and then it will be easier to see these changes in insolation.

I think you're totally right to make mygfa sort the graph on emission. As you say, this is not supposed to be fast, so no big deal about paying the cost. I suppose one option would be to only do this sorting when we do slow_odgi norm and not in other cases, but that would mean we'd have to normalize after every slow_odgi command, which is also annoying.

One random idea: we could consider also building ourselves a fgfa norm command that does the same kind of sorting, but hopefully faster.

About testing:

Before merging, it would be nice to figure out an elegant way to automate our testing framework so that we can test a few different values of -c (and, ideally, a few different values of -n).

While this is definitely a good idea, TBH, I think this need is starting to highlight the limitations of our snapshot-based approach to testing. Running many different tests using the same input file is… pretty dang awkward in Turnt. So we may want to start thinking about an alternative strategy here… I unfortunately don't have a brilliant idea at the moment, but maybe we can come up with one?

(And in any case, unless there is some low-hanging fruit we can thinking of, maybe it's wisest to merge this version—with the basic tests—and then address the more complex testing approach in a separate PR?)

@sampsyo
Copy link
Collaborator

sampsyo commented Jul 12, 2024

Possibly-useless thought from the commute today: for more complicated testing, we could imagine trying to use FlatGFA's Python bindings, extended to expose ops like extract, and then using a combination of pytest and Hypothesis.

@susan-garry susan-garry merged commit 7e9f620 into main Sep 12, 2024
12 checks passed
@susan-garry susan-garry deleted the extract branch September 12, 2024 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants