Extract Functioning #190

susan-garry · 2024-07-12T00:18:09Z

Our extract function is passing the tests! To see it in action, try fgfa -I tests/note5.gfa extract -n 2 -c 1, or sub in your own values - the implementation now accepts values of -c greater than 1.

Another change introduced in this PR is that mygfa.Graph.emit and, by extension, slow_odgi norm behave a bit differently. Previously, they did not standardize the resulting GFA file (as I had assumed they did): Segments, Paths, and Links were grouped together, but the order of elements within each group depended on the order in which they were added, and equivalent Links (such as A+ B- and B+ A-) were not recognized as the same. Perhaps I should not be modifying mygfa.Graph.emit directly to obtain a standardized graph, since sorting everything takes time, but I'm not sure that's really a concern given that our Python implementations were never intended to be fast.
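To make the Link equivalence concrete, here is a minimal sketch of one way to canonicalize and sort links. It is not the actual mygfa code: the `Handle` and `Link` types, the `flipped` and `canonical` methods, and `normalize_links` are hypothetical names chosen for illustration. It assumes only that a link read from the opposite strand swaps its endpoints and flips both orientations, so A+ B- and B+ A- describe the same edge.

```python
from typing import Iterable, List, NamedTuple


class Handle(NamedTuple):
    """A segment name plus an orientation (hypothetical type for this sketch)."""
    name: str
    forward: bool  # True for "+", False for "-"

    def flip(self) -> "Handle":
        return Handle(self.name, not self.forward)


class Link(NamedTuple):
    """A directed edge between two oriented segments (hypothetical type)."""
    src: Handle
    dst: Handle

    def flipped(self) -> "Link":
        # Reading the same edge from the other strand swaps the endpoints
        # and flips both orientations: A+ B- becomes B+ A-.
        return Link(self.dst.flip(), self.src.flip())

    def canonical(self) -> "Link":
        # Pick the smaller of the two equivalent forms under tuple ordering,
        # so both spellings of an edge map to a single representative.
        return min(self, self.flipped())


def normalize_links(links: Iterable[Link]) -> List[Link]:
    """Canonicalize every link, drop duplicates, and sort the result."""
    return sorted({link.canonical() for link in links})


if __name__ == "__main__":
    a = Link(Handle("A", True), Handle("B", False))  # A+ B-
    b = Link(Handle("B", True), Handle("A", False))  # B+ A-
    print(normalize_links([b, a]))
```

Because the output depends only on the set of canonical links, two graphs that differ only in insertion order or in how each link is spelled would emit identical Link lines under this scheme.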

Before merging, it would be nice to figure out an elegant way to automate our testing framework so that we can test a few different values of -c (and, ideally, a few different values of -n).
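As a starting point for that automation, the sketch below enumerates one fgfa command line per combination of -n and -c values. It only builds the argument lists and does not run anything. The helper name `extract_commands` and the particular value lists are assumptions for illustration; the command shape is copied from the example invocation above. How each command's output would be checked (for instance, against odgi) is left open.

```python
import itertools
import shlex
from typing import List, Sequence


def extract_commands(
    gfa_path: str,
    n_values: Sequence[int],
    c_values: Sequence[int],
) -> List[List[str]]:
    """Build one `fgfa ... extract` argument list per (-n, -c) combination."""
    return [
        ["fgfa", "-I", gfa_path, "extract", "-n", str(n), "-c", str(c)]
        for n, c in itertools.product(n_values, c_values)
    ]


if __name__ == "__main__":
    # Example values only; any small grid of -n and -c values would do.
    for cmd in extract_commands("tests/note5.gfa", [1, 2], [1, 2, 3]):
        print(shlex.join(cmd))
```

Each argument list could be handed to `subprocess.run` by a small driver script, or printed as above to generate shell commands for an existing test harness.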


sampsyo · 2024-07-12T11:55:20Z

Wahoo!!!! This looks great! Nice work figuring out this tricky bit of the odgi approach!!! And -c > 1 works too—even better!

I suppose we should merge #188 first, since this PR contains those commits; and then it will be easier to see these changes in isolation.

I think you're totally right to make mygfa sort the graph on emission. As you say, this is not supposed to be fast, so no big deal about paying the cost. I suppose one option would be to only do this sorting when we do slow_odgi norm and not in other cases, but that would mean we'd have to normalize after every slow_odgi command, which is also annoying.

One random idea: we could consider also building ourselves a fgfa norm command that does the same kind of sorting, but hopefully faster.

About testing:

> Before merging, it would be nice to figure out an elegant way to automate our testing framework so that we can test a few different values of -c (and, ideally, a few different values of -n).

While this is definitely a good idea, TBH, I think this need is starting to highlight the limitations of our snapshot-based approach to testing. Running many different tests using the same input file is… pretty dang awkward in Turnt. So we may want to start thinking about an alternative strategy here… I unfortunately don't have a brilliant idea at the moment, but maybe we can come up with one?

(And in any case, unless there is some low-hanging fruit we can think of, maybe it's wisest to merge this version, with the basic tests, and then address the more complex testing approach in a separate PR?)

sampsyo · 2024-07-12T19:25:39Z

Possibly-useless thought from the commute today: for more complicated testing, we could imagine trying to use FlatGFA's Python bindings, extended to expose ops like extract, and then using a combination of pytest and Hypothesis.

susan-garry added 25 commits June 10, 2024 16:54

typo

7881627

initial chop implementation

74726d1

test chop, fix off-by-one error

0a34f05

test fgfa depth

633ab10

typo

2407f55

add benchmarking for chop, allow shell commands in config.toml (so we can pipeline)

2cfa2fc

chop now computes new links, but is buggy (as is odgi), testing framework

1cff638

re-implement chop treating links as bidirectional, tests pass

9ff0d6f

clippy

0609993

turnt error messages are verbose

551d286

flatgfa now requires odgi and slow_odgi

f58b6bd

make fetch-og generates odgi files, which test-flatgfa depends on

0f31491

Get changes to workflow file

7c9cd1b

flatgfa tests display turnt diffs

ec66a32

tests that rely on odgi use odgi files, avoids unnecessary conversions

62de9fc

turnt prints stuff

3f50bcb

use latest version of odgi

7d2cefc

actually run tests, don't just print turnt commands

a9b0032

simplifications, code comments, and better benchmarking

71a7e53

represent the ranges of newly chopped segments as spans; abstracting away spans and ids would be nice, but if we're using ids, we might as well use spans

0e19844

oops

c391fa7

extract for radius (-n) 1, actually normalize mygfa output graphs

ba6f192

full normalize mygfa output

9d1803b

extract tests pass for values of c > 1

a8202db

cleanup

a78c618

Merge branch 'main' into extract

9001544

susan-garry merged commit 7e9f620 into main Sep 12, 2024
12 checks passed

susan-garry deleted the extract branch September 12, 2024 14:01
