Bsweger/more state permutations/37 #45

bsweger · 2024-10-28T14:00:47Z

Fixes #37

Background

Add options for state format when returning filtered sequence metadata: FIPS code, full state name (formerly the only option), or two-letter state abbreviation (the new default).
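To make the three formats concrete, here is a minimal sketch of the idea. It is not the code in this PR: the `format_state` helper, the `state_format` parameter name, and the small lookup table are all hypothetical and exist only for illustration. The abbreviations and FIPS codes shown are the standard ones for these locations.

```python
# Hypothetical lookup table: full name -> (two-letter abbreviation, FIPS code).
# Only a few locations are listed here for illustration.
STATES = {
    "Alaska": ("AK", "02"),
    "Maine": ("ME", "23"),
    "Pennsylvania": ("PA", "42"),
    "Utah": ("UT", "49"),
    "Washington DC": ("DC", "11"),
}


def format_state(name: str, state_format: str = "abbr") -> str:
    """Return a location in the requested format: 'abbr', 'fips', or 'name'.

    The parameter name and its allowed values are assumptions for this sketch.
    """
    if state_format not in ("abbr", "fips", "name"):
        raise ValueError(f"Unsupported state_format: {state_format!r}")
    abbr, fips = STATES[name]
    if state_format == "abbr":
        return abbr
    if state_format == "fips":
        return fips
    return name


print(format_state("Alaska"))                  # abbreviation is the default
print(format_state("Alaska", "fips"))          # FIPS code, kept as a string
print(format_state("Washington DC", "name"))   # full name, unchanged
```

FIPS codes are kept as strings in this sketch so that leading zeros (as in Alaska's "02") are preserved.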

The primary impact will be to the unscored-location-dates files in the variant-nowcast-hub. Here's a sample of the updated file created with this branch (via a local run).

target_date,location,count
2024-09-29,AK,0
2024-09-30,AK,0
2024-10-01,AK,0
2024-10-02,AK,0
2024-10-03,AK,0

This changeset doesn't change the existing unscored-location-dates files, which still have full state names (we can add an issue for updating the historical data if needed).

Related updates

Because this change involved updates to the function that filters sequence metadata, the PR also includes an updated docstring and corresponding documentation updates.
As part of the documentation updates, there is some refactoring to incorporate some of the newer cladetime naming conventions and to tidy package organization (while keeping backwards compatibility until we incorporate these changes into the variant-nowcast-hub/src scripts).

Notes to reviewer

This one can be reviewed commit by commit
The get_location_date_counts.py and get_clades_to_model.py scripts in variant-nowcast-hub have been tested against this feature branch to confirm that the backwards compatibility is working as expected
Preview of updated docs

This is a potentially breaking change for anyone who was relying on the full state name in the location field of sequence metadata datasets. The default is now the two-letter state abbreviation, which is more commonly used in hub tasks.json files.

Documenting the filter_covid_genome_metadata function made it clear that its name should better align with the CladeTime vernacular ('sequence metadata'). Additionally, it makes more sense to move sequence.py out of the utility folder, since sequence-related functions are integral to the package. util/sequence.py remains in the code base for now, so we can import the filtering function with its old name for backward compatibility.

bsweger · 2024-10-28T14:09:37Z

src/cladetime/_typing.py

Renamed this file to something more consistent with its new use (previously, it contained some custom types for mypy, which never actually worked as intended).

bsweger · 2024-10-28T14:11:16Z

src/cladetime/sequence.py

From a usability perspective, it makes sense to treat "sequence" as a first-class cladetime citizen, rather than something buried in a util folder, so I moved it.

The only differences are to the "filter metadata" function:

function is renamed from filter_covid_genome_metadata to filter_sequence_metadata for consistent terminology with CladeTime object

added parameter for state format

added docstring

bsweger · 2024-10-28T14:16:13Z

src/cladetime/util/sequence.py

-    df_assignments = df_assignments.insert_column(1, seq)  # type: ignore
-
-    return df_assignments
+# For temporary backwards compatibility


Ensure existing util.sequence imports continue to work
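For readers unfamiliar with this pattern, a backward-compatibility shim can be sketched roughly as follows. This is an illustration only, not the contents of util/sequence.py: the function body is a stand-in, the `state_format` parameter name is hypothetical, and the deprecation warning is an optional extra that this PR does not claim to add.

```python
import warnings


def filter_sequence_metadata(rows, state_format="abbr"):
    """Stand-in for the renamed function; it returns its input unchanged."""
    return rows


def filter_covid_genome_metadata(*args, **kwargs):
    """Old name, kept temporarily so existing callers keep working."""
    # Assumption for this sketch: warn callers that the old name is going away.
    warnings.warn(
        "filter_covid_genome_metadata is deprecated; "
        "use filter_sequence_metadata instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the new function so behavior stays identical.
    return filter_sequence_metadata(*args, **kwargs)
```

In a real package the old module would more likely import the new function and re-export it under the old name; the wrapper form above is used here so that the example is self-contained.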

bsweger · 2024-10-28T14:16:49Z

tests/unit/util/test_sequence.py

@@ -78,7 +77,7 @@ def test_filter_covid_genome_metadata():
            "Homo sapiens",
        ],
        "country": ["USA", "Argentina", "USA", "USA", "USA", "USA", "USA"],
-        "division": ["Alaska", "Maine", "Guam", "Puerto Rico", "Utah", "Pennsylvania", "Pennsylvania"],
+        "division": ["Alaska", "Maine", "Guam", "Puerto Rico", "Utah", "Washington DC", "Pennsylvania"],


The us package has some special "Washington DC" handling, so make sure we're testing for that

elray1

Looks good overall; I had some questions

elray1 · 2024-10-28T14:29:23Z

src/cladetime/util/sequence.py

    """Apply a standard set of filters to the GenBank genome metadata."""

    # Default columns to include in the filtered metadata
-    if len(cols) == 0:
+    if not cols:


Why not if cols is None:?

Laziness that resulted in a frivolous falsy? (you're right, and I pushed a fix)
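The distinction raised here can be shown with a small sketch. The function names and the default column list below are hypothetical and are not taken from the cladetime code; the point is only that a falsy check treats an explicitly passed empty list the same as an omitted argument, while an `is None` check does not.

```python
# Hypothetical default column list, for illustration only.
DEFAULT_COLS = ["clade", "country", "date"]


def pick_cols_falsy(cols=None):
    """Falls back to defaults for None AND for any empty collection."""
    if not cols:
        return list(DEFAULT_COLS)
    return cols


def pick_cols_none(cols=None):
    """Falls back to defaults only when the argument was omitted (None)."""
    if cols is None:
        return list(DEFAULT_COLS)
    return cols


print(pick_cols_falsy([]))  # the empty list is replaced by the defaults
print(pick_cols_none([]))   # the empty list is respected as given
```

Both versions behave the same when `cols` is omitted or is a non-empty list; they differ only for an explicitly empty argument.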

elray1 · 2024-10-28T14:39:49Z

src/cladetime/util/sequence.py


+    A helper function to apply commonly-used filters to a Polars LazyFrame


Q: This will still work if I call it with a pl.DataFrame instead of a pl.LazyFrame, right?

Yes, good call--just pushed an update to include DataFrame in the docstring

src/cladetime/assign_clades.py

This function can also accept and return a polars DataFrame (in addition to a LazyFrame)

elray1

approved

elray1

even more approved

bsweger added 6 commits October 28, 2024 09:12

Fix a rookie mistake

5dc10a9

Rename typing

8774078

Update documentation

d42d58d

fix up some merge conflicts

520add4

bsweger force-pushed the bsweger/more-state-permutations/37 branch from 38a328a to 520add4 Compare October 28, 2024 14:03

bsweger commented Oct 28, 2024

bsweger requested a review from elray1 October 28, 2024 14:17

elray1 reviewed Oct 28, 2024

Update filter_sequence_metadata docstring

2d7271b

This function can also accept and return a polars DataFrame (in addition to a LazyFrame)

elray1 previously approved these changes Oct 28, 2024

Falsy fix

87c06aa

bsweger dismissed elray1’s stale review via 87c06aa October 28, 2024 15:39

elray1 approved these changes Oct 28, 2024

bsweger merged commit 0290dd9 into main Oct 28, 2024
2 checks passed

bsweger deleted the bsweger/more-state-permutations/37 branch October 28, 2024 15:41
