Add "as_of" columns to oracle output target data #300

Open
wants to merge 5 commits into base: main

@bsweger (Collaborator) commented Jan 29, 2025

Closes #296

Add sequence_as_of and tree_as_of columns to the oracle output target data files. We didn't talk about tree_as_of in the call earlier this week, but if we're adding these columns as a hedge against out-of-order data creation, I think it makes sense to have both.

This PR also fixes a few issues I noticed when making the above updates:

  • get_target_data.py wasn't listed in src/README.md
  • some of the input parameters in get_target_data.py had old help text

Edited to add some terminal output from a local test of the updated file:

```
In [1]: import polars as pl

In [2]: updated_oracle = pl.read_parquet("~/Downloads/target-data-2024-10-23/oracle-output/nowcast_date=2024-10-23/oracle.parquet")

In [3]: updated_oracle.columns
Out[3]:
['location',
 'target_date',
 'clade',
 'oracle_value',
 'nowcast_date',
 'sequence_as_of',
 'tree_as_of']

In [4]: updated_oracle.head()
Out[4]:
shape: (5, 7)
┌──────────┬─────────────┬───────┬──────────────┬──────────────┬───────────────┬────────────┐
│ location ┆ target_date ┆ clade ┆ oracle_value ┆ nowcast_date ┆ sequence_as_o ┆ tree_as_of │
│ ---      ┆ ---         ┆ ---   ┆ ---          ┆ ---          ┆ f             ┆ ---        │
│ str      ┆ date        ┆ str   ┆ f64          ┆ date         ┆ ---           ┆ date       │
│          ┆             ┆       ┆              ┆              ┆ date          ┆            │
╞══════════╪═════════════╪═══════╪══════════════╪══════════════╪═══════════════╪════════════╡
│ AL       ┆ 2024-09-22  ┆ 24A   ┆ 0.0          ┆ 2024-10-23   ┆ 2025-01-07    ┆ 2024-10-21 │
│ AL       ┆ 2024-09-22  ┆ 24B   ┆ 0.0          ┆ 2024-10-23   ┆ 2025-01-07    ┆ 2024-10-21 │
│ AL       ┆ 2024-09-22  ┆ 24C   ┆ 0.0          ┆ 2024-10-23   ┆ 2025-01-07    ┆ 2024-10-21 │
│ AL       ┆ 2024-09-22  ┆ 24E   ┆ 0.0          ┆ 2024-10-23   ┆ 2025-01-07    ┆ 2024-10-21 │
│ AL       ┆ 2024-09-22  ┆ 24F   ┆ 0.0          ┆ 2024-10-23   ┆ 2025-01-07    ┆ 2024-10-21 │
└──────────┴─────────────┴───────┴──────────────┴──────────────┴───────────────┴────────────┘
```

Resolves #296

This changeset adds sequence_as_of and tree_as_of columns to the
oracle output target data files.
Ruff has also seen fit to add its .02 to this commit.
Neglected to do this when getting the script online.
@bsweger requested a review from @elray1 on January 30, 2025 14:17
@bsweger (Collaborator, Author) commented Jan 31, 2025

@elray1 @nickreich Should we hold off on this PR pending further conversations about finalizing the hubverse target data columns?

Successfully merging this pull request may close these issues.

Add a sequence_as_of column to oracle output target data