Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create teams, datasets, and account aliases when ingesting manifests #41

Merged
merged 23 commits into from
Sep 6, 2024

Conversation

karlhigley
Copy link
Collaborator

No description provided.

@karlhigley karlhigley self-assigned this Sep 3, 2024
@karlhigley karlhigley marked this pull request as ready for review September 3, 2024 19:26
@karlhigley karlhigley force-pushed the karl/feature/team-dataset-aliases branch from 7ee2bf5 to 5331dd0 Compare September 4, 2024 13:17
@karlhigley karlhigley force-pushed the karl/feature/team-dataset-aliases branch 2 times, most recently from 60abe68 to fcb2710 Compare September 4, 2024 13:48
@karlhigley karlhigley force-pushed the karl/feature/team-dataset-aliases branch from fcb2710 to 101882d Compare September 4, 2024 13:51
@karlhigley karlhigley force-pushed the karl/feature/team-dataset-aliases branch 3 times, most recently from 27c7095 to 621dbba Compare September 4, 2024 14:53
@karlhigley karlhigley force-pushed the karl/feature/team-dataset-aliases branch from 621dbba to 8a0c773 Compare September 4, 2024 14:55
@karlhigley karlhigley force-pushed the karl/feature/team-dataset-aliases branch from f8317fd to c7008bf Compare September 4, 2024 15:07
@karlhigley karlhigley requested a review from kluver September 4, 2024 15:50
@karlhigley
Copy link
Collaborator Author

This does create e.g. account aliases, but the methods for fetching them don't exist yet so it doesn't explicitly test that. I'm envisioning that as a separate set of PRs that integrates the new aliases into the recommendation request and dataset export processes.

Copy link
Contributor

@kluver kluver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things I want to check, but nothing that looked like a blocker to me.

@@ -55,6 +55,21 @@ def fetch_account_by_email(self, email: str) -> Account | None:
return accounts[0]
return None

def store_account(self, account: Account) -> UUID | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth revising the web code that uses the following store_new_account to use this version instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally sure, but maybe? I left the other method there so this wouldn't break anything, but I do kinda like having methods that accept domain objects instead of accepting the relevant info and creating domain objects internally



def upgrade() -> None:
op.drop_column("datasets", "dataset_name")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a problem with this change, but it does feel unconnected from the other changes, and I'm not immediately seeing the reasoning here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each experiment is owned by a team and corresponds to a dataset, which (among other things) is how we know which account aliases to use when exporting experiment data. Loading a manifest is currently the only way to create a dataset, and the manifest doesn't have a field for naming the dataset so it's not clear what to put in this column.

An alternative change would be to make this column nullable if we expect other ways of creating datasets that would allow us to provide reasonable names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're running into an overload of the term dataset. My gut says that outside experiment-team-tied datasets we may one-day have "public" datasets.

I don't think that hypothetical is worth keeping a column you're unsure about -- if we hit the hypothetical future where we have "public" datasets, let's solve that problem later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public vs private is an interesting wrinkle too, but I was actually thinking of something a little different:

We've designed the database schema so that every experiment has a dataset, but not all datasets have experiments. That leaves us room to do things like export data for an experimenter so they can check properties of the data in order to find out if our platform makes sense for their experiment (like @sophiasun0515 has wanted to do re: domestic/international news bias.) Exporting data that way should also create a dataset with associated account id aliases but won't involve importing a manifest since it's pre-experiment.

owner = Team(
team_id=manifest.owner.team_id,
team_name=manifest.owner.team_name,
members=manifest.owner.members,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the implication here that the manifest will contain a list of member UUIDs? Do we like this, or should we have the manifest list emails and then look up the UUIDs as part of ingress here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also -- from a type-safety standpoint (which might be uninteresting FWIW) would this be list[str] or list[UUID] at this point? I feel like it would be list[str] but the Team class has list[UUID]?

Not sure we care about that, but it occurred to me that it's worth checking.

Copy link
Collaborator Author

@karlhigley karlhigley Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the implication here that the manifest will contain a list of member UUIDs?

Yes!

Do we like this, or should we have the manifest list emails and then look up the UUIDs as part of ingress here?

No! I do not like this but it allowed me to punt on integrating account lookups, which complicate the picture as far as testing goes.

would this be list[str] or list[UUID] at this point? I feel like it would be list[str] but the Team class has list[UUID]?

It should be list[UUID] but it's actually list[str]. I'll fix this!

Copy link
Collaborator Author

@karlhigley karlhigley Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sit corrected—after checking with a breakpoint it actually is list[UUID] here, so the types are correct. The conversion from string to UUID happens when Pydantic turns the manifest JSON into model objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion from string to UUID happens when Pydantic turns the manifest JSON into model objects.

Neat!



def test_load_manifest():
with open(project_root() / "tests" / "data" / "sample_manifest.toml") as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str / str is new syntax to me. Does it just put in OS appropriate slashes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, yes! project_root() returns a pathlib.Path, which provides this syntax and applies the OS-appropriate magic you inferred.

@karlhigley karlhigley force-pushed the karl/feature/team-dataset-aliases branch from ee11f25 to fcbf587 Compare September 6, 2024 13:13
@karlhigley karlhigley merged commit 4589de3 into main Sep 6, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants