-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create teams, datasets, and account aliases when ingesting manifests #41
Conversation
7ee2bf5
to
5331dd0
Compare
60abe68
to
fcb2710
Compare
fcb2710
to
101882d
Compare
27c7095
to
621dbba
Compare
621dbba
to
8a0c773
Compare
f8317fd
to
c7008bf
Compare
This does create e.g. account aliases, but the methods for fetching them don't exist yet so it doesn't explicitly test that. I'm envisioning that as a separate set of PRs that integrates the new aliases into the recommendation request and dataset export processes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things I want to check, but nothing that looked like a blocker to me.
@@ -55,6 +55,21 @@ def fetch_account_by_email(self, email: str) -> Account | None: | |||
return accounts[0] | |||
return None | |||
|
|||
def store_account(self, account: Account) -> UUID | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth revising the web
code that uses the following store_new_account
to use this version instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally sure, but maybe? I left the other method there so this wouldn't break anything, but I do kinda like having methods that accept domain objects instead of accepting the relevant info and creating domain objects internally
|
||
|
||
def upgrade() -> None: | ||
op.drop_column("datasets", "dataset_name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a problem with this change, but it does feel unconnected from the other changes, and I'm not immediately seeing the reasoning here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each experiment is owned by a team and corresponds to a dataset, which (among other things) is how we know which account aliases to use when exporting experiment data. Loading a manifest is currently the only way to create a dataset, and the manifest doesn't have a field for naming the dataset so it's not clear what to put in this column.
An alternative change would be to make this column nullable if we expect other ways of creating datasets that would allow us to provide reasonable names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're running into an overload of the term dataset. My gut says that outside experiment-team-tied datasets we may one-day have "public" datasets.
I don't think that hypothetical is worth keeping a column you're unsure about -- if we hit the hypothetical future where we have "public" datasets, let's solve that problem later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Public vs private is an interesting wrinkle too, but I was actually thinking of something a little different:
We've designed the database schema so that every experiment has a dataset, but not all datasets have experiments. That leaves us room to do things like export data for an experimenter so they can check properties of the data in order to find out if our platform makes sense for their experiment (like @sophiasun0515 has wanted to do re: domestic/international news bias.) Exporting data that way should also create a dataset with associated account id aliases but won't involve importing a manifest since it's pre-experiment.
owner = Team( | ||
team_id=manifest.owner.team_id, | ||
team_name=manifest.owner.team_name, | ||
members=manifest.owner.members, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the implication here that the manifest will contain a list of member UUIDs? Do we like this, or should we have the manifest list emails and then look up the UUIDs as part of ingress here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also -- from a type-safety standpoint (which might be uninteresting FWIW) would this be list[str]
or list[UUID]
at this point? I feel like it would be list[str]
but the Team
class has list[UUID]
?
Not sure we care about that, but it occurred to me that it's worth checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the implication here that the manifest will contain a list of member UUIDs?
Yes!
Do we like this, or should we have the manifest list emails and then look up the UUIDs as part of ingress here?
No! I do not like this but it allowed me to punt on integrating account lookups, which complicate the picture as far as testing goes.
would this be list[str] or list[UUID] at this point? I feel like it would be list[str] but the Team class has list[UUID]?
It should be list[UUID]
but it's actually list[str]
. I'll fix this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sit corrected—after checking with a breakpoint it actually is list[UUID]
here, so the types are correct. The conversion from string to UUID happens when Pydantic turns the manifest JSON into model objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion from string to UUID happens when Pydantic turns the manifest JSON into model objects.
Neat!
|
||
|
||
def test_load_manifest(): | ||
with open(project_root() / "tests" / "data" / "sample_manifest.toml") as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str / str
is new syntax to me. Does it just put in OS appropriate slashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, yes! project_root()
returns a pathlib.Path
, which provides this syntax and applies the OS-appropriate magic you inferred.
ee11f25
to
fcbf587
Compare
No description provided.