Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: historicized membership data model #4542

Merged
merged 17 commits into from
Apr 3, 2024

Conversation

fontanierh
Copy link
Contributor

@fontanierh fontanierh commented Apr 2, 2024

Description

fixes https://github.com/dust-tt/tasks/issues/548

In #4521, we added startAt and endAt on the memberships table. The idea is to stop using the revoked role, and instead have a history of memberships so we can tell who was a member of what workspace at any given time.

Previous PR added the fields, the backfill script, and the BL to automatically fill startAt on membership creation and endAt on membership revoke. But everywhere in the app, we still use the revoked role instead of checking the endDate, and we still update memberships in-place.

This PR changes the memberships business logic to switch to historicized memberships, removing the need for the revoked status. We now have a MembershipResource that follows the resource pattern and centralizes usage of the memberships table (with one exception, the Poké workspace deletion flow).

Risk

Security critical
Quite risky as it touches the memberships.

Deploy Plan

  • deploy the code
  • run the migration script to clear the revoked status from the DB

Copy link

github-actions bot commented Apr 2, 2024

Warnings
⚠️

Files in **/lib/models/ have been modified and the PR has the migration-ack label. Don't forget to run the migration from the -edge infrastructure.

Generated by 🚫 dangerJS against 7948601

@fontanierh fontanierh changed the title big scary membership refactor feat: historicized membership data model Apr 3, 2024
front/lib/auth.ts Show resolved Hide resolved
front/lib/document_tracker.ts Show resolved Hide resolved
@@ -108,6 +108,22 @@ export class ContentFragmentResource extends BaseResource<ContentFragmentModel>
},
};
}

async update(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the update method from the BaseResource. Ideally, we don't want to have these update methods on resources since it's just exposing the Sequelize API on resources (and we're actively trying to strangle the use of Sequelize API within resources)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the meantime, the implementation was copypastad onto the resource classes that use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok; so afaiu:

  • we always want to update resources;
  • long term, the best is to have a clean (non-sequelized based) base method for update;
  • in the meantime we basically havwe a pass-through sequelize;

So at some point we'll refactor all the updates. In what way is it cleaner to refactor the copy-pasted version in each resource, vs refactor the base_resource version + change the call sites?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So at some point we'll refactor all the updates. In what way is it cleaner to refactor the copy-pasted version in each resource, vs refactor the base_resource version + change the call sites?

I don't think we will. Having a generic "update" is not desirable as it is a departure from the resource pattern. What we want is to have resource-specific methods that may perform updates. For example "revoke()" instead of "update({role: 'revoked'}".

}
if (userIds && !userIds.length) return [];

// Get all the memberships matching the criteria.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we may be doing a bit of over-fetching if some members were revoked and re-invited several times. I believe this is fine since:

  • the number of over-fetched stuff will be quite low
  • it is not on a hot path
  • it makes the code / querying a lot simpler

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hot path since it's called from getLatestMembershipOfUserInWorkspace which is called from login ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the over-fetch is fine, but noting this is quite hot

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you don't get revoked memberships here do you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we do get the revoked memberships, it's this method's purpose vs getActiveMemberships.
The overfetching comes from getting the non-latest memberships, which IMO is fine.

front/pages/api/w/[wId]/workspace-usage.ts Show resolved Hide resolved
front/poke/temporal/activities.ts Outdated Show resolved Hide resolved
types/src/front/memberships.ts Show resolved Hide resolved
Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First batch of comments.

Need more time to review. Let's try to have WorkspaceType or LightWorkspaceType in all interfaces of ressources 🙏

Will require minimal additional gymnastic but will set us up for a much better world.

front/poke/temporal/activities.ts Outdated Show resolved Hide resolved
front/pages/no-workspace.tsx Show resolved Hide resolved
front/pages/api/w/[wId]/workspace-usage.ts Show resolved Hide resolved
front/lib/resources/membership_resource.ts Outdated Show resolved Hide resolved
@fontanierh
Copy link
Contributor Author

fontanierh commented Apr 3, 2024

@spolu

Need more time to review. Let's try to have WorkspaceType or LightWorkspaceType in all interfaces of ressources 🙏

Will require minimal additional gymnastic but will set us up for a much better world.

This is temporary as we'll have a workspace resource at some point. Once we get there we'll be fine but in the meantime we're going to have to either accept Workspace on just accept to take a ModelId

@spolu
Copy link
Contributor

spolu commented Apr 3, 2024

This is temporary as we'll have a workspace resource at some point. Once we get there we'll be fine but in the meantime we're going to have to either accept Workspace on just accept to take a ModelId

Agreed that we may have a big WorkspaceResource that spans Menbership management. But in the meantime I think it's a low cost change to Accept a LightWorkspaceType which has sId and id in these interface.

We just need to call something like renderLightWorkspaceFromModel(workspace: Workspace) that we can add in types or in lib/api/

@fontanierh fontanierh requested a review from spolu April 3, 2024 12:45
Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey ! First pass, pushing coms now so you can check them, finishing the reveiw meanwhile

@@ -108,6 +108,22 @@ export class ContentFragmentResource extends BaseResource<ContentFragmentModel>
},
};
}

async update(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok; so afaiu:

  • we always want to update resources;
  • long term, the best is to have a clean (non-sequelized based) base method for update;
  • in the meantime we basically havwe a pass-through sequelize;

So at some point we'll refactor all the updates. In what way is it cleaner to refactor the copy-pasted version in each resource, vs refactor the base_resource version + change the call sites?

front/lib/resources/membership_resource.ts Outdated Show resolved Hide resolved
);

const where: WhereOptions<InferAttributes<MembershipModel>> = {};
if (roles) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wouldn't that work?

const where: WhereOptions<InferAttributes<MembershipModel>> = {
      role: roles,
      userId: userIds ? { [Op.in]: userIds } : undefined,
      workspaceId: workspace ? workspace.id : undefined,
    };

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also a similar version for the function above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it might, but I am not convinced that inline ternaries are bringing a lot of extra readability here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I understand;
I advocate for it because it turns 13 lines into 3, and while I agree that ternaries are hard to read in some cases e.g. when nested, flat ternaries are the equivalent of saying "hi" vs "good morning"; legit part of the language (that we commonly use) and we should learn to read them as fluently as as we learned to read the ifs

Clearly a nit, will approve anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright you flipped me :) changing

where.workspaceId = workspace.id;
}

if (!workspace && !userIds?.length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not check that also on the function above? or globally via the GetMembershipOptions type (don't know if it is actually possible)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's partially done by the type system, but sadly the type cannot easily check that the array is non-empty (unless you force the caller to always cast the array as non-empty for non-literal cases.

Indeed I should add it in the other method 🙏

if (!workspace && !userIds?.length) {
throw new Error("At least one of workspace or userIds must be provided.");
}
if (userIds && !userIds.length) return [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok. But if we accept userIds = [] with a workspace (and return [] in that case), why not accept it without a workspace and return [] too, and then remove the line above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workspace is either null or non-null, so it's easy to force it via type system

Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thiink we want to have a getUserActiveMembership function to use in places of:

  • getLatestMembershipOfUserInWorkspace (maybe except the membership update endpoint but TBD)
  • getActiveMemberships with [userId] as argument

This will simplify many code places by removing the [0] derefs of getActiveMemberships and more importantly all the places where we check the endDate existence and logic (error prone yet security critical) outside of the Resource.

front/poke/temporal/activities.ts Outdated Show resolved Hide resolved
front/mailing/20240308-weekly-update.ts Show resolved Hide resolved
}

if (!workspace && !userIds?.length) {
throw new Error("At least one of workspace or userIds must be provided.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type system does not catch that given RequireAtLeastOne ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot check for array emptiness

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(at least not without making it quite painful on the caller side with a bunch of manual casts)

for (const m of memberships) {
const key = `${m.userId}__${m.workspaceId}`;
const latest = latestMembershipByUserAndWorkspace.get(key);
if (!latest || latest.startAt < m.startAt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we prioritize end = null first and startAt second?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically equivalent, since you cannot create a new membership if there is already an overlapping one. If you have an active membership (i.e non-ended), it is necessarily the one with the greatest startAt, so it simplifies the code to only check for startAt.

}
if (userIds && !userIds.length) return [];

// Get all the memberships matching the criteria.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hot path since it's called from getLatestMembershipOfUserInWorkspace which is called from login ?

}
if (userIds && !userIds.length) return [];

// Get all the memberships matching the criteria.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you don't get revoked memberships here do you?

@@ -108,6 +108,22 @@ export class ContentFragmentResource extends BaseResource<ContentFragmentModel>
},
};
}

async update(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

userId: user.id,
workspaceId: workspace.id,
},
const memberships = await MembershipResource.getActiveMemberships({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be in favour of having a getActiveMenbership function to avoid the [0] deref. We'll likely need it in other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

userId: user.id,
workspaceId: membershipInvite.workspaceId,
},
const m = await MembershipResource.getLatestMembershipOfUserInWorkspace({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be easier to have a getActiveUserMenbership function that returns a membership only if it has an active one for the workspace. Will abstract the error prone endAt checks inside the Resource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per IRL, we need it to put users in the right flow. Refactored with isRevoked()

@@ -188,7 +187,10 @@ async function handleRegularSignupFlow(
SSOEnforcedError
>
> {
const activeMemberships = await getActiveMembershipsForUser(user.id);
const activeMemberships = await MembershipResource.getActiveMemberships({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also use getActiveUserMenbership kinda weird you use both getActiveMemberships and getLatestMembershipOfUserInWorkspace in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each case is different. Here we get all active memberships of a user (across all workspaces).

@spolu
Copy link
Contributor

spolu commented Apr 3, 2024

Other than the comments above generally LGTM.
Please requalify the Risk section of the pull request as: "Security critical"

Will make another review once finalised given the security criticality

Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another little batch (would say 2/3)

transaction,
});
// Then, we only keep the latest membership for each (user, workspace).
const latestMembershipByUserAndWorkspace = new Map<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to gather what the code did. here we could

return memberships
   .reduce( (arr, v) => if !(arr.find( (a) => a.userId === v.userId && a.workpaceId === a.memberId)) arr.push(v) )
   .map( (resource) => new MembershipResource(MembershipModel, resource.get())

leveraging the fact that membership is already ordered
this would remove the need for orderedResourcesFromModels and for an intermediary map; wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your version is less performant with O(n^2) complexity and IMHO less readable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the perf side, I thought o(n^2) would not be a problem here, might have been a bit fast in that assesment I reckon 👍

On the readability, I'm genuinely surprised, this can make up for a good 80/20 debate :) 🍺

I get that you don't like "reduce", but to the point of creating a custom 6-line function, then 11 lines to turn an array into a map, that you will then turn back into an array?

16 lines (omitting the map line that is common to both solutions), whereas a 1-line reduce that directly says "add to the array if it isn't already there"

Obv will not block approval for this of course, but glad to discuss that on friday 👍

throw new Error(
`Unreachable: Found multiple latest memberships for user ${userId} in workspace ${workspace.id}`
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we may be doing a bit of over-fetching if some members were revoked and re-invited several times. I believe this is fine since:

the number of over-fetched stuff will be quite low
it is not on a hot path
it makes the code / querying a lot simpler

Re this comment above => is this still not on a hot path here? (or expected to be quick anyways)
just checking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be a hot path after all with userIds.length === 1 . However the over fetching would be ridiculously small in such cases, and non-existent most of the time.

if (endAt < membership.startAt) {
throw new Error("endAt must be after startAt");
}
if (membership.endAt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so not possible to update the endDate of a membership. Good for me since we probably don't have "future" endDates in code yet, so out of the PR scope (mentioning to confirm it's intentional)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No currently indeed, but would be relatively straight forward to add support for. Wanted to avoid introducing such capability without a clear need for it.

front/lib/resources/membership_resource.ts Show resolved Hide resolved
return this.toJSON();
}

toJSON() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: basically toJSON = the business defined json, and toListJSON = the JSON with all the raw db fields?

=> would go with toRawJSON instead of toListJSON

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toJson = SomethingType
toListJson = LightSomethingType

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove them actually they don't make a ton of sense for this resource since we never really serialize it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh so wasn't it toLightJSON? and not toListJSON ?
but yeah if we don't use it removing is best

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo culted from other resources, but I agree toLightJson wouldn't be bad

Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, this was a large one, thanks 🙏 Mostly good for me, cf a few coms split on 3 reviews
We'll be better with that, worth it.

workspaceId: workspace.id,
},
const memberships = await MembershipResource.getLatestMemberships({
workspace: workspace,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove 1 workspace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up

@@ -25,105 +19,6 @@ const { DUST_DATA_SOURCES_BUCKET = "", SERVICE_ACCOUNT } = process.env;
// `cli` takes an object type and a command as first two arguments and then a list of arguments.
const workspace = async (command: string, args: parseArgs.ParsedArgs) => {
switch (command) {
case "find": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we delete find / add-user? is it sideways from the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of sideways yes. Some of the commands required refactoring for the new setup, and since all of these are dead code I figured I might just delete them so the total net diff is a little bit less green :)

Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

indexes: [
{ fields: ["userId", "role"] },
{ fields: ["startAt"] },
{ fields: ["endAt"] },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the queries we use, shouldn't this be ["endAt", "startAt"] instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right (although in practice it's almost the same).
Will do in separate PR since this one has no migration (just moved this mode)

@fontanierh fontanierh added the migration-ack 📁 Label to acknowledge that a migration is required. label Apr 3, 2024
@fontanierh fontanierh merged commit 2824898 into main Apr 3, 2024
7 checks passed
@fontanierh fontanierh deleted the feat/historize-memberships-step-2 branch April 3, 2024 13:54
flvndvd pushed a commit that referenced this pull request May 26, 2024
* make startAt non-nullable

* make startAt non-nullable

* membership resource

* fix bug

* more checks

* misc BL fixes

* migration to delete the revoked role

* remove comment

* review: use lightWorkspaceType only as interface

* support tx everywhere

* use resource for poke as well

* enh: isRevoked

* active membership of user in ws

* add check to other function

* remove serialization methods

* cleanup

* pr syntax

---------

Co-authored-by: Henry Fontanier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
migration-ack 📁 Label to acknowledge that a migration is required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants