-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CoreNodes] Add cleanup script for Google Drive parents #10442
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, sorry for not being able to review yesterday
Left a few coms
}) { | ||
// retrieve all data sources for the provider | ||
const dataSources = await DataSourceModel.findAll({ | ||
where: { connectorProvider: "google_drive" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where: { connectorProvider: "google_drive" }, | |
where: { connectorProvider: "google_drive" }, | |
id: { | |
[Op.gt]: nextDataSourceId, | |
}, | |
}); | ||
|
||
for (const dataSource of dataSources) { | ||
if (dataSource.id >= nextDataSourceId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do that in the where clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually meant to log when skipping a datasource but forgot about it, adding it
Edit: actually I did not forget about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I get that. You use this condition to make the nextDataSourceId param useful
So if you start using it and call your script with nextDataSourceId > 1000, you want 1. to grab all datasources anyways, and 2. to log "SKIP" for all the datasources with an id < nextDataSourceId?
What would be the purpose?
it's not critical to the PR so still going to approve though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the purpose is that it doesn't cost anything and if anything fishy happens on a datasource, I'd grep on the datasource ID and it's more readable to see that it was skipped vs not seeing anything and having to check in the script what it means. In general I'd rather not optimize queries that are ran once and are fast anyway if it allows me to have more easily parsable logs
// For all nodes in the data source (can be big). | ||
let nextId = ""; | ||
|
||
for (;;) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: those are hard to read, i suggest you always use a control flow statement expliciting the condition here, or if no condition use forEach / map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can turn this into a do while but since I have to keep the break the condition isn't doing anything so didn't bother with an unused condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so i'd have gone with while(nextId) and having nextId = rows[rows.length - 1]?.node_id;
but it's ok not to change here 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that works too!
|
||
logger.info({ nextId, rowCount: rows.length }, "BATCH"); | ||
|
||
if (rows.length === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i understand it might be a little faster to write, but it's better to optimize for easier to understand :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to keep this break for rows[rows.length - 1]
to not be undefined on the next line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessarily, but fine as is 👍
let newParentId: string | null = null; | ||
try { | ||
const uniqueIds = [ | ||
new Set( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no guarantees on resulting order when using set
unless you're really sure, we'd need to have guarantees on parents sorting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought order was preserved when writing this, gonna check again out of curiosity, changing if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unlike most languages, js Sets actually preserve order: https://dust.tt/w/0ec9852c2f/assistant/FycTesggB2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks for looking
so here there is an assumption that botched parents are still symmetric in the right order--which indeed seems correct to me from afar
IMO for unconventional language features such as this (in the future, not here), might warrant a comment (showing you know or you checked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, if the order of the parents was lost at any point we won't recover it here (we would need to run a backfill relying on connectors db)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 👍 left coms, but nothing preventing approval
let newParentId: string | null = null; | ||
try { | ||
const uniqueIds = [ | ||
new Set( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks for looking
so here there is an assumption that botched parents are still symmetric in the right order--which indeed seems correct to me from afar
IMO for unconventional language features such as this (in the future, not here), might warrant a comment (showing you know or you checked)
// For all nodes in the data source (can be big). | ||
let nextId = ""; | ||
|
||
for (;;) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so i'd have gone with while(nextId) and having nextId = rows[rows.length - 1]?.node_id;
but it's ok not to change here 👍
|
||
logger.info({ nextId, rowCount: rows.length }, "BATCH"); | ||
|
||
if (rows.length === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessarily, but fine as is 👍
}); | ||
|
||
for (const dataSource of dataSources) { | ||
if (dataSource.id >= nextDataSourceId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I get that. You use this condition to make the nextDataSourceId param useful
So if you start using it and call your script with nextDataSourceId > 1000, you want 1. to grab all datasources anyways, and 2. to log "SKIP" for all the datasources with an id < nextDataSourceId?
What would be the purpose?
it's not critical to the PR so still going to approve though
* add migration script * handle tables
Description
Tests
Risk
Deploy Plan