Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[connectors] Upsert and backfill Confluence spaces as data_source_folders #9402

Merged
merged 5 commits into from
Dec 16, 2024

Conversation

aubin-tchoi
Copy link
Contributor

@aubin-tchoi aubin-tchoi commented Dec 16, 2024

Description

  • Closes #9159;
  • Add API calls whenever a Confluence space is upserted/deleted in connectors, which is when permissions are set.
  • Add a backfill script to backfill existing Confluence spaces.
  • Spaces are the only folders in this connector.
  • Tested locally, I get the rows in data_sources_folders, I also get the nodes in data_sources_nodes with mime_type application/vnd.dust.folder.

Risk

  • Relatively low, not a lot of user-facing impact as of now and easily rollbackable.

Deploy Plan

  • Stop all Confluence connectors.
  • Deploy connectors.
  • Resume all Confluence connectors.
  • Run migration

Comment on lines 371 to 377
await upsertFolderNode({
dataSourceConfig,
folderId: internalId,
parents: [internalId],
title: confluenceSpace?.name ?? confluenceId,
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok to make the user wait for this operation to complete?
we could technically do this in the workflow but it leads to higher risks of having the table confluence_spaces and the data_source_folders out of sync IMHO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually rather do it in the activities, that's where we upsert in core

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, doing that 👍

Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left a few coms 👍
Mostly, I don't get how we have 2 levels of hierarchy in our confluence, and how this is handled by the parents array?

import { concurrentExecutor } from "@connectors/lib/async_utils";
import {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking: it seems there are multiple levels in confluence hierarchy
cf picture: test > test 123 > some files
Are those subspaces ? are those properly taken into account?
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are pages, which are not folders but can be parents to other pages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this break anything regarding kwsearch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per IRL: we're good 👍

Comment on lines 371 to 377
await upsertFolderNode({
dataSourceConfig,
folderId: internalId,
parents: [internalId],
title: confluenceSpace?.name ?? confluenceId,
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually rather do it in the activities, that's where we upsert in core

Comment on lines 358 to 359
await deleteFolderNode({ dataSourceConfig, folderId: internalId });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipperolet would you rather delete from core first and then from connectors or the opposite? no strong opinion here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per IRL: core first 👍

@aubin-tchoi
Copy link
Contributor Author

@philipperolet I moved the calls to front -> core to activities, can you check again please? 🙏

@aubin-tchoi aubin-tchoi merged commit 143e777 into main Dec 16, 2024
4 checks passed
@aubin-tchoi aubin-tchoi deleted the confluence-folders branch December 16, 2024 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[KW-Search] Update & backfill folders for confluence
2 participants