Public document groups (Allow one project to be imported to another project) #292

Open
KastanDay opened this issue Aug 21, 2024 · 0 comments
KastanDay commented Aug 21, 2024

Goal: the ability to clone document groups between projects efficiently while staying organized.

Edit: Ensure the self-hosted version can clone/copy CropWizard (or arbitrary doc groups) into a self-hosted instance.

Definitions

  • Source - the document group being cloned.
  • Destination - the project doing the cloning.
  • e.g. CropWizard (Source) is cloned into an Industry Partner project (Destination).

Regardless of the method, the Destination project will not have "full, unrestricted access to the source files." Its users will only see Source documents in search results; the files will not be available for download. This protects the intellectual property of the source data. The restriction could be made configurable through a list of authorized downloaders of source data.
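The download restriction could be enforced where search results are assembled. A minimal sketch, assuming a hypothetical SourceProject record with owners and an opt-in authorized_downloaders set (neither name comes from the codebase): every result carries the text chunk, but the file URL is attached only for users allowed to download.

```python
from dataclasses import dataclass, field


@dataclass
class SourceProject:
    slug: str
    owners: set[str] = field(default_factory=set)
    # Hypothetical opt-in list: empty by default, so only owners can download.
    authorized_downloaders: set[str] = field(default_factory=set)


def can_download(source: SourceProject, user: str) -> bool:
    """Owners and explicitly authorized users may download the source files."""
    return user in source.owners or user in source.authorized_downloaders


def search_result(source: SourceProject, user: str, doc: dict) -> dict:
    """Always expose the text chunk; expose the file URL only when downloading is allowed."""
    result = {"text": doc["text"], "source": source.slug}
    if can_download(source, user):
        result["file_url"] = doc["file_url"]
    return result
```

Keeping the check in one function means the "authorized downloaders" option only changes data, not the search path.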

3 Implementation methods

Deciding between the final two options:

  1. Copy by reference: receive all upstream changes instantly.
    vs.
  2. Full copy, detached from the source: mutable so it can accept changes, and optimized so that files are not duplicated.

Cloning options considered

  1. Clone by pointer/reference: clone with no downstream control AT ALL. The Destination gets whatever the Source does. They won't even see the individual docs, just a summary of "imported from CropWizard, 409,000 docs." The Source docs appear in /chat results but not in the /materials table (i.e. they can't export the CropWizard DB source PDFs with a single click).
  2. Clone by full copy of the data: optimized so there's no duplication of files, but at its core it's just a full copy of everything, with complete ownership by the new project.
  3. GitHub-style forking/merging: my favorite, and it wouldn't be too crazy to implement in theory. Customer adoption might be a challenge, especially with a crummy UI.
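The behavioral difference between options 1 and 2 can be sketched in a few lines. This assumes a hypothetical content-addressed blob store (files keyed by SHA-256) and an in-memory registry of doc groups; none of these names come from the project. A reference clone resolves the Source on every read, so upstream additions and deletions both show up; a full copy duplicates only the filename-to-hash mapping, so it keeps deleted docs, ignores new ones, and never stores the same bytes twice.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical content-addressed store shared by every project: sha256 hex -> file bytes.
BLOBS: dict[str, bytes] = {}


def store_blob(data: bytes) -> str:
    """Store a file once, no matter how many doc groups reference it."""
    digest = hashlib.sha256(data).hexdigest()
    BLOBS.setdefault(digest, data)
    return digest


@dataclass
class DocGroup:
    name: str
    docs: dict[str, str] = field(default_factory=dict)  # doc filename -> blob hash


# Hypothetical registry of doc groups, keyed by name.
REGISTRY: dict[str, DocGroup] = {}


@dataclass
class ReferenceClone:
    """Option 1: store only a pointer; the Source is resolved on every read."""
    source_name: str

    def docs(self) -> dict[str, str]:
        # Return a copy so the Destination cannot modify the Source.
        return dict(REGISTRY[self.source_name].docs)


def clone_by_full_copy(source: DocGroup, new_name: str) -> DocGroup:
    """Option 2: copy the filename -> hash mapping; the blobs themselves are shared."""
    return DocGroup(new_name, dict(source.docs))
```

This also makes the two open questions below concrete: with a reference clone the Destination cannot keep a doc the Source deletes, and with a full copy it never sees docs the Source adds.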

Shared Projects Table

  • Source project slug
  • Destination project slug
  • @future Excluded document groups
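One possible shape for this table, sketched with Python's built-in sqlite3 module. The column names follow the bullets above; the composite primary key, the self-share CHECK, and storing the @future exclusions as a JSON text column are assumptions.

```python
import sqlite3


def create_shared_projects_table(conn: sqlite3.Connection) -> None:
    """Create a minimal projects table plus the proposed shared_projects table."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS projects (
            slug TEXT PRIMARY KEY
        );
        CREATE TABLE IF NOT EXISTS shared_projects (
            source_slug         TEXT NOT NULL REFERENCES projects(slug) ON DELETE CASCADE,
            destination_slug    TEXT NOT NULL REFERENCES projects(slug) ON DELETE CASCADE,
            -- @future: JSON array of doc group names excluded from the import
            excluded_doc_groups TEXT NOT NULL DEFAULT '[]',
            PRIMARY KEY (source_slug, destination_slug),
            CHECK (source_slug <> destination_slug)
        );
        """
    )
```

The composite key lets one Destination hold many shared projects while preventing the same pair from being added twice.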

Functions:

  • Destination projects can CREATE and REMOVE shared projects
  • How do they know which projects can be shared? A table of available ones.
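The create/remove functions and the "available to import" list could look like this in sqlite3. The is_public flag on projects and all function names are hypothetical; "available" is taken to mean public projects, excluding the Destination itself and anything it has already imported.

```python
import sqlite3

SCHEMA = """
CREATE TABLE projects (
    slug      TEXT PRIMARY KEY,
    is_public INTEGER NOT NULL DEFAULT 0   -- hypothetical visibility flag
);
CREATE TABLE shared_projects (
    source_slug      TEXT NOT NULL REFERENCES projects(slug),
    destination_slug TEXT NOT NULL REFERENCES projects(slug),
    PRIMARY KEY (source_slug, destination_slug)
);
"""


def add_shared_project(conn: sqlite3.Connection, destination: str, source: str) -> None:
    """Destination imports a Source; repeating the call is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO shared_projects (source_slug, destination_slug) VALUES (?, ?)",
        (source, destination),
    )


def remove_shared_project(conn: sqlite3.Connection, destination: str, source: str) -> None:
    conn.execute(
        "DELETE FROM shared_projects WHERE source_slug = ? AND destination_slug = ?",
        (source, destination),
    )


def shared_projects(conn: sqlite3.Connection, destination: str) -> list[str]:
    rows = conn.execute(
        "SELECT source_slug FROM shared_projects WHERE destination_slug = ? ORDER BY source_slug",
        (destination,),
    )
    return [r[0] for r in rows]


def available_to_import(conn: sqlite3.Connection, destination: str) -> list[str]:
    """Public projects that are not the Destination and are not imported yet."""
    rows = conn.execute(
        """
        SELECT slug FROM projects
        WHERE is_public = 1 AND slug <> ?
          AND slug NOT IN (SELECT source_slug FROM shared_projects WHERE destination_slug = ?)
        ORDER BY slug
        """,
        (destination, destination),
    )
    return [r[0] for r in rows]
```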

Features:

  • Destination projects can have multiple shared projects. They can delete them, too.
  • They have NO CONTROL over the source project. The destination sees all changes.

UI:

  • Table view with projects that are "available to be imported"
  • Some are "starred" to show up first. Otherwise, they can search for ANY public project (until we have unlisted as a concept...)
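The table ordering described above (starred first, then everything else, filtered by a search box) is a single sort. A sketch assuming a hypothetical ImportableProject record with is_public and starred flags; sorting alphabetically within each tier is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ImportableProject:
    slug: str
    is_public: bool
    starred: bool = False  # hypothetical curation flag


def importable_table(projects: Iterable[ImportableProject], query: str = "") -> list[str]:
    """Public projects matching the query: starred first, then alphabetical."""
    q = query.strip().lower()
    rows = [p for p in projects if p.is_public and q in p.slug.lower()]
    # False sorts before True, so `not p.starred` puts starred projects first.
    return [p.slug for p in sorted(rows, key=lambda p: (not p.starred, p.slug))]
```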

Source deletes a doc: how does the Destination retain that doc?

Source adds a doc: how do we prevent the Destination from getting the new one?


Add a private SQL field to the doc_groups table.
Add subscribed_doc_groups to the projects table, as a foreign key to doc_groups.
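A sqlite3 sketch of these two schema changes. A single foreign-key column can only point at one doc group, so this sketch models subscribed_doc_groups as a join table (an assumption, not necessarily how it was built). Defaulting private to 1 (not shareable) is also an assumption; the query returns only the subscribed groups whose owner has marked them non-private.

```python
import sqlite3

SCHEMA = """
CREATE TABLE projects (
    course_name TEXT PRIMARY KEY
);
CREATE TABLE doc_groups (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    course_name TEXT NOT NULL REFERENCES projects(course_name),
    private     INTEGER NOT NULL DEFAULT 1   -- new field; 1 = cannot be subscribed to
);
-- Hypothetical join table standing in for projects.subscribed_doc_groups
CREATE TABLE project_subscribed_doc_groups (
    course_name  TEXT NOT NULL REFERENCES projects(course_name),
    doc_group_id INTEGER NOT NULL REFERENCES doc_groups(id),
    PRIMARY KEY (course_name, doc_group_id)
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn


def searchable_subscribed_groups(conn: sqlite3.Connection, course_name: str) -> list[tuple[str, str]]:
    """(owner course_name, group name) for every subscribed group that is not private."""
    rows = conn.execute(
        """
        SELECT d.course_name, d.name
        FROM project_subscribed_doc_groups AS s
        JOIN doc_groups AS d ON d.id = s.doc_group_id
        WHERE s.course_name = ? AND d.private = 0
        ORDER BY d.course_name, d.name
        """,
        (course_name,),
    )
    return rows.fetchall()
```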

Update Qdrant filtering to allow docGroups.

The search conditions are as follows:
* Main query: (course_name AND doc_groups) OR (public_doc_groups)
* If 'All Documents' is enabled, add a filter that excludes disabled_doc_groups.
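These conditions map onto Qdrant's JSON filter syntax (must / should / must_not clauses, with match.value and match.any conditions, and clauses nested inside should). The sketch below builds that filter as a plain dict. It assumes each point's payload has a course_name string and a doc_groups list, and it reads "All Documents" as replacing the doc_groups condition for the project's own docs. The matches() helper is a simplified local stand-in for Qdrant's evaluation, included only so the logic can be checked without a server.

```python
from typing import Any

Filter = dict[str, Any]


def match_value(key: str, value: str) -> Filter:
    return {"key": key, "match": {"value": value}}


def match_any(key: str, values: list[str]) -> Filter:
    return {"key": key, "match": {"any": list(values)}}


def build_search_filter(course_name: str, doc_groups: list[str], public_doc_groups: list[str],
                        disabled_doc_groups: list[str], all_documents: bool) -> Filter:
    """(course_name AND doc_groups) OR public_doc_groups, minus disabled groups under 'All Documents'."""
    own = [match_value("course_name", course_name)]
    if not all_documents:
        own.append(match_any("doc_groups", doc_groups))  # caller passes at least one group
    should: list[Filter] = [{"must": own}]  # nested clause inside should
    if public_doc_groups:
        should.append(match_any("doc_groups", public_doc_groups))
    flt: Filter = {"should": should}
    if all_documents and disabled_doc_groups:
        flt["must_not"] = [match_any("doc_groups", disabled_doc_groups)]
    return flt


def _condition_holds(cond: Filter, payload: dict) -> bool:
    if "key" not in cond:  # nested must/should/must_not clause
        return matches(cond, payload)
    value = payload.get(cond["key"])
    values = value if isinstance(value, list) else [value]  # list payloads match if any element does
    m = cond["match"]
    return m["value"] in values if "value" in m else any(v in m["any"] for v in values)


def matches(flt: Filter, payload: dict) -> bool:
    """Simplified local check: all of must, at least one of should (if any), none of must_not."""
    if not all(_condition_holds(c, payload) for c in flt.get("must", [])):
        return False
    should = flt.get("should", [])
    if should and not any(_condition_holds(c, payload) for c in should):
        return False
    return not any(_condition_holds(c, payload) for c in flt.get("must_not", []))
```

In a real deployment the returned dict would be sent as the filter of a Qdrant search request rather than evaluated locally.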

@KastanDay KastanDay converted this from a draft issue Aug 21, 2024
@KastanDay KastanDay moved this to ✅ Done in UIUC.chat Development Oct 11, 2024