This structure isn't ideal with the addition of Orgs and given upcoming features such as public chatbots and changing the Ask Your Docs functionality to be structured as a personal org.
The semantics of space type (PERSONAL or SHARED) no longer hold for determining the persistence location.
Using FeatureType to pass a user_id context around, and then using that to decide on a persistence location, is also not ideal.
Goals and Requirements
Reduce the risk of org-owned content data (e.g. confidential docs that are indexed, and the chat history against them) leaking across org boundaries.
Make it easier to migrate an entire org from a multi-tenant instance to a dedicated instance.
Usage data (chat history etc): always strictly scoped to an org and user, hence personal.
Org System data (user_groups, spaces metadata etc.) - can be shared across multiple users but always strictly scoped to an org
Global System data - users and org_members are the only system-wide shared data i.e. accessible across orgs.
Presentation layer and domain concepts (such as features and space types) should not be directly coupled to the persistence layer system logic
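The three data categories above can be sketched as an ownership scope that the persistence layer checks, independent of features or space types. This is a minimal sketch only: the enum members, the `REQUIRED_KEYS` table and `check_owner_keys` are hypothetical names chosen for illustration, and the project's actual DataScope enum may look different.

```python
from enum import Enum


class DataScope(Enum):
    """Hypothetical ownership scopes, one per data category listed above."""
    PERSONAL = "personal"  # usage data, keyed by user_id in the proposed paths
    ORG = "orgs"           # org system data, shared by users within one org
    GLOBAL = "global"      # global system data: users and org_members


# Owner keys each scope needs before data can be located (an assumption).
REQUIRED_KEYS = {
    DataScope.PERSONAL: ("user_id",),
    DataScope.ORG: ("org_id",),
    DataScope.GLOBAL: (),
}


def check_owner_keys(scope: DataScope, **keys: str) -> tuple:
    """Return the owner key values in order, or raise if any is missing."""
    missing = [k for k in REQUIRED_KEYS[scope] if not keys.get(k)]
    if missing:
        raise ValueError(f"{scope.name} data requires: {', '.join(missing)}")
    return tuple(keys[k] for k in REQUIRED_KEYS[scope])
```

Failing loudly when an owner key is missing is one way to keep org-owned data from silently falling back to a shared location.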
Partially implemented as part of #207, as this involves org-scoped data. Partial because migrating the existing data structure to the new one in deployed systems will not be handled. A new DataScope enum has been introduced with backwards-compatible mappings where needed.
* refactor: store layer implementing DataScope.
- Personas will use the new file structure for org scoped data.
- A step towards the new file structure proposed in RFC #211
* refactor: personas data entity to assistants
* add DAL functions for personas
* build: update Ruff package
* fix: remember and set selected Assistant for "new chat"
* adjust agent run logic
* refactor(chat ui): remove dates
* fix the data location refactor.
* improve the assistant edit UI
- populate the system and user prompt fields with starter text.
* refactor: use the LLM setting collection defined on the assistant.
* fix(UI): only show files/knowledge for shared ask
* chat ui: tweak upload file drop zone
* fix(UI): hide ML Engineering section for non-admins
Situation
Data persistence on disk isn't consistently separated by scope of ownership.
Current filesystem structure:
/index/PERSONAL/{user_id}/
- index files for the Ask Your Docs feature
/index/SHARED/{org_id}/{space_id}/
- index files for Spaces
/sqlite/PERSONAL/{user_id}/usage.db
- retrieval and LLM request and response data (chat history etc) for all interactions
/sqlite/SHARED/system.db
- system data and metadata (orgs, users, user_groups, spaces, and space_groups)
/upload/PERSONAL/{user_id}/
- The Ask Your Docs feature is hard coded to MANUAL_UPLOAD documents. Those files are persisted here.
/upload/SHARED/{org_id}/{space_id}
- file uploads for any spaces with datasource = MANUAL_UPLOAD are persisted here.
Database table to file mapping:
usage.db: settings (user scoped), history_{feature_name}, history_thread_{feature_name}
system.db: orgs, org_members, users, settings (non-user scoped), space_groups, space_group_members, spaces, space_access, user_groups, user_group_members
Tables with joins:
orgs <> org_members
org_members <> users
spaces <> space_access <> users
spaces <> space_group_members
users <> user_group_members
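These joins matter if the tables on either side end up in different SQLite files, for example users in a global database and spaces in an org database. One possible way to keep such a join working is SQLite's `ATTACH DATABASE`. The sketch below uses in-memory databases, and the column names (`id`, `email`, `name`, `space_id`, `user_id`) are assumptions, not the real schema.

```python
import sqlite3

# The main connection stands in for a global system.db (assumption).
conn = sqlite3.connect(":memory:")
# A second in-memory database stands in for an org-level system.db.
conn.execute("ATTACH DATABASE ':memory:' AS org")

# Hypothetical minimal schemas; the real tables have more columns.
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
conn.execute("CREATE TABLE org.spaces (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE org.space_access (space_id TEXT, user_id TEXT)")

conn.execute("INSERT INTO users VALUES ('u1', 'a@example.com')")
conn.execute("INSERT INTO org.spaces VALUES ('s1', 'Handbook')")
conn.execute("INSERT INTO org.space_access VALUES ('s1', 'u1')")

# spaces <> space_access <> users, joined across the two databases.
rows = conn.execute(
    """
    SELECT s.name, u.email
    FROM org.spaces AS s
    JOIN org.space_access AS a ON a.space_id = s.id
    JOIN main.users AS u ON u.id = a.user_id
    """
).fetchall()
```

An alternative is to resolve the join in application code with two separate queries, which avoids attaching databases but moves the join logic out of SQL.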
Proposal
Structure folders based on a name for the persistence system followed by one or more keys that identify the unique owner of the data. The filename describes the data.
Pattern:
/{persistence_system_name}/{owner_scope_key_1}/../{owner_scope_key_n}/{filename}
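A minimal sketch of how a path following this pattern could be built. The function name `scoped_path` and its validation rules are assumptions made for illustration, not part of the existing code base.

```python
from pathlib import PurePosixPath


def scoped_path(system: str, owner_keys: list, filename: str = "") -> PurePosixPath:
    """Build /{system}/{key_1}/../{key_n}/{filename} from validated parts."""
    parts = [system, *owner_keys] + ([filename] if filename else [])
    for part in parts:
        # Reject empty parts, separators and traversal segments so a crafted
        # id cannot resolve into another owner's folder.
        if not part or "/" in part or "\\" in part or part in (".", ".."):
            raise ValueError(f"invalid path segment: {part!r}")
    return PurePosixPath("/", *parts)
```

For example, `scoped_path("sqlite", ["orgs", "org-1"], "system.db")` yields `/sqlite/orgs/org-1/system.db`. Validating each segment in one place supports the goal of keeping data from leaking across org boundaries.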
Concrete changes:
/index/SHARED/{org_id}/{space_id}/
--> /index/orgs/{org_id}/{space_id}/
/index/PERSONAL/{user_id}/
--> Same as above. Ask Your Docs changes to be achieved by providing every user a personal org. A space that isn't shared with any other users is private.
/index/THREAD/{org_id}/{space_id}/
--> /index/personal/{user_id}/{space_id}
/sqlite/PERSONAL/{user_id}/usage.db
--> /sqlite/personal/{user_id}/usage.db
- authenticated user usage
/sqlite/SHARED/system.db
--> /sqlite/global/system.db
- global system
new --> /sqlite/orgs/{org_id}/system.db
- org system
settings, org and user scope
- both (?) should be stored in the same settings table in /sqlite/orgs/{org_id}/system.db
settings, global scope
- /sqlite/global/system.db
/upload/SHARED/{org_id}/{space_id}
--> /upload/org/{org_id}/{space_id}
/upload/PERSONAL/{user_id}/
--> Same as above because of the changes to Ask Your Docs.
/upload/THREAD/{org_id}/{space_id}
--> /upload/personal/{user_id}/{space_id}
New use cases:
/sqlite/personal/anon-{user_id}/usage.db
- anonymous user usage. user_id is a generated guid.
/sqlite/orgs/{org_id}/experiments/system.db
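The direct renames in the list above could be captured as a small lookup from legacy locations to new ones, for instance as a starting point for a migration script. This is a sketch under assumptions: `new_location` and `_RULES` are hypothetical names, and only the rows with a one-to-one mapping are covered. The PERSONAL index and upload folders depend on each user's personal org, and system.db has to be split by table, so neither is handled here.

```python
import re

# (legacy pattern, replacement template) pairs for the direct renames only.
_RULES = [
    (re.compile(r"^/index/SHARED/(?P<org>[^/]+)/(?P<space>[^/]+)(?P<rest>(?:/.*)?)$"),
     r"/index/orgs/\g<org>/\g<space>\g<rest>"),
    (re.compile(r"^/upload/SHARED/(?P<org>[^/]+)/(?P<space>[^/]+)(?P<rest>(?:/.*)?)$"),
     r"/upload/org/\g<org>/\g<space>\g<rest>"),
    (re.compile(r"^/sqlite/PERSONAL/(?P<user>[^/]+)/usage\.db$"),
     r"/sqlite/personal/\g<user>/usage.db"),
    (re.compile(r"^/sqlite/SHARED/system\.db$"),
     r"/sqlite/global/system.db"),
]


def new_location(legacy_path):
    """Return the proposed location for a legacy path, or None if unmapped."""
    for pattern, template in _RULES:
        match = pattern.match(legacy_path)
        if match:
            return match.expand(template)
    return None
```

Returning `None` for unmapped paths lets a caller log and review those cases rather than guess a destination.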