Database action changes. #1062

nupur-khare · 2023-10-17T18:29:52Z

Separated Metadata from cognition data.
Validated metadata related to cognition data.
Added unit and integration test cases.
Fixed unit and integration test cases.

udit-pandey

Create a CognitionDataProcessor in shared and move all related code to that package.

udit-pandey · 2023-10-18T03:01:02Z

kairon/api/models.py

-                Utility.retrieve_data(data, metadata_item.dict())
+        content_type = values.get("content_type")
+        if isinstance(data, dict) and content_type != CognitionDataType.json.value:
+            raise ValueError("data of type dict is required if content type is json")


udit-pandey · 2023-10-18T03:03:27Z

kairon/api/models.py

+        content_type = values.get("content_type")
+        if isinstance(data, dict) and content_type != CognitionDataType.json.value:
+            raise ValueError("data of type dict is required if content type is json")
+        if (not isinstance(data, dict) and Utility.check_empty_string(data)) or (isinstance(data, dict) and data == {}):


if not data or (isinstance(data, str) and Utility.check_empty_string(data))
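
The simplified emptiness check suggested above can be sketched as follows. This is a minimal illustration, not the code in the PR: `check_empty_string` is a hypothetical stand-in for `Utility.check_empty_string` (assumed to return True for None or whitespace-only strings), and `is_empty_payload` is an illustrative name.

```python
from typing import Any


def check_empty_string(value: Any) -> bool:
    """Hypothetical stand-in for Utility.check_empty_string."""
    return value is None or not str(value).strip()


def is_empty_payload(data: Any) -> bool:
    # `not data` covers None, "", {} and [] in a single test;
    # the second clause catches whitespace-only strings.
    return not data or (isinstance(data, str) and check_empty_string(data))
```

With this form the separate `isinstance(data, dict) and data == {}` branch is no longer needed, because an empty dict is already falsy.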

udit-pandey · 2023-10-18T03:06:54Z

kairon/shared/data/data_objects.py

+        from kairon.shared.utils import Utility
+
+        if isinstance(self.data, dict) and self.content_type != CognitionDataType.json.value:
+            raise ValidationError("data of type dict is required if content type is json")


udit-pandey · 2023-10-18T03:07:03Z

kairon/shared/data/data_objects.py

+
+        if isinstance(self.data, dict) and self.content_type != CognitionDataType.json.value:
+            raise ValidationError("data of type dict is required if content type is json")
+        if (not isinstance(self.data, dict) and Utility.check_empty_string(self.data)) or (


if not data or (isinstance(data, str) and Utility.check_empty_string(data))

udit-pandey · 2023-10-18T03:08:25Z

kairon/shared/data/processor.py

+        :raises: AppException
+        """
+
+        collections = self.list_collection(bot)


rename to list_cognition_collections

udit-pandey · 2023-10-18T03:12:53Z

kairon/shared/data/data_objects.py

@@ -895,6 +917,7 @@ class BotSettings(Auditlog):
    data_importer_limit_per_day = IntField(default=5)
    multilingual_limit_per_day = IntField(default=2)
    data_generation_limit_per_day = IntField(default=3)
+    collection_limit = IntField(default=5)


cognition_collections_limit = 3
cognition_columns_per_collection_limit = 5
We must have both limits.
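
A rough sketch of how the two limits could be enforced together. The field names and defaults come from the comment above; the `CognitionLimits` class, `validate_schema_limits`, and the error messages are assumptions made for illustration, and a plain dict stands in for the stored schemas.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CognitionLimits:
    # Names and defaults follow the review comment; the class itself is hypothetical.
    cognition_collections_limit: int = 3
    cognition_columns_per_collection_limit: int = 5


def validate_schema_limits(existing: Dict[str, List[str]], collection: str,
                           columns: List[str], limits: CognitionLimits) -> None:
    """Raise ValueError if saving `collection` with `columns` breaks either limit."""
    # Only a collection that is not stored yet counts against the collection limit.
    is_new = collection not in existing
    if is_new and len(existing) >= limits.cognition_collections_limit:
        raise ValueError("Collection limit exceeded!")
    # The column limit applies to every collection, new or existing.
    if len(columns) > limits.cognition_columns_per_collection_limit:
        raise ValueError("Column limit exceeded!")
```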

udit-pandey · 2023-10-18T03:14:44Z

kairon/shared/data/processor.py

+    def save_cognition_schema(self, metadata: Dict, user: Text, bot: Text):
+        column_name = [meta['column_name'] for meta in metadata.get('metadata')]
+        for column in column_name:
+            Utility.is_exist(CognitionSchema, bot=bot, metadata__column_name=column,


metadata__column_name__in=column

Also add validation for a duplicate collection.
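
The intent of both suggestions can be sketched in plain Python, with in-memory sets standing in for the Mongo query. MongoEngine's `in` operator takes a list of values, so the whole list of column names can be checked in one lookup rather than one query per column. The function names and the shape of `existing` below are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Set


def find_duplicate_columns(new_columns: Iterable[str], existing_columns: Set[str]) -> List[str]:
    """Return the requested column names that already exist, in request order."""
    return [name for name in new_columns if name in existing_columns]


def validate_new_schema(schema: Dict, existing: Dict[str, Set[str]]) -> None:
    # `existing` maps collection name -> column names already stored for the bot
    # (a hypothetical in-memory stand-in for the CognitionSchema documents).
    collection = schema.get("collection_name")
    if collection in existing:
        raise ValueError("Collection already exists!")
    requested = [meta["column_name"] for meta in schema.get("metadata", [])]
    # One membership check over all stored columns replaces the per-column loop.
    all_columns = set().union(*existing.values())
    duplicates = find_duplicate_columns(requested, all_columns)
    if duplicates:
        raise ValueError("Column already exists: " + ", ".join(duplicates))
```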

udit-pandey · 2023-10-18T03:27:19Z

kairon/shared/utils.py

            else:
                return data[column_name]

    @staticmethod
-    def get_embeddings_and_payload_data(data: Any, metadata: Dict):
+    def find_matching_metadata(data: Any, metadata: List, collection: Text = None):


this can be a simple mongo query where you find schema using column name

udit-pandey · 2023-10-18T03:28:08Z

kairon/shared/data/processor.py

+        matched_metadata = Utility.find_matching_metadata(data, metadata, collection)
+        if not matched_metadata:
+            raise AppException("Metadata related to payload not found!")
+        results = [metadata_entry for metadata_dict in matched_metadata for metadata_entry in metadata_dict['metadata']]


Won't matched_metadata be a dict?

udit-pandey · 2023-10-18T03:30:42Z

kairon/shared/llm/gpt3.py

@@ -52,10 +55,11 @@ async def train(self, *args, **kwargs) -> Dict:
            await self.__create_collection__(collection)
            for content in tqdm(collections['content'], desc="Training FAQ"):
                if content['content_type'] == CognitionDataType.json.value:
-                    if not content['metadata'] or []:


json will always have metadata

udit-pandey · 2023-10-18T14:32:33Z

kairon/shared/cognition/processor.py

+
+        collections = self.list_cognition_collections(bot)
+        doc_count = CognitionData.objects(
+            bot=bot, collection__ne=None,


why collection__ne=None?

udit-pandey · 2023-10-18T14:34:43Z

kairon/shared/cognition/processor.py

+
+    def save_content(self, content: Text, user: Text, bot: Text, collection: Text = None):
+        if collection:
+            if self.is_collection_limit_exceeded(bot, collection):


Why is this check here?
We must first check whether the collection has been created yet!

udit-pandey · 2023-10-18T14:35:15Z

kairon/shared/cognition/processor.py

+            if self.is_collection_limit_exceeded(bot, collection):
+                raise AppException('Collection limit exceeded!')
+        bot_settings = MongoProcessor.get_bot_settings(bot=bot, user=user)
+        if not bot_settings["llm_settings"]['enable_faq']:


Why is this not the first validation in this method?

Also validate the column data.
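
The validation order being asked for in these comments might look like the sketch below. It assumes the `llm_settings` / `enable_faq` keys seen in the diff; the function name, the `known_collections` argument, and the error messages are illustrative and not taken from the PR.

```python
from typing import Dict, Optional, Set


def validate_save_content(bot_settings: Dict, known_collections: Set[str],
                          collection: Optional[str] = None) -> None:
    # 1. Feature flag first: nothing else matters if FAQ is disabled for the bot.
    if not bot_settings["llm_settings"]["enable_faq"]:
        raise ValueError("FAQ feature is disabled for this bot.")
    # 2. Content may only be saved into a collection whose schema already exists.
    if collection and collection not in known_collections:
        raise ValueError("Collection does not exist!")
    # 3. Column-data validation against the schema would follow here (omitted).
```

Ordering the checks this way means the cheapest, broadest rejection happens before any per-collection lookup.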

udit-pandey · 2023-10-18T14:36:32Z

kairon/shared/cognition/processor.py

+        if len(content.split()) < 10:
+            raise AppException("Content should contain atleast 10 words.")
+
+        Utility.is_exist(CognitionData, bot=bot, id__ne=content_id, data=content,


Check if the collection exists!
Also validate the column data.

udit-pandey · 2023-10-18T14:38:27Z

kairon/shared/cognition/processor.py

+                             exp_message="Column already exists!")
+        metadata_obj = CognitionSchema(bot=bot, user=user)
+        metadata_obj.metadata = [ColumnMetadata(**meta) for meta in metadata.get('metadata')]
+        metadata_obj.collection_name = metadata.get('collection_name', None)


udit-pandey · 2023-10-18T14:41:30Z

kairon/shared/cognition/processor.py

+        Utility.is_exist(CognitionSchema, bot=bot, collection_name=metadata.get('collection_name'),
+                         exp_message="Collection already exists!")
+        column_name = [meta['column_name'] for meta in metadata.get('metadata')]
+        for column in column_name:


???????????

udit-pandey · 2023-10-18T14:44:09Z

kairon/shared/cognition/processor.py

+
+    def update_cognition_schema(self, metadata_id: str, metadata: Dict, user: Text, bot: Text):
+        metadata_items = metadata.get('metadata')
+        Utility.is_exist(CognitionSchema, bot=bot, id__ne=metadata_id, metadata=metadata_items,


udit-pandey · 2023-10-18T14:50:06Z

kairon/shared/cognition/processor.py

+            yield final_data
+
+    def __validate_metadata_and_payload(self, payload, bot: Text):
+        data = payload.get('data')


udit-pandey · 2023-10-23T16:07:38Z

kairon/api/app/routers/bot/data.py

-async def save_bot_text(
-        text: TextData,
+@router.get("/text/faq", response_model=Response)
+async def get_text(


Isn't this already covered by the list_cognition_data API?

udit-pandey · 2023-10-23T16:09:09Z

kairon/api/app/routers/bot/data.py

+
+@router.get("/text/faq/collection", response_model=Response)
+async def list_collection(
+        current_user: User = Security(Authentication.get_current_user_and_bot, scopes=DESIGNER_ACCESS),


What's the use of this when we already have list_cognition_schema?

udit-pandey · 2023-10-23T16:13:16Z

kairon/shared/actions/data_objects.py

-            raise ValidationError("query type is required")
-        if not self.value or self.value is None:
-            raise ValidationError("query value is required")
+# class DbOperation(EmbeddedDocument):


udit-pandey · 2023-10-23T16:13:59Z

kairon/shared/actions/data_objects.py

@@ -189,7 +189,7 @@ def validate(self, clean=True):
 class DatabaseAction(Auditlog):
    name = StringField(required=True)
    collection = StringField(required=True)
-    query = EmbeddedDocumentField(DbOperation, required=True)
+    query = StringField(required=True, choices=[payload.value for payload in DbActionOperationType])


Rename to query_type.
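
A small sketch of the renamed field's validation, independent of MongoEngine. The diff builds `choices` from `DbActionOperationType`, but the enum's definition is not shown in this thread, so the member names below are hypothetical, as is `validate_query_type`.

```python
from enum import Enum
from typing import List


class DbActionOperationType(str, Enum):
    # Member names are hypothetical; the PR does not show the enum's definition.
    payload_search = "payload_search"
    embedding_search = "embedding_search"


# Same comprehension as in the diff: one allowed value per enum member.
QUERY_TYPE_CHOICES: List[str] = [op.value for op in DbActionOperationType]


def validate_query_type(query_type: str) -> str:
    """Return `query_type` unchanged if it is one of the allowed choices."""
    if query_type not in QUERY_TYPE_CHOICES:
        raise ValueError(f"query_type must be one of {QUERY_TYPE_CHOICES}")
    return query_type
```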

udit-pandey · 2023-10-23T16:22:00Z

kairon/shared/cognition/processor.py

+        """
+
+        collections = list(CognitionSchema.objects(bot=bot).distinct(field='collection_name'))
+        if collection not in collections and len(collections) >= BotSettings.objects(


Use get_bot_settings from MongoProcessor.

udit-pandey · 2023-10-23T16:46:56Z

kairon/shared/cognition/processor.py

+                                                        Q(bot=bot)).get()
+            return matching_metadata
+        except DoesNotExist as e:
+            raise AppException("Metadata related to payload not found!")


Columns do not exist in the schema

udit-pandey · 2023-10-23T16:48:38Z

kairon/shared/cognition/processor.py

+            yield final_data
+
+    @staticmethod
+    def retrieve_data(data: Any, schema: Dict):


validate column values

udit-pandey · 2023-10-23T16:52:20Z

kairon/shared/cognition/processor.py

+        except DoesNotExist:
+            raise AppException("Payload does not exists!")
+
+    def list_cognition_data(self, bot: Text):


udit-pandey · 2023-10-23T16:54:14Z

kairon/shared/llm/gpt3.py

-                    else:
-                        search_payload, vector_embeddings = Utility.get_embeddings_and_payload_data(content['data'], content['metadata'])
+                    metadata = processor.find_matching_metadata(self.bot, content['data'], content.get('collection'))
+                    search_payload, vector_embeddings = processor.get_embeddings_and_payload_data(content['data'], metadata)


This is supposed to be a utility.
Either move it to Utility or make it a static method within the processor.

Also rename it to retrieve_search_payload_and_embedding_payload().
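
One possible shape for such a static method is sketched below. Only `column_name` appears in this thread; the `enable_search` and `create_embeddings` flags are assumptions made for illustration, and the body is not the PR's implementation.

```python
from typing import Any, Dict, List, Tuple


class CognitionDataProcessor:

    @staticmethod
    def retrieve_search_payload_and_embedding_payload(
            data: Dict[str, Any], schema: List[Dict[str, Any]]
    ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        """Split `data` into a search payload and an embedding payload per schema column."""
        search_payload: Dict[str, Any] = {}
        embedding_payload: Dict[str, Any] = {}
        for column in schema:
            name = column["column_name"]
            if name not in data:
                continue  # columns missing from the data are skipped in this sketch
            # Both flags are hypothetical schema attributes.
            if column.get("enable_search"):
                search_payload[name] = data[name]
            if column.get("create_embeddings"):
                embedding_payload[name] = data[name]
        return search_payload, embedding_payload
```

Because the method uses no instance state, it can be called as `CognitionDataProcessor.retrieve_search_payload_and_embedding_payload(data, schema)` without constructing a processor.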

udit-pandey · 2023-10-23T16:56:22Z

kairon/shared/llm/gpt3.py

-                    else:
-                        search_payload, vector_embeddings = Utility.get_embeddings_and_payload_data(content['data'], content['metadata'])
+                    metadata = processor.find_matching_metadata(self.bot, content['data'], content.get('collection'))
+                    search_payload, vector_embeddings = processor.get_embeddings_and_payload_data(content['data'], metadata)
                else:
                    search_payload, vector_embeddings = {'content': content["data"]}, content["data"]


rename vector_embeddings to embedding_payload

Database action changes.

2b712b8

1. Separated Metadata from cognition data. 2. Validated metadata related to cognition data. 3. Added unit and integration test cases. 4. Fixed unit and integration test cases.

udit-pandey reviewed Oct 18, 2023

Nupur Khare added 3 commits October 18, 2023 17:19

Fixed requested changes.

d153b03

Fixed requested changes.

875f6b0

Fixed requested changes.

5af8da9

udit-pandey suggested changes Oct 18, 2023

Nupur Khare added 4 commits October 20, 2023 16:22

Fixed requested changes.

430375c

Fixed requested changes.

e85c2d8

Fixed requested changes.

9e4372f

Fixed requested changes.

e8e057a

udit-pandey suggested changes Oct 23, 2023

nupur-khare closed this Oct 25, 2023
