-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Database action changes. #1062
Database action changes. #1062
Conversation
nupur-khare
commented
Oct 17, 2023
- Separated Metadata from cognition data.
- Validated metadata related to cognition data.
- Added unit and integration test cases.
- Fixed unit and integration test cases.
1. Separated Metadata from cognition data. 2. Validated metadata related to cognition data. 3. Added unit and integration test cases. 4. Fixed unit and integration test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a CognitionDataProcessor in shared and move all related code to that pkg
kairon/api/models.py
Outdated
Utility.retrieve_data(data, metadata_item.dict()) | ||
content_type = values.get("content_type") | ||
if isinstance(data, dict) and content_type != CognitionDataType.json.value: | ||
raise ValueError("data of type dict is required if content type is json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
kairon/api/models.py
Outdated
content_type = values.get("content_type") | ||
if isinstance(data, dict) and content_type != CognitionDataType.json.value: | ||
raise ValueError("data of type dict is required if content type is json") | ||
if (not isinstance(data, dict) and Utility.check_empty_string(data)) or (isinstance(data, dict) and data == {}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not data or (isinstance(data, str) and Utility.check_empty_string(data))
kairon/shared/data/data_objects.py
Outdated
from kairon.shared.utils import Utility | ||
|
||
if isinstance(self.data, dict) and self.content_type != CognitionDataType.json.value: | ||
raise ValidationError("data of type dict is required if content type is json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
kairon/shared/data/data_objects.py
Outdated
|
||
if isinstance(self.data, dict) and self.content_type != CognitionDataType.json.value: | ||
raise ValidationError("data of type dict is required if content type is json") | ||
if (not isinstance(self.data, dict) and Utility.check_empty_string(self.data)) or ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not data or (isinstance(data, str) and Utility.check_empty_string(data))
kairon/shared/data/processor.py
Outdated
:raises: AppException | ||
""" | ||
|
||
collections = self.list_collection(bot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to list_cognition_collections
kairon/shared/data/data_objects.py
Outdated
@@ -895,6 +917,7 @@ class BotSettings(Auditlog): | |||
data_importer_limit_per_day = IntField(default=5) | |||
multilingual_limit_per_day = IntField(default=2) | |||
data_generation_limit_per_day = IntField(default=3) | |||
collection_limit = IntField(default=5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cognition_collections_limit = 3
cognition_columns_per_collection_limit = 5
we must have both
kairon/shared/data/processor.py
Outdated
def save_cognition_schema(self, metadata: Dict, user: Text, bot: Text): | ||
column_name = [meta['column_name'] for meta in metadata.get('metadata')] | ||
for column in column_name: | ||
Utility.is_exist(CognitionSchema, bot=bot, metadata__column_name=column, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata__column_name__in=column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add validation for duplicate collection also
kairon/shared/utils.py
Outdated
else: | ||
return data[column_name] | ||
|
||
@staticmethod | ||
def get_embeddings_and_payload_data(data: Any, metadata: Dict): | ||
def find_matching_metadata(data: Any, metadata: List, collection: Text = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be a simple mongo query where you find schema using column name
kairon/shared/data/processor.py
Outdated
matched_metadata = Utility.find_matching_metadata(data, metadata, collection) | ||
if not matched_metadata: | ||
raise AppException("Metadata related to payload not found!") | ||
results = [metadata_entry for metadata_dict in matched_metadata for metadata_entry in metadata_dict['metadata']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matched_metadata will not be dict?
@@ -52,10 +55,11 @@ async def train(self, *args, **kwargs) -> Dict: | |||
await self.__create_collection__(collection) | |||
for content in tqdm(collections['content'], desc="Training FAQ"): | |||
if content['content_type'] == CognitionDataType.json.value: | |||
if not content['metadata'] or []: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json will always have metadata
kairon/shared/cognition/processor.py
Outdated
|
||
collections = self.list_cognition_collections(bot) | ||
doc_count = CognitionData.objects( | ||
bot=bot, collection__ne=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why collection__ne=None?
kairon/shared/cognition/processor.py
Outdated
|
||
def save_content(self, content: Text, user: Text, bot: Text, collection: Text = None): | ||
if collection: | ||
if self.is_collection_limit_exceeded(bot, collection): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this check here?
we must check if collection is created yet!!
kairon/shared/cognition/processor.py
Outdated
if self.is_collection_limit_exceeded(bot, collection): | ||
raise AppException('Collection limit exceeded!') | ||
bot_settings = MongoProcessor.get_bot_settings(bot=bot, user=user) | ||
if not bot_settings["llm_settings"]['enable_faq']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this not the first validation in this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validation column data
kairon/shared/cognition/processor.py
Outdated
if len(content.split()) < 10: | ||
raise AppException("Content should contain atleast 10 words.") | ||
|
||
Utility.is_exist(CognitionData, bot=bot, id__ne=content_id, data=content, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if collection exists!
validation column data
kairon/shared/cognition/processor.py
Outdated
exp_message="Column already exists!") | ||
metadata_obj = CognitionSchema(bot=bot, user=user) | ||
metadata_obj.metadata = [ColumnMetadata(**meta) for meta in metadata.get('metadata')] | ||
metadata_obj.collection_name = metadata.get('collection_name', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None??
kairon/shared/cognition/processor.py
Outdated
Utility.is_exist(CognitionSchema, bot=bot, collection_name=metadata.get('collection_name'), | ||
exp_message="Collection already exists!") | ||
column_name = [meta['column_name'] for meta in metadata.get('metadata')] | ||
for column in column_name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
???????????
kairon/shared/cognition/processor.py
Outdated
|
||
def update_cognition_schema(self, metadata_id: str, metadata: Dict, user: Text, bot: Text): | ||
metadata_items = metadata.get('metadata') | ||
Utility.is_exist(CognitionSchema, bot=bot, id__ne=metadata_id, metadata=metadata_items, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
yield final_data | ||
|
||
def __validate_metadata_and_payload(self, payload, bot: Text): | ||
data = payload.get('data') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
async def save_bot_text( | ||
text: TextData, | ||
@router.get("/text/faq", response_model=Response) | ||
async def get_text( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this not already covered by list_cognition_data api?
|
||
@router.get("/text/faq/collection", response_model=Response) | ||
async def list_collection( | ||
current_user: User = Security(Authentication.get_current_user_and_bot, scopes=DESIGNER_ACCESS), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the use when we already have list_cognition_schema ?
raise ValidationError("query type is required") | ||
if not self.value or self.value is None: | ||
raise ValidationError("query value is required") | ||
# class DbOperation(EmbeddedDocument): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
@@ -189,7 +189,7 @@ def validate(self, clean=True): | |||
class DatabaseAction(Auditlog): | |||
name = StringField(required=True) | |||
collection = StringField(required=True) | |||
query = EmbeddedDocumentField(DbOperation, required=True) | |||
query = StringField(required=True, choices=[payload.value for payload in DbActionOperationType]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to query type
""" | ||
|
||
collections = list(CognitionSchema.objects(bot=bot).distinct(field='collection_name')) | ||
if collection not in collections and len(collections) >= BotSettings.objects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use get_bot_settings in mongo_processor
Q(bot=bot)).get() | ||
return matching_metadata | ||
except DoesNotExist as e: | ||
raise AppException("Metadata related to payload not found!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Columns do not exist in the schema
yield final_data | ||
|
||
@staticmethod | ||
def retrieve_data(data: Any, schema: Dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validate column values
except DoesNotExist: | ||
raise AppException("Payload does not exists!") | ||
|
||
def list_cognition_data(self, bot: Text): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paginate
else: | ||
search_payload, vector_embeddings = Utility.get_embeddings_and_payload_data(content['data'], content['metadata']) | ||
metadata = processor.find_matching_metadata(self.bot, content['data'], content.get('collection')) | ||
search_payload, vector_embeddings = processor.get_embeddings_and_payload_data(content['data'], metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is supposed to be a utility
either move to Utility or create static method within Processor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also rename to retrieve_search_payload_and_embedding_payload()
else: | ||
search_payload, vector_embeddings = Utility.get_embeddings_and_payload_data(content['data'], content['metadata']) | ||
metadata = processor.find_matching_metadata(self.bot, content['data'], content.get('collection')) | ||
search_payload, vector_embeddings = processor.get_embeddings_and_payload_data(content['data'], metadata) | ||
else: | ||
search_payload, vector_embeddings = {'content': content["data"]}, content["data"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename vector_embeddings to embedding_payload