Fallback Data while file upload #1596
…a while uploading the files.
WalkthroughThe changes involve modifications to the Changes
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
kairon/shared/data/processor.py (3)
572-572: Add docstring to save_training_data method including file_upload parameter
Consider adding a docstring to the save_training_data method to explain its functionality and describe the parameters, including the new file_upload parameter.
Line range hint 2135-2139: Update docstring to include file_upload parameter
The add_or_overwrite_config method's docstring should be updated to include the new file_upload parameter, explaining its purpose and usage.
🧰 Tools
🪛 Ruff
2133-2133: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
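The Ruff B904 hint above can be illustrated with a minimal sketch. ConfigError and parse_threshold are hypothetical names invented for this example and are not taken from kairon; only the raise ... from err pattern itself is what the rule asks for.

```python
class ConfigError(Exception):
    """Hypothetical application-level error, used only for this illustration."""


def parse_threshold(raw: str) -> float:
    """Convert a raw config value to a float, chaining the original error."""
    try:
        return float(raw)
    except ValueError as err:
        # "from err" stores the ValueError as __cause__, so the traceback shows
        # it as the direct cause instead of "during handling of the above
        # exception, another exception occurred".
        raise ConfigError(f"Invalid threshold: {raw!r}") from err
```

Using raise ... from None instead would suppress the original exception in the traceback, which fits cases where the low-level error is only noise for the caller.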
5406-5406: Add docstring to add_default_fallback_config method including file_upload parameter
Consider adding a docstring to the add_default_fallback_config method to explain its functionality and to describe the parameters, including the new file_upload parameter.
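As a rough sketch of what such a docstring could look like: the function below is a hypothetical stand-in, not kairon's real method, and its signature and return value are assumptions made for illustration. The description of file_upload follows the review's own summary that default fallback data is only added when the data is not coming from an upload.

```python
def save_training_data_sketch(bot: str, user: str, overwrite: bool = False,
                              file_upload: bool = False) -> dict:
    """Save training data for a bot (illustrative stand-in only).

    :param bot: id of the bot the data belongs to
    :param user: id of the user performing the save
    :param overwrite: if True, existing data is replaced instead of appended to
    :param file_upload: if True, the data originates from uploaded files and
        no default fallback data should be added on top of it
    :return: the arguments that were received, echoed back for inspection
    """
    return {"bot": bot, "user": user, "overwrite": overwrite,
            "file_upload": file_upload}
```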
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
kairon/importer/data_importer.py (1 hunks)
kairon/shared/data/processor.py (6 hunks)
🔇 Additional comments (4)
kairon/importer/data_importer.py (1)
59-60: Verify handling of non-file-upload scenarios
The hardcoded file_upload=True parameter might affect existing non-file-upload scenarios. Let's verify how this parameter is used in the MongoProcessor.
kairon/shared/data/processor.py (3)
589-589: LGTM
The addition of the file_upload parameter in the call to add_or_overwrite_config ensures consistent propagation of the parameter.
2146-2146: LGTM
The inclusion of the file_upload parameter in the call to add_default_fallback_config correctly passes the parameter for further processing.
5453-5454: LGTM
The conditional check on file_upload appropriately determines whether to add default fallback data when not uploading from a file.
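The conditional described in the last comment can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from processor.py: rules_after_import and DEFAULT_FALLBACK_RULE are hypothetical names, and rules are modelled as plain strings.

```python
from typing import List

# Hypothetical placeholder for whatever default fallback data the processor adds.
DEFAULT_FALLBACK_RULE = "default fallback rule (placeholder)"


def rules_after_import(uploaded_rules: List[str], file_upload: bool = False) -> List[str]:
    """Return the rules that would exist after saving the given data."""
    rules = list(uploaded_rules)
    if not file_upload and DEFAULT_FALLBACK_RULE not in rules:
        # Not a file upload: make sure the default fallback rule is present.
        rules.append(DEFAULT_FALLBACK_RULE)
    # For file uploads the data is stored exactly as provided.
    return rules
```

With file_upload=True the uploaded rules come back unchanged, which is the behaviour the PR title refers to: no extra fallback data is created during a file upload.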
self.overwrite, self.files_to_save,
file_upload=True)
🛠️ Refactor suggestion
Consider making file_upload parameter configurable
The file_upload parameter is hardcoded to True, which means all imports will be treated as file uploads. Consider making this configurable through the constructor or method parameter to support different import scenarios.
- file_upload=True)
+ file_upload=self.file_upload)
And update the constructor:
 def __init__(self, path: Text, bot: Text, user: Text, files_to_save: set, save_data: bool = True,
-             overwrite: bool = True):
+             overwrite: bool = True, file_upload: bool = True):
     """Initialize data importer"""
     self.path = path
     self.bot = bot
     self.user = user
     self.save_data = save_data
     self.overwrite = overwrite
     self.files_to_save = files_to_save
+    self.file_upload = file_upload
Committable suggestion skipped: line range outside the PR's diff.
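A minimal, self-contained sketch of the idea behind this suggestion, assuming the flag is stored on the importer and forwarded to the processor at call time. ImporterSketch and RecordingProcessor are hypothetical classes written for this illustration; they do not reproduce kairon's DataImporter or MongoProcessor.

```python
class RecordingProcessor:
    """Hypothetical stand-in for the processor; records the flag it receives."""

    def __init__(self) -> None:
        self.calls = []

    def save_training_data(self, bot: str, user: str, file_upload: bool = False) -> None:
        self.calls.append(file_upload)


class ImporterSketch:
    """Importer whose file_upload behaviour is chosen by the caller."""

    def __init__(self, bot: str, user: str, file_upload: bool = True) -> None:
        self.bot = bot
        self.user = user
        self.file_upload = file_upload

    def import_data(self, processor: RecordingProcessor) -> None:
        # Forward the stored flag instead of hardcoding True at the call site.
        processor.save_training_data(self.bot, self.user, file_upload=self.file_upload)
```

Keeping True as the default preserves the current upload behaviour, while other callers can opt out by passing file_upload=False.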
…a while uploading the files.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (14)
tests/testing_data/validator/valid_data/data/nlu.yml (2)
3-7: Consider expanding greet examples for better intent recognition.
While the basic greetings are covered, consider adding more variations to improve the model's robustness:
- Common variations like "hey there", "hi there", "howdy"
- Formal greetings like "good morning", "good afternoon"
- Casual variations like "yo", "hiya"
 - intent: greet
   examples: |
     - hey
     - hello
     - hi
+    - hey there
+    - hi there
+    - good morning
+    - good afternoon
+    - howdy
8-13: Consider adding contextual deny examples.
The current deny examples are good but could benefit from more context-specific variations that might occur in real conversations:
- Polite refusals
- Context-specific denials
 - intent: deny
   examples: |
     - no
     - never
     - I don't think so
     - don't like that
+    - no thanks
+    - not really
+    - I'll pass
+    - not interested
tests/testing_data/validator/valid_data/domain.yml (3)
1-3: Consider selective entity storage configuration
While store_entities_as_slots: true provides automatic slot creation, it might create unnecessary slots for all entities. Consider configuring this at the intent level for more precise control.
 config:
-  store_entities_as_slots: true
+  store_entities_as_slots: false
Then specify it for specific intents where needed:
intents:
  - intent_name:
      use_entities: true
      store_entities_as_slots: true
7-11: Simplify intent configuration
The use_entities: true configuration is redundant as it's the default behavior. You can simplify the intent declarations unless you need to explicitly document this behavior.
 intents:
-  - greet:
-      use_entities: true
-  - deny:
-      use_entities: true
+  - greet
+  - deny
12-20: Consider consolidating similar responses and adding variations
- utter_default and utter_please_rephrase serve similar purposes. Consider consolidating them.
- Single-response templates might lead to repetitive bot behavior. Consider adding variations.
 responses:
   utter_goodbye:
   - text: Bye
+  - text: Goodbye! Have a great day!
+  - text: See you later!
   utter_greet:
   - text: Hey! How are you?
+  - text: Hello! Nice to meet you!
+  - text: Hi there! How can I help you today?
-  utter_default:
-  - text: Can you rephrase!
   utter_please_rephrase:
   - text: I'm sorry, I didn't quite understand that. Could you rephrase?
+  - text: I didn't catch that. Could you say it differently?
+  - text: I'm not sure what you mean. Can you rephrase that?
tests/testing_data/validator/valid_data/config.yml (3)
12-13: Consider lowering the FallbackClassifier threshold
The current threshold of 0.75 is quite high and might result in too many fallbacks to the default response. A threshold between 0.3 and 0.6 is typically more balanced.
-  threshold: 0.75
+  threshold: 0.5
14-15: Consider increasing DIETClassifier epochs
5 epochs might be insufficient for the DIETClassifier to learn effectively, especially with complex intents. Consider increasing to 100-150 epochs for better model performance.
-  epochs: 5
+  epochs: 100
21-23: Increase TEDPolicy epochs for better learning
Similar to the DIETClassifier, 5 epochs for TEDPolicy is likely insufficient. Consider increasing to improve dialogue management.
-  epochs: 5
+  epochs: 100
tests/testing_data/validator/valid_data/actions.yml (1)
25-25: Add newline at end of file
Add a newline character at the end of the file to comply with YAML formatting standards.
🧰 Tools
🪛 yamllint
[error] 25-25: no new line character at the end of file
(new-line-at-end-of-file)
tests/testing_data/validator/valid_data/data/rules.yml (1)
7-13: Consider additional slot validation
While the rule correctly checks for the location slot, consider adding validation for the slot value itself.
 - rule: Only say `hello` if the user provided a location
   condition:
   - slot_was_set:
-    - location: true
+    - location: {"type": "text", "value": !null}
   steps:
   - intent: greet
   - action: utter_greet
tests/unit_test/validator/data_importer_test.py (3)
124-134: Clarify the purpose of this additional test
This test appears to be very similar to test_import_data but uses a different test data path. Consider:
- Adding comments to explain how this test case differs from test_import_data
- Renaming the test to better reflect its specific purpose
- Documenting the key differences in the test data between 'valid' and 'valid_data' directories
212-212: Consider adding fallback data test cases
Given that this PR focuses on "Fallback Data while file upload", consider adding specific test cases to verify:
- Fallback data creation during file upload
- Validation of fallback configurations
- Edge cases in fallback data handling
Line range hint 1-250: Enhance test coverage for fallback data functionality
While the test file has good coverage for basic import scenarios, it lacks specific tests for the fallback data functionality mentioned in the PR objectives. Consider adding:
- Test cases for fallback data creation during file upload
- Validation of fallback configurations
- Error cases for invalid fallback data
- Integration tests with MongoProcessor's fallback-related methods
This will ensure the new functionality is properly tested and maintained.
tests/unit_test/events/events_test.py (1)
840-840: Remove unnecessary print statements in unit tests
Apply this diff to remove the print statements:
- print(list(mongo_processor.fetch_responses(bot)))
- print(mongo_processor.fetch_rule_block_names(bot))
Also applies to: 843-843
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
tests/testing_data/validator/valid_data/actions.yml (1 hunks)
tests/testing_data/validator/valid_data/chat_client_config.yml (1 hunks)
tests/testing_data/validator/valid_data/config.yml (1 hunks)
tests/testing_data/validator/valid_data/data/nlu.yml (1 hunks)
tests/testing_data/validator/valid_data/data/rules.yml (1 hunks)
tests/testing_data/validator/valid_data/data/stories.yml (1 hunks)
tests/testing_data/validator/valid_data/domain.yml (1 hunks)
tests/unit_test/events/events_test.py (4 hunks)
tests/unit_test/validator/data_importer_test.py (4 hunks)
✅ Files skipped from review due to trivial changes (2)
- tests/testing_data/validator/valid_data/chat_client_config.yml
- tests/testing_data/validator/valid_data/data/stories.yml
🧰 Additional context used
🪛 yamllint
tests/testing_data/validator/valid_data/actions.yml
[error] 25-25: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (16)
tests/testing_data/validator/valid_data/data/nlu.yml (2)
1-2: LGTM! Valid YAML structure and version.
The version "3.1" is correctly specified and the NLU section is properly declared.
1-13: Verify test coverage for fallback scenarios.
Since this file is in the testing data directory and the PR focuses on fallback data during file upload, we should verify that:
- The test data adequately covers fallback scenarios
- There are sufficient edge cases for testing the fallback behavior
✅ Verification successful
Let me check the content of the rules.yml and config.yml files to verify the fallback configurations.
Fallback scenarios are adequately covered in test data
The test data includes comprehensive fallback configurations and test scenarios:
- NLU fallback is handled via FallbackClassifier with a confidence threshold of 0.75
- Core fallback is configured with RulePolicy (threshold: 0.3, action: action_small_talk)
- A dedicated rule exists for handling low confidence NLU scenarios with utter_please_rephrase
- Multiple test files contain fallback-related test cases across integration and unit tests
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for related test files and fallback configurations
# Look for related test files
echo "Searching for related test files..."
fd -e yml -e yaml . tests/testing_data/validator/valid_data/data/ -x echo "Found test file: {}"
# Check for fallback configurations
echo "Checking for fallback configurations..."
rg -l "fallback" tests/testing_data/validator/valid_data/
# Look for test cases related to fallback
echo "Searching for fallback-related test cases..."
rg -l "test.*fallback" tests/
Length of output: 1252
Script:
#!/bin/bash
# Check content of fallback-related configuration files
echo "Content of rules.yml:"
cat tests/testing_data/validator/valid_data/data/rules.yml
echo -e "\nContent of config.yml:"
cat tests/testing_data/validator/valid_data/config.yml
# Search for fallback-related content in test files
echo -e "\nFallback-related content in test files:"
rg "fallback" -A 3 -B 3 tests/testing_data/validator/valid_data/
Length of output: 2579
tests/testing_data/validator/valid_data/config.yml (2)
1-2: LGTM: Basic configuration is properly set
The recipe version and language settings are correctly configured.
24-26: Verify core_fallback_threshold effectiveness
The core_fallback_threshold of 0.3 is quite low and might trigger the fallback action too frequently. Consider testing with different thresholds to find the optimal balance.
✅ Verification successful
The core_fallback_threshold of 0.3 is consistent with standard configurations
Based on the analysis of the codebase:
- The threshold value of 0.3 is commonly used across multiple configuration files, including templates and use-cases
- Production-ready configurations like template/config/rasa-default.yml and various use-case templates also use 0.3
- Some configurations use 0.5 for specific scenarios (like multiflow stories and long-answer cases)
- Only one test configuration uses a higher threshold of 0.75
The current threshold of 0.3 appears to be a well-established value in the codebase and doesn't require adjustment unless specific use-case metrics indicate otherwise.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any custom implementation of action_small_talk
rg -l "action_small_talk" --type python
# Check if there are any existing fallback configurations in other files
rg -l "core_fallback_threshold" --type yaml
Length of output: 1696
Script:
#!/bin/bash
# Let's check the core_fallback_threshold values in these config files
rg "core_fallback_threshold:" -B2 -A2
# Also check for any custom fallback action implementations
rg "action_small_talk" --type-add 'py:*.py' --type py
Length of output: 17439
tests/testing_data/validator/valid_data/actions.yml (1)
1-13: Add error handling and timeout configuration for HTTP action
The HTTP action configuration could be more robust with the following improvements:
- Add request timeout configuration
- Include error handling for failed requests
- Validate response data before using ${data}
Example configuration structure:
http_action:
- action_name: action_say_hello
  content_type: json
  headers: []
  http_url: https://jsonplaceholder.typicode.com/posts/1
  params_list: []
  request_method: GET
  timeout: 5000 # milliseconds
  response:
    dispatch: true
    dispatch_type: text
    evaluation_type: expression
    value: ${data}
    fallback_value: "Sorry, I couldn't fetch the data"
  validation:
    schema: ${data != null}
  set_slots: []
tests/testing_data/validator/valid_data/data/rules.yml (3)
15-19: Verify rule precedence with location-based greeting
The rule looks good, but ensure it doesn't conflict with the location-based greeting rule. Consider documenting the expected behavior when both rules could apply.
#!/bin/bash
# Check for any other conversation start rules that might conflict
rg "conversation_start:\s*true" --type yaml
1-5: LGTM! Verify fallback response exists
The fallback rule implementation follows best practices for handling low confidence scenarios.
21-25: Consider loop prevention mechanism
The rule correctly uses wait_for_user_input: false, but ensure the utter_greet response doesn't trigger conditions that could lead back to this rule.
✅ Verification successful
No loop risk detected in utter_greet responses
After reviewing the utter_greet responses across the codebase, they only contain simple greeting text responses like "Hey! How are you?" without any elements that could trigger the greeting intent again. The rule's wait_for_user_input: false setting is safe to use in this context.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check the greet response for potential triggers
rg "utter_greet:" -A 5 "domain.yml"
Length of output: 178
Script:
#!/bin/bash
# Search for domain files with different extensions
fd "domain" --type f
# Search for utter_greet in all files
rg "utter_greet:" -A 5
Length of output: 13162
tests/unit_test/validator/data_importer_test.py (2)
162-162: Verify changes in multiflow test data
The test now expects 7 responses and 3 rule blocks. This change should be validated against the test data.
#!/bin/bash
# Description: Verify the number of responses and rule blocks in multiflow test data
# Expected: Should find 7 responses and 3 rule blocks
echo "Checking responses and rules in multiflow test data:"
rg "responses:" tests/testing_data/multiflow_stories/valid_with_multiflow/
rg "rules:" tests/testing_data/multiflow_stories/valid_with_multiflow/
Also applies to: 164-164
115-123: Verify the reduction in rule block count
The test now expects 3 rule blocks instead of the previous 4. This change should be validated against the test data in 'tests/testing_data/validator/valid' to ensure it accurately reflects the intended structure.
✅ Verification successful
Rule block count verification is accurate
The test data in tests/testing_data/validator/valid/data/rules.yml contains exactly 3 rule blocks:
- "Only say `hello` if the user provided a location"
- "Say `hello` when the user starts a conversation with intent `greet`"
- "Rule which will not wait for user message once it was applied"
The assertion assert len(processor.fetch_rule_block_names(bot)) == 3 correctly matches the actual number of rules in the test data.
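A count like this can also be checked mechanically. The sketch below is a minimal illustration that counts "- rule:" entries in the text of a rules file; count_rule_blocks is a hypothetical helper, and SAMPLE is an assumed reconstruction that reuses only the three rule names listed above (the steps under each rule are invented filler, not the contents of the real file).

```python
import re


def count_rule_blocks(rules_yaml: str) -> int:
    """Count list entries of the form '- rule: ...' in a rules file's text."""
    return len(re.findall(r"^[ \t]*-[ \t]+rule:", rules_yaml, flags=re.MULTILINE))


# Assumed sample data: rule names from the review, steps are placeholders.
SAMPLE = """\
version: "3.1"
rules:
- rule: Only say `hello` if the user provided a location
  steps:
  - intent: greet
  - action: utter_greet
- rule: Say `hello` when the user starts a conversation with intent `greet`
  conversation_start: true
  steps:
  - intent: greet
  - action: utter_greet
- rule: Rule which will not wait for user message once it was applied
  steps:
  - intent: greet
  - action: utter_greet
  wait_for_user_input: false
"""

print(count_rule_blocks(SAMPLE))
```

A text-based count avoids a YAML dependency but would miscount if a rule name itself contained a line starting with "- rule:"; parsing the file with a YAML library is the more robust option.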
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the number of rule blocks in the test data
# Expected: Should find 3 rule blocks in the test data
echo "Checking rule blocks in test data:"
rg -l "rules:" tests/testing_data/validator/valid/
Length of output: 420
Script:
#!/bin/bash
# Let's check the actual content of rules.yml in the test data directory
echo "Content of rules.yml in valid test data:"
cat tests/testing_data/validator/valid/data/rules.yml
# Also check if there are any other YAML files that might contain rules
echo -e "\nSearching for other YAML files that might contain rules:"
fd -e yml -e yaml . tests/testing_data/validator/valid/ --exec cat {}
Length of output: 4377
tests/unit_test/events/events_test.py (6)
259-261: Assertions added for responses and rule block names
The assertions on lines 259 and 261 correctly verify the expected number of responses and rule block names.
305-305: Added assertion for rule block names
The assertion on line 305 accurately checks the expected number of rule block names.
341-343: Assertions for responses and rule block names are appropriate
The added assertions on lines 341 and 343 ensure the correct number of responses and rule block names are present.
839-839: Assertion for training examples count is correct
The assertion on line 839 correctly verifies the number of training examples fetched.
842-842: Assertion for zero actions is valid
The assertion on line 842 correctly ensures that no actions are fetched, as expected in this context.
845-881: New test test_trigger_data_importer_with_valid_data is well-implemented
The test function added between lines 845 and 881 effectively verifies the data importer with valid data. Good work on improving test coverage.
session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: true
Review session expiration time
A 60-second session expiration time is unusually short and might interrupt user conversations. Consider increasing this value to a more reasonable duration (e.g., 30-60 minutes) unless there's a specific requirement for such short sessions.
session_config:
- session_expiration_time: 60
+ session_expiration_time: 1800 # 30 minutes in seconds
carry_over_slots_to_new_session: true
actions:
- action_say_hello
- action_say_goodbye
- utter_greet
- utter_goodbye
💡 Codebase verification
🛠️ Refactor suggestion
Custom actions are missing implementations
The custom actions action_say_hello and action_say_goodbye are defined in the domain file but their implementations (ActionSayHello and ActionSayGoodbye classes) are not found in the codebase. Additionally, the utterance actions are redundant as they are automatically registered from responses.
- Remove utter_greet and utter_goodbye from the actions section
- Implement the missing custom actions ActionSayHello and ActionSayGoodbye in your actions file
- Configure the action server endpoint in endpoints.yml by uncommenting the action_endpoint section
🔗 Analysis chain
Verify custom action implementations and remove redundant utterance actions
- Utterance actions (utter_*) don't need to be listed in the actions section as they're automatically registered from responses.
- Custom actions need corresponding implementations.
actions:
- action_say_hello
- action_say_goodbye
- - utter_greet
- - utter_goodbye
Let me try another search strategy to find the action implementations.
Let me check the content of the actions files to specifically look for ActionSayHello and ActionSayGoodbye implementations.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for custom action implementations
echo "Searching for action implementations..."
rg -l "class (ActionSayHello|ActionSayGoodbye)" -t python
# Verify action server configuration
echo "Checking action server configuration..."
fd "endpoints.yml" | xargs cat
Length of output: 3211
Script:
#!/bin/bash
# Search for action implementations using fd and grep
echo "Searching for action implementations..."
fd -e py | xargs grep -l "class Action"
# Also check for any actions.py file
echo -e "\nChecking for actions.py files..."
fd "actions.py"
Length of output: 1361
Script:
#!/bin/bash
# Search for specific action implementations
echo "Searching for ActionSayHello and ActionSayGoodbye..."
rg "class ActionSayHello|class ActionSayGoodbye" -A 5 kairon/chat/actions.py kairon/actions/definitions/*
Length of output: 1062
- name: CountVectorsFeaturizer
- analyzer: char_wb
  max_ngram: 4
  min_ngram: 1
  name: CountVectorsFeaturizer
- name: FallbackClassifier
Remove duplicate CountVectorsFeaturizer
There appear to be two instances of CountVectorsFeaturizer in the pipeline. This might be unintentional and could impact performance.
Suggested fix:
- name: CountVectorsFeaturizer
- analyzer: char_wb
max_ngram: 4
min_ngram: 1
name: CountVectorsFeaturizer
+ name: CountVectorsFeaturizer
+ analyzer: char_wb
+ max_ngram: 4
+ min_ngram: 1
Committable suggestion skipped: line range outside the PR's diff.
- action_name: action_say_goodbye
  content_type: json
  headers: []
  http_url: https://jsonplaceholder.typicode.com/posts/1
  params_list: []
  request_method: GET
  response:
    dispatch: true
    dispatch_type: text
    evaluation_type: expression
    value: ${data}
  set_slots: []
🛠️ Refactor suggestion
Reduce configuration duplication using YAML anchors
Both actions have identical configurations except for their names. Consider using YAML anchors and aliases to maintain DRY principles.
http_action:
- action_name: action_say_hello
+ &base_action
content_type: json
headers: []
http_url: https://jsonplaceholder.typicode.com/posts/1
params_list: []
request_method: GET
response:
dispatch: true
dispatch_type: text
evaluation_type: expression
value: ${data}
set_slots: []
- action_name: action_say_goodbye
- content_type: json
- headers: []
- http_url: https://jsonplaceholder.typicode.com/posts/1
- params_list: []
- request_method: GET
- response:
- dispatch: true
- dispatch_type: text
- evaluation_type: expression
- value: ${data}
- set_slots: []
+ action_name: action_say_goodbye
+ <<: *base_action
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 yamllint
[error] 25-25: no new line character at the end of file
(new-line-at-end-of-file)
Added code to handle the extra creation of NLU fallback data while uploading files.