Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): Update DictionaryReader::get_entry_matching_value to handle case-insensitive searches (fixes #648). #690

Merged
merged 12 commits into from
Feb 27, 2025

Conversation

aestriplex
Copy link

@aestriplex aestriplex commented Jan 23, 2025

This pull request implements a fix for the issue #648

Here's a short recap of the main changes:

  1. get_entry_matching_value now returns a vector
  2. encode_and_search_dictionary adds imprecise dict var when there are more matching entries

There are four commits because I worked on it taking advantage of @haiqi96's advices (see the discussion on #648). Should I squash them?

Summary by CodeRabbit

  • New Features

    • Enhanced dictionary search functionality to support multiple matching entries.
    • Improved variable matching with more flexible search capabilities.
  • Bug Fixes

    • Updated error handling for empty variable strings.
    • Refined dictionary entry retrieval process.
  • Tests

    • Added comprehensive test cases for multiple matching dictionary entries.
    • Verified case-sensitive and case-insensitive search behaviours.

The changes improve the system's ability to handle complex dictionary searches, allowing for more nuanced and precise variable matching across different components.

Copy link
Contributor

coderabbitai bot commented Jan 23, 2025

Walkthrough

The pull request introduces significant modifications to the get_entry_matching_value method across multiple files in the CLP project. The primary change involves updating the method's return type from a single pointer to a vector of pointers, enabling the retrieval of multiple dictionary entries. This modification allows for more flexible matching, particularly in case-insensitive searches where multiple entries might match the input string. The changes are consistent across different components of the project, including dictionary readers, variable interpreters, and search output processing.

Changes

File Change Summary
components/core/src/clp/DictionaryReader.hpp Updated get_entry_matching_value to return std::vector<EntryType const*> instead of EntryType const*.
components/core/src/clp/EncodedVariableInterpreter.cpp Modified logic to handle multiple dictionary entries during encoding and searching.
components/core/src/clp_s/DictionaryReader.hpp Similar update to return type for get_entry_matching_value method.
components/core/src/clp_s/search/Output.cpp Updated to process multiple variable dictionary entries.
components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp Enhanced handling of multiple dictionary entries.
components/core/tests/test-EncodedVariableInterpreter.cpp Added test section for multiple matching values.

Possibly related PRs

  • feat(clp-s): Add end to end search tests for clp-s. #668: The changes in the main PR regarding the get_entry_matching_value method in DictionaryReader.hpp are directly related to the modifications in the populate_string_queries method in Output.cpp, as both involve handling multiple entries returned from the dictionary.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0127c4f and e2367ce.

📒 Files selected for processing (1)
  • components/core/tests/test-EncodedVariableInterpreter.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/tests/test-EncodedVariableInterpreter.cpp

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (3)

39-39: Consider adding more context or logs upon returning false
When the dictionary returns no entries, returning false implies no match. For traceability and debugging, consider logging a brief message indicating the unmatched variable string.


46-51: Avoid shadowing and preserve clarity
The variable encoded_var declared inside this block may overshadow the encoded_var declared above. To improve clarity and maintain scoping hygiene, consider renaming or reusing the existing variable.


53-61: Confirm efficiency for large sets of matches
Using std::unordered_set for both the IDs and the entries is a good approach for preventing duplicates. If the dictionary can grow large, ensure that insertion and iteration remain performant.

components/core/src/clp_s/DictionaryReader.hpp (1)

159-181: Evaluate case-sensitive early exit logic
Early exit in the case-sensitive branch returns after the first match. This design is acceptable if the business logic requires only one match; otherwise, you may want to continue searching for potential duplicates. Also, consider that repeatedly converting each dictionary entry’s value to uppercase in the else-branch may affect performance on large dictionaries.

components/core/src/clp/DictionaryReader.hpp (1)

236-258: Ensure consistency in case handling
The code pattern for case-sensitive versus case-insensitive matches is consistent. Similar to the other DictionaryReader, consider potential performance costs of repeatedly calling boost::algorithm::to_upper_copy() on each iteration.

components/core/tests/test-EncodedVariableInterpreter.cpp (2)

379-379: Correct typographical error in test name
The word “metching” should be “matching.”

-    SECTION("Test multiple metching values") {
+    SECTION("Test multiple matching values") {

382-405: Comprehensive case-insensitive verification
By storing multiple string variants and confirming that only the case-insensitive retrieval returns all of them, you have validated the updated dictionary logic thoroughly. Consider verifying each returned entry’s content for added rigour.

components/core/src/clp_s/search/Output.cpp (2)

935-935: Consider checking if the returned vector is empty before use.
While it's not strictly necessary, confirming that entries is non-empty (or otherwise handling the empty case) can improve clarity and prevent potential unexpected behaviour in future modifications.


940-940: Reserve space in the set before inserting to avoid repeated rehashing.
Since you already know the approximate number of elements from the size of entries, reserving that capacity in matching_vars (e.g., matching_vars.reserve(entries.size())) can marginally improve performance and reduce memory overhead.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66067d6 and ea41a9c.

📒 Files selected for processing (6)
  • components/core/src/clp/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/src/clp_s/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp_s/search/Output.cpp (1 hunks)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/tests/test-EncodedVariableInterpreter.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/tests/test-EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/DictionaryReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/DictionaryReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/search/Output.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (9)
components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (2)

37-38: Ensure consistent handling of multiple matched entries
Changing the function call to return a vector works well for retrieving multiple entries, thereby supporting the new requirement for imprecise matching. This approach is clear and logically sound.


45-45: Verify classification before calling add_non_double_var
While this is logically consistent with the dictionary path, verify that treating all dictionary-based entries as non_double is appropriate in all cases. If some entries are numeric or require special handling, additional checks may be needed.

components/core/src/clp_s/DictionaryReader.hpp (1)

67-69: Accurate docstring update
The revised docstring correctly states that the method returns a vector of matching entries, which may be empty.

components/core/src/clp/DictionaryReader.hpp (1)

85-87: Docstring alignment with new return type
The documentation now accurately states that multiple matching entries can be returned.

components/core/src/clp/EncodedVariableInterpreter.cpp (4)

389-390: Vector-based retrieval
Switching to retrieving all matching entries in a vector offers more flexibility for handling multiple matches. This aligns well with your updated dictionary logic.


391-394: Early return on empty matches
Returning false when no dictionary entries are found is a clear approach. Ensure callers handle this outcome properly.


398-403: Single match flow
Dispatching a single dictionary entry and returning immediately keeps the logic straightforward.


404-413: Handling multiple matches
Encoding and storing all dictionary entries in an imprecise variable set is a practical approach for ambiguous matches. No issues here.

components/core/tests/test-EncodedVariableInterpreter.cpp (1)

409-414: Prudent test clean-up
Deleting the dictionary files at the end of the test prevents workspace pollution.

@@ -82,9 +82,9 @@ class DictionaryReader {
* Gets the entry exactly matching the given search string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should update the docstring.

Suggested change
* Gets the entry exactly matching the given search string
* Gets the entry matching the given search string

@kirkrodrigues any suggestions on how to account for multiple possible matching? "entry(entries)"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Gets the entry exactly matching the given search string
* Gets the entries matching the given search string

Let's use Haiqi's suggestion.

@haiqi96
Copy link
Contributor

haiqi96 commented Jan 24, 2025

Thanks for opening the PR.
When the PR is approved, you will need to squash the changes so it looks like a single commit. Note that you don't need to squash it manaully, there is a github option to allow your merge the PR with squashing. We can discuss after we get there.

I have left a batch of comments.
Note that you will also need to change the title of the PR. you can refer to the past merged PRs to get an idea of how it should look like.

@aestriplex aestriplex changed the title get_entry_matching_value should handle ignore-case properly. fix(core-clp): get_entry_matching_value should handle ignore-case properly. Jan 25, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
components/core/src/clp_s/DictionaryReader.hpp (2)

69-70: Consider using std::vector<EntryType const*> on a single line.

The return type declaration spans two lines unnecessarily.

-    std::vector<EntryType const*>
-    get_entry_matching_value(std::string const& search_string, bool ignore_case) const;
+    std::vector<EntryType const*> get_entry_matching_value(std::string const& search_string, bool ignore_case) const;

159-182: Consider using emplace_back instead of push_back.

The implementation looks good, but consider using emplace_back for potentially better performance when adding entries to the vector.

-                entries.push_back(&entry);
+                entries.emplace_back(&entry);

Also at:

-                entries.push_back(&entry);
+                entries.emplace_back(&entry);
components/core/src/clp/EncodedVariableInterpreter.cpp (3)

389-390: Consider using auto with explicit type for better readability.

The type of entries could be made more explicit while still using auto.

-        auto const entries = var_dict.get_entry_matching_value(var_str, ignore_case);
+        auto const entries = std::vector<VariableDictionaryEntry const*>{var_dict.get_entry_matching_value(var_str, ignore_case)};

398-403: Consider using std::size instead of size().

The single entry case is handled correctly, but consider using std::size for consistency with modern C++.

-        if (entries.size() == 1) {
+        if (std::size(entries) == 1) {

405-413: Consider reserving space in the sets for better performance.

When dealing with multiple entries, consider reserving space in both sets to avoid reallocations.

         std::unordered_set<encoded_variable_t> encoded_vars;
+        encoded_vars.reserve(entries.size());
         std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(
                 entries.begin(),
                 entries.end()
         );
         for (auto const& entry : entries) {
             encoded_vars.insert(encode_var_dict_id(entry->get_id()));
         }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea41a9c and 61dea57.

📒 Files selected for processing (6)
  • components/core/src/clp/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/src/clp_s/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp_s/search/Output.cpp (1 hunks)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/tests/test-EncodedVariableInterpreter.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp
  • components/core/tests/test-EncodedVariableInterpreter.cpp
🧰 Additional context used
📓 Path-based instructions (4)
components/core/src/clp/DictionaryReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/DictionaryReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/search/Output.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (4)
components/core/src/clp_s/DictionaryReader.hpp (1)

67-67: LGTM! Documentation updated to reflect the new return type.

The docstring accurately describes that the function now returns a vector of entries.

components/core/src/clp/DictionaryReader.hpp (1)

236-259: Same suggestions as in the other DictionaryReader.hpp file apply here.

Consider:

  1. Using std::vector<EntryType const*> on a single line
  2. Using emplace_back instead of push_back
components/core/src/clp/EncodedVariableInterpreter.cpp (1)

391-397: LGTM! Early return for empty entries.

The empty check and early return is well-placed and maintains the function's contract.

components/core/src/clp_s/search/Output.cpp (1)

935-940: LGTM! Correctly handles multiple entries from the dictionary.

The code properly retrieves and processes multiple entries from the variable dictionary.

Comment on lines 405 to 412
std::unordered_set<encoded_variable_t> encoded_vars;
std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(
entries.begin(),
entries.end()
);
for (auto const& entry : entries) {
encoded_vars.insert(encode_var_dict_id(entry->get_id()));
}
sub_query.add_imprecise_dict_var(encoded_vars, entries_set);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see any value turning this into a new method for SubQuery that adds imprecise_dict_var by taking in a vector<VariableDictionaryEntry*>?

i.e.

void SubQuery::add_imprecise_dict_var(
        vector<VariableDictionaryEntry*> const& possible_var_dict_entries,
);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aestriplex In the email, I saw on of your comment mentioning about adding a add_imprecise_dict_var method that takes in a template, but can't find the comment in the Pull request.

Just want to double check if you intended to delete the comment.

@haiqi96 No, I did't intend to delete that comment, but it's still there actually (#690 (comment)).
I don't know if a more general approach (like that one, using iterators) could be helpful in this case, but it does not seem to me that it would add too much overhead anyway.

@haiqi96
Copy link
Contributor

haiqi96 commented Jan 27, 2025

@aestriplex In the email, I saw on of your comment mentioning about adding a add_imprecise_dict_var method that takes in a template, but can't find the comment in the Pull request.

Just want to double check if you intended to delete the comment.

@@ -64,9 +64,9 @@ class DictionaryReader {
* Gets the entry exactly matching the given search string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gibber9809 Can you take a look at the changes in clp-s part?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
components/core/src/clp/EncodedVariableInterpreter.cpp (1)

404-411: Consider using range-based initialization for the entries set.

The current implementation creates two sets and populates them separately. Consider using range-based initialization for better readability and efficiency.

-        std::unordered_set<encoded_variable_t> encoded_vars;
-        std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(
-                entries.begin(),
-                entries.end()
-        );
-        for (auto const entry : entries) {
-            encoded_vars.insert(encode_var_dict_id(entry->get_id()));
-        }
+        std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(entries.begin(), entries.end());
+        std::unordered_set<encoded_variable_t> encoded_vars;
+        encoded_vars.reserve(entries.size());
+        std::transform(entries.begin(), entries.end(), 
+            std::inserter(encoded_vars, encoded_vars.begin()),
+            [this](auto const* entry) { return encode_var_dict_id(entry->get_id()); });
components/core/src/clp_s/search/Output.cpp (1)

940-941: Use const reference in the range-based for loop.

When iterating over entries, use const& to avoid unnecessary copying.

-                for (auto const& entry : entries) {
+                for (auto const* const& entry : entries) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61dea57 and aecae49.

📒 Files selected for processing (6)
  • components/core/src/clp/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/src/clp_s/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp_s/search/Output.cpp (1 hunks)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/tests/test-EncodedVariableInterpreter.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp
  • components/core/tests/test-EncodedVariableInterpreter.cpp
  • components/core/src/clp_s/DictionaryReader.hpp
  • components/core/src/clp/DictionaryReader.hpp
🧰 Additional context used
📓 Path-based instructions (2)
components/core/src/clp_s/search/Output.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (3)
components/core/src/clp/EncodedVariableInterpreter.cpp (2)

389-390: LGTM! Proper handling of empty entries.

The code correctly handles the case when no entries are found in the dictionary, returning false to indicate no match.

Also applies to: 396-396


397-402: LGTM! Efficient handling of single entry case.

The code optimizes the single entry case by avoiding the creation of sets and directly encoding the variable.

components/core/src/clp_s/search/Output.cpp (1)

935-940: LGTM! Proper handling of multiple entries.

The code correctly retrieves and processes multiple entries from the variable dictionary.

@haiqi96
Copy link
Contributor

haiqi96 commented Jan 28, 2025

Btw, please try to avoid using force push. It seems some comments are not displayed properly once there is a forced push

Comment on lines 404 to 412
std::unordered_set<encoded_variable_t> encoded_vars;
std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(
entries.begin(),
entries.end()
);
for (auto const entry : entries) {
encoded_vars.insert(encode_var_dict_id(entry->get_id()));
}
sub_query.add_imprecise_dict_var(encoded_vars, entries_set);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about adding a new signature for sub_query.add_imprecise_dict_var that takes in a vector of entires as argument?. So we can replace this piece of code witha a single call to the new signature @LinZhihao-723

Note this won't solve the code duplication problem, because CLP-S duplicates the entire Query class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning a vector, can we directly return an unordered set?

@aestriplex
Copy link
Author

Btw, please try to avoid using force push. It seems some comments are not displayed properly once there is a forced push

@haiqi96 You're right, my bad. I wasn't that used to GitHub pull requests, and I was trying to keep the branch up to date with man

@haiqi96
Copy link
Contributor

haiqi96 commented Jan 29, 2025

Btw, please try to avoid using force push. It seems some comments are not displayed properly once there is a forced push

@haiqi96 You're right, my bad. I wasn't that used to GitHub pull requests, and I was trying to keep the branch up to date with man

No problem. I also did a lot of forced push when I started working with PRs. To keep the branch up to date, you and always run git merge main. Note it would make your commit history looks ugly, but since eventually we will squash all changes into one commit when PRing into main, the commit history wouldn't matter.

There might be cases where when you do a rebase (I assume you are rebasing to main?), there are mutliple commits causing conflicts which could cause a lot of headache since you need to resolve conflict for each commits. Using git merge will make the things easier

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. Thanks for your contribution!
I gave a short review for the changes in clp. The overall fix looks good to me, the only issue is that we might need to look into this discussion and come with an agreement: https://github.com/y-scope/clp/pull/690/files#r1932918266
I also left some comments about some style changes to get u on board with with with with our general coding guidelines. Let me know if you have any questions on these comments.
I will proceed my review for clp-s and unit tests after we resolve the clp component.

@@ -82,9 +82,9 @@ class DictionaryReader {
* Gets the entry exactly matching the given search string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Gets the entry exactly matching the given search string
* Gets the entries matching the given search string

Let's use Haiqi's suggestion.

@@ -82,9 +82,9 @@ class DictionaryReader {
* Gets the entry exactly matching the given search string
* @param search_string
* @param ignore_case
* @return nullptr if an exact match is not found, the entry otherwise
* @return a (possibly empty) vector of entries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Haiqi's comment.

Comment on lines 404 to 412
std::unordered_set<encoded_variable_t> encoded_vars;
std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(
entries.begin(),
entries.end()
);
for (auto const entry : entries) {
encoded_vars.insert(encode_var_dict_id(entry->get_id()));
}
sub_query.add_imprecise_dict_var(encoded_vars, entries_set);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning a vector, can we directly return an unordered set?

Comment on lines 404 to 411
std::unordered_set<encoded_variable_t> encoded_vars;
std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(
entries.begin(),
entries.end()
);
for (auto const entry : entries) {
encoded_vars.insert(encode_var_dict_id(entry->get_id()));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unordered_set<encoded_variable_t> encoded_vars;
std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(
entries.begin(),
entries.end()
);
for (auto const entry : entries) {
encoded_vars.insert(encode_var_dict_id(entry->get_id()));
}
std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set{
entries.cbegin(),
entries.cend()
};
std::unordered_set<encoded_variable_t> encoded_vars;
for (auto const* entry : entries) {
encoded_vars.emplace(encode_var_dict_id(entry->get_id()));
}
sub_query.add_imprecise_dict_var(encoded_vars, entries_set);

Some coding style related comments. If we choose to go with the design as discussed in another comment we might not need to apply this change in the end, but I still leave it here just to get u on board to our C++ coding guidelines for future development :D

  • We should order entries_set before encoded_vars to make the declaration of encoded_vars as close to its first use as possible
  • We use {} instead of () for explicitly typed variable initialization. It's a bit tricky to use {} in std containers sometimes (e.g., using {1, 2} to initialize a vector would not be equivalent to (1, 2)), but in this case the inputs are iterators so we should use {}.
  • Let's use the const version of iterators whenever possible for const correctness.
  • When using auto to declare a variable, we should add * for ptrs and & for references. This is actually checked by our clang-tidy rules, which is to improve the readability.
  • In most cases we should use emplace rather than insert (same as emplace_back rather than push_back) for in-place object construct. Since the underlying type is a fundamental type, so in this case using insert would be fine and there's no rule to not strictly enforce u to use emplace. If u don't want to think about emplace vs. insert everytime, u could always use emplace.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LinZhihao-723 for this feedback. I implemented your suggestions both in clp and in clp_s namespaces

@aestriplex aestriplex requested review from wraymo and a team as code owners February 21, 2025 18:13
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/core/src/clp_s/DictionaryReader.hpp (1)

177-184: Consider caching the uppercase value of each entry.

For case-insensitive searches, we're converting each entry's value to uppercase on every search. This could be inefficient for large dictionaries with frequent searches.

Consider caching the uppercase value in the EntryType class:

 class DictionaryEntry {
+    private:
+        mutable std::string m_value_uppercase;
+
+    public:
+        std::string const& get_value_uppercase() const {
+            if (m_value_uppercase.empty()) {
+                m_value_uppercase = boost::algorithm::to_upper_copy(m_value);
+            }
+            return m_value_uppercase;
+        }
 };

Then update the search logic:

-        if (boost::algorithm::to_upper_copy(entry.get_value()) == search_string_uppercase) {
+        if (entry.get_value_uppercase() == search_string_uppercase) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8824757 and 1f7f35a.

📒 Files selected for processing (4)
  • components/core/src/clp/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp/EncodedVariableInterpreter.cpp (2 hunks)
  • components/core/src/clp_s/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp
  • components/core/src/clp/EncodedVariableInterpreter.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp_s/DictionaryReader.hpp
  • components/core/src/clp/DictionaryReader.hpp
🔇 Additional comments (7)
components/core/src/clp_s/DictionaryReader.hpp (3)

63-67: LGTM!

The docstring has been updated to accurately reflect the new return type and behaviour.


69-70: LGTM!

The method signature has been updated to return a vector of pointers, which aligns with the requirement to handle multiple matching entries.


164-175: LGTM!

The case-sensitive branch has been optimized to use std::ranges::find_if and return early, improving readability and performance.

components/core/src/clp/DictionaryReader.hpp (4)

82-85: LGTM!

The docstring has been updated to accurately reflect the new return type and behaviour.


87-88: LGTM!

The method signature has been updated to return a vector of pointers, which aligns with the requirement to handle multiple matching entries.


241-252: LGTM!

The case-sensitive branch has been optimized to use std::ranges::find_if and return early, improving readability and performance.


254-261: Consider caching the uppercase value of each entry.

For case-insensitive searches, we're converting each entry's value to uppercase on every search. This could be inefficient for large dictionaries with frequent searches.

@aestriplex
Copy link
Author

@LinZhihao-723

Instead of returning a vector, can we directly return an unordered set?

It was considered in one of the comments here. Looks good to me

@LinZhihao-723
Copy link
Member

@LinZhihao-723

Instead of returning a vector, can we directly return an unordered set?

It was considered in one of the comments here. Looks good to me

Sure, make sense to me. Shall we create a new issue to keep track of this vector-to-set inefficiency?

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clp-s changes lgtm.
@gibber9809 Do you want to take a look for a sanity check?
NTS: I will still need to go through the unit test changes

@aestriplex
Copy link
Author

@LinZhihao-723

Sure, make sense to me. Shall we create a new issue to keep track of this vector-to-set inefficiency?

Yes, I'll open a new issue to keep track of this, and I'll link this pr

@gibber9809
Copy link
Contributor

The clp-s changes lgtm. @gibber9809 Do you want to take a look for a sanity check? NTS: I will still need to go through the unit test changes

yes, clp-s changes look good to me as well

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the unit tests: all of them are just style changes

@aestriplex
Copy link
Author

@LinZhihao-723 I applied your suggestions. I also added a couple of missing std:: on my tests

@LinZhihao-723
Copy link
Member

@LinZhihao-723

Instead of returning a vector, can we directly return an unordered set?

It was considered in one of the comments here. Looks good to me

lgtm! Let's create the issue and I will approve :D

@aestriplex
Copy link
Author

@LinZhihao-723 I created the issue. Should I merge main into this branch and squash all the commits before you approve?

@LinZhihao-723
Copy link
Member

@LinZhihao-723 I created the issue. Should I merge main into this branch and squash all the commits before you approve?

  • Yes, please merge main to ensure everything gets built.
  • You don't need to worry about squash; we will do a squash merge for every PR
  • I will comment on the PR title along with my approval; you might need to update the PR title accordingly since it will be used as the commit message to main.

@aestriplex
Copy link
Author

@LinZhihao-723 Ok, I merged main, thank you

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

fix(core): Update `DictionaryReader::get_entry_matching_value` to handle case-insensitive searches (fixes #648).

@aestriplex aestriplex changed the title fix(core-clp): get_entry_matching_value should handle ignore-case properly. fix(core): Update DictionaryReader::get_entry_matching_value to handle case-insensitive searches (fixes #648). Feb 27, 2025
@aestriplex
Copy link
Author

@LinZhihao-723 LGTM, updated. Thanks

Copy link
Contributor

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@LinZhihao-723 LinZhihao-723 merged commit 3dc6927 into y-scope:main Feb 27, 2025
21 checks passed
@aestriplex aestriplex deleted the #648 branch February 28, 2025 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants