-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Improve KeywordList parsing to support escaped characters and nested structures #12888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve KeywordList parsing to support escaped characters and nested structures #12888
Conversation
can someone help me on this. is this a valid comment from Bot ? I have never used DisplayName annotation and I changed method name to be more comprehensive. Am I missing anything here ? |
Seems like a false positive from the bot @koppor |
@krishnagjsForGit please change the PR title to contain text summarizing the fix. |
Done. is this fine @koppor ? |
@trag-bot didn't find any issues in the code! ✅✨ |
String chain = tok.nextToken(); | ||
Keyword chainRoot = Keyword.of(chain.split(hierarchicalDelimiter.toString())); | ||
keywordList.add(chainRoot); | ||
List<String> tokens = splitRespectingEscapes(keywordString, delimiter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yubo-Cao What do you think about this? Maybe an ANTLR grammar would help here, too?
I'm sorry for not getting back to you sooner. I have two suggestions:
|
Closes #12810
What I changed
Implemented escape handling in KeywordList.parse(String, Character, Character) to support escaping the keyword separator using backslash (). This prevents incorrect splitting of keywords when delimiters appear within the keyword itself.
Refactored the parsing logic for clarity, including renaming loop variables for better readability and intent.
Added test cases in KeywordListTest to cover escape scenarios such as:
Escaped delimiter: "one\,two" → ["one,two"]
Escaped backslash: "one\\two" → ["one\two"]
Mixed escaped and unescaped delimiters.
Where the changes are
KeywordList.java: Modified the parse method to include escape handling logic using a character-by-character loop.
KeywordListTest.java: Added JUnit test cases to ensure parsing behaves correctly with escaped delimiters and backslashes.
Why I made these changes
Fixes issue #12810: Current parsing breaks when delimiters appear inside keywords (e.g., in MeSH terms). There was no way to escape the delimiter character, leading to incorrect keyword splitting.
Improves consistency: Provides a more robust, predictable behavior for keyword parsing, especially when users or importers use delimiters in keyword values.
Next Steps
Awaiting review and feedback.
If the community prefers a different escape character or behavior, I’m happy to adjust the implementation.
After this is merged, similar logic could be extracted or reused where keyword parsing happens elsewhere (e.g., importers).
Mandatory checks
CHANGELOG.md
described in a way that is understandable for the average user (if change is visible to the user)