make TermNormIsSubStringMappingStrategy handle multi-word substrings #24

Draft
wants to merge 1 commit into base: main
Collaborator

@EFord36 EFord36 commented Jun 4, 2024

Closes #274

Currently, TermNormIsSubStringMappingStrategy only prefers a term if its term norm is a single word contained (as a whole word) in the ent_match_norm. This change makes it also prefer a term whose term norm is a sequence of words that is a substring of the ent_match_norm.
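
To make the intended behaviour concrete, here is a minimal sketch of a regex-based check. It is not the code in this PR: the function name `term_norm_in_ent_match_norm` is hypothetical, and it assumes both norms are plain strings whose words are separated by whitespace.

```python
import re


def term_norm_in_ent_match_norm(term_norm: str, ent_match_norm: str) -> bool:
    """Hypothetical helper: True if term_norm occurs in ent_match_norm
    as a run of whole, whitespace-delimited words."""
    if not term_norm:
        return False
    # The lookarounds require the match to start and end at a word edge
    # (string boundary or whitespace), so "LUNG CAN" does not match
    # inside "LUNG CANCER".
    pattern = r"(?<!\S)" + re.escape(term_norm) + r"(?!\S)"
    return re.search(pattern, ent_match_norm) is not None
```

Under these assumptions, a single-word term norm behaves as before, and a multi-word term norm such as "LUNG CANCER" is also preferred when the ent_match_norm is "NON SMALL CELL LUNG CANCER".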

In draft because I still need to:

  • Check impact on performance (we're using a regex match now; is this too slow? We could compare it to looking for the first word with list.index and then iterating. The implementation here is longer though. We could also try mypyc on that, out of interest).
  • Check impact on behaviour (does this change anything in the test documents we have for the different use cases? It should help, but does it?)
  • Write tests
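
For the performance comparison in the first item, the `list.index` alternative could look roughly like the sketch below. The function name `contains_word_sequence` is hypothetical, and it assumes both norms have already been split into lists of words.

```python
from typing import List


def contains_word_sequence(words: List[str], sub: List[str]) -> bool:
    """Hypothetical helper: True if sub occurs in words as a contiguous run."""
    if not sub or len(sub) > len(words):
        return False
    first = sub[0]
    # Last position at which sub could still fit inside words.
    last_start = len(words) - len(sub)
    start = 0
    while start <= last_start:
        try:
            # Jump to the next occurrence of the first word.
            i = words.index(first, start, last_start + 1)
        except ValueError:
            return False
        # Compare the remaining words at this candidate position.
        if words[i:i + len(sub)] == sub:
            return True
        start = i + 1
    return False
```

Both approaches could then be timed on representative norms with `timeit` before deciding whether the regex is too slow.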

That said, the actual code is ready to look at, to assess whether this is a good idea in a broad sense.

@EFord36 EFord36 requested a review from RichJackson June 4, 2024 15:00
@EFord36 EFord36 force-pushed the multi-word-substring-checking branch from 119f242 to 0d706b3 Compare June 4, 2024 16:07