The tokenize() function shouldn't split sentences on abbreviations like Dr. Fahad, Mr. Wayne, etc. #1
Labels: enhancement, good first issue, hacktoberfest, help wanted
Right now the tokenize() function splits the text whenever a '.' character is found. Most of the time that is the correct way to split text into sentences, but sometimes an abbreviation such as Dr., Mr., or Mrs. appears in the middle of a sentence, and the sentence gets split right there. I want to enhance the regex so that it does not split sentences on abbreviations.
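To make the request concrete, here is a minimal sketch of one possible approach, written in Python as an assumption since the issue does not show the project's code. Rather than packing everything into a single regex, it splits on whitespace that follows '.', '!' or '?', then merges a piece back into the next one when it ends in a known abbreviation. The names `split_sentences` and `ABBREVIATIONS` are hypothetical and are not the project's actual tokenize() API; the abbreviation list is illustrative only.

```python
import re

# Hypothetical, illustrative list of abbreviations (lowercase, without the dot).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof"}

# Candidate boundary: whitespace that directly follows '.', '!' or '?'.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences without breaking after known abbreviations."""
    sentences = []
    buffer = ""
    for piece in _BOUNDARY.split(text.strip()):
        if not piece:
            continue
        # Re-join with a single space; original whitespace is not preserved.
        buffer = f"{buffer} {piece}" if buffer else piece
        last_word = buffer.split()[-1]
        # If the piece ends in an abbreviation, the sentence is not finished yet.
        if last_word.endswith(".") and last_word[:-1].lower() in ABBREVIATIONS:
            continue
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


print(split_sentences("Dr. Fahad met Mr. Wayne. They talked for an hour."))
```

One limitation of this sketch: a sentence that genuinely ends with a listed abbreviation will be merged with the sentence after it, so the list should stay limited to titles that are normally followed by a name.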