Use a specific TextFixerConfig instance to control ftfy's text fixing process. #298
Link to issue number:
Fixed #297
Summary of the issue:
`ftfy` was incorrectly modifying Chinese punctuation and other text due to aggressive normalization and unwanted fixes, impacting readability.
Description of how this pull request fixes the issue:
This PR adjusts `ftfy`'s configuration to prevent incorrect text modifications:
- `fix_character_width`, `uncurl_quotes`, `fix_latin_ligatures`, and `unescape_html` are now disabled.
- The same configuration is applied in `plain_text`, `pdf`, `fitz`, and `StructuredHtmlParser` for consistent text processing.
This prevents `ftfy` from unintentionally altering punctuation, ligatures, quotes, and HTML entities.
Testing performed:
Verified `plain_text`, `pdf`, and `StructuredHtmlParser` output.
Known issues with pull request:
No known issues. Further testing and review are welcome.