feat: add converter based on pdfminer #7607
Pull Request Test Coverage Report for Build 8901937116
💛 - Coveralls
Nice! The PR looks great, though there are some linting issues that need to be fixed. I see Also, when adding new lazy imports, remember to update the
Also, I suggest rebasing or merging
I didn't notice at all that we were missing tests; this causes coverage to go down a bit. Could you add some? 👀
Nice!
Related Issues
Proposed Changes:
The default PDF converter may not extract text correctly from PDFs with complex layouts, such as those containing multiple text columns. To address this, `PDFMinerToDocument` is introduced so that users can customize text extraction from PDF files through pdfminer's native arguments. Users can then configure the converter to retain the reading order, among other options.
How did you test it?
Tested using several unit tests
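For illustration, the layout tuning described under Proposed Changes can be sketched against pdfminer.six directly. This is a minimal sketch, not the code from this PR: `LayoutOptions` and `extract_pdf_text` are hypothetical names, the defaults mirror those of pdfminer's `LAParams`, and the actual constructor of `PDFMinerToDocument` is not shown in this conversation.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LayoutOptions:
    """Hypothetical container mirroring the arguments of pdfminer's LAParams."""

    line_overlap: float = 0.5
    char_margin: float = 2.0
    line_margin: float = 0.5
    word_margin: float = 0.1
    # Between -1.0 and 1.0; None disables pdfminer's advanced layout analysis.
    boxes_flow: Optional[float] = 0.5
    detect_vertical: bool = False
    all_texts: bool = False


def extract_pdf_text(path: str, options: LayoutOptions) -> str:
    """Extract text from a PDF using the given layout options (sketch only)."""
    # Imported lazily so this module can be loaded without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    return extract_text(path, laparams=LAParams(**asdict(options)))


# Illustrative values only; suitable settings depend on the document layout.
two_column = LayoutOptions(char_margin=1.0, boxes_flow=None)
```

Calling `extract_pdf_text("paper.pdf", two_column)` (the file name is a placeholder) would then run pdfminer with those layout parameters; the specific values above are assumptions and need to be tuned per document.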
Notes for the reviewer