Feat: Add pdfminer parameters configuration #3918

Merged
12 commits merged on Feb 17, 2025

@plutasnyy plutasnyy commented Feb 11, 2025

This pull request adds the ability to configure several pdfminer parameters, and is structured so that support for additional parameters is easy to add. One of the parameters overrides the default value from pdfminer's `LAParams` config class.

Example:

partition(
    filename=example_doc_path("pdf/layout-parser-paper-fast.pdf"),
    pdfminer_line_margin=1.123,
    pdfminer_char_margin=None,  # None is not forwarded, so char_margin is absent below
    pdfminer_line_overlap=0.0123,
    pdfminer_word_margin=3.21,
)
assert pdfminer_mock.call_args.kwargs == {
    "line_margin": 1.123,
    "line_overlap": 0.0123,
    "word_margin": 3.21,
}
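
The example above implies that `pdfminer_`-prefixed arguments are stripped of their prefix and that `None` values are dropped before the remaining values reach pdfminer. A minimal sketch of that filtering step follows, assuming nothing about how the PR actually implements it. The helper name `collect_pdfminer_kwargs` and the `PREFIX` constant are hypothetical and are not part of the library.

```python
from typing import Any, Dict

# Hypothetical prefix constant; it mirrors the argument names in the example above.
PREFIX = "pdfminer_"


def collect_pdfminer_kwargs(**partition_kwargs: Any) -> Dict[str, Any]:
    """Return the pdfminer-specific options from a set of keyword arguments.

    Hypothetical helper, not the PR's implementation. It keeps only keys
    that start with ``pdfminer_``, strips that prefix, and drops entries
    whose value is None so the library default applies for them.
    """
    return {
        key[len(PREFIX):]: value
        for key, value in partition_kwargs.items()
        if key.startswith(PREFIX) and value is not None
    }


kwargs = collect_pdfminer_kwargs(
    filename="layout-parser-paper-fast.pdf",  # not pdfminer-specific, ignored
    pdfminer_line_margin=1.123,
    pdfminer_char_margin=None,  # dropped, so the default would be used
    pdfminer_line_overlap=0.0123,
    pdfminer_word_margin=3.21,
)
print(kwargs)
```

With pdfminer.six installed, a dictionary like this could be passed on as `LAParams(**kwargs)`, which leaves every omitted field at its library default.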

@plutasnyy plutasnyy self-assigned this Feb 11, 2025
@plutasnyy plutasnyy changed the title from "Pdf miner params check" to "Feat: Add pdfminer parameters configuration" Feb 12, 2025
@plutasnyy plutasnyy requested a review from MaksOpp February 12, 2025 16:41
@plutasnyy plutasnyy marked this pull request as ready for review February 12, 2025 16:41
…pdate (#3920)

This pull request includes updated ingest test fixtures.
Please review and merge if appropriate.

Co-authored-by: plutasnyy <[email protected]>
@plutasnyy plutasnyy added this pull request to the merge queue Feb 17, 2025
Merged via the queue into main with commit 3973a30 Feb 17, 2025
41 checks passed
@plutasnyy plutasnyy deleted the pdf-miner-params-check branch February 17, 2025 12:13

3 participants