community: Added ADOBE PDF EXTRACT #23686

DavidMoserAI · 2024-06-30T13:00:47Z

Description: Adobe PDF Extract is a document intelligence service that, in my experience, outperforms comparable services in both accuracy and range of features. Parsing documents based on their layout information is crucial for retrieval-augmented generation, especially when aiming for production-grade performance. I have used this service successfully myself and would like to contribute my code.
Issue: Someone raised an issue about this a while ago: #8163
Dependencies: Users need to install the Adobe PDF Services library: pip install pdfservices-sdk

Twitter handle: @DavidMoserAI

DavidMoserAI · 2024-07-14T19:18:46Z

@hwchase17

DavidMoserAI · 2024-08-01T06:22:44Z

@baskaryan @efriis @eyurtsev

baskaryan · 2024-09-02T20:15:34Z

libs/community/langchain_community/document_loaders/parsers/adobe_pdf_extract.py

+class AdobePDFExtractParser(BaseBlobParser):
+    """Loads a document using the Adobe PDF Services API.
+
+    Args:


could we expand the arg descriptions here
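
For illustration only, here is a minimal sketch of the kind of expanded Args section this review comment asks for. It is not the code from this PR: the class name, the `extract_fn` and `mode` parameters, the element keys, and the stand-in `Document` type are all hypothetical, and the extraction backend is injected so the sketch runs without the Adobe SDK or LangChain installed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Document:
    """Stand-in for a LangChain-style document (assumption, not the real class)."""

    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


class ExtractParserSketch:
    """Turn raw PDF bytes into documents using an extraction backend.

    Args:
        extract_fn: Hypothetical callable that receives the PDF bytes and
            returns a list of element dicts, each with a "text" key (the
            element's text) and a "path" key (its position in the layout
            tree). A real integration would call the extraction service here.
        mode: How elements are grouped. "chunks" yields one document per
            element; "single" joins all element text into one document.
            Any other value raises ValueError.
    """

    def __init__(
        self,
        extract_fn: Callable[[bytes], List[Dict[str, str]]],
        mode: str = "chunks",
    ) -> None:
        if mode not in ("single", "chunks"):
            raise ValueError(f"Unsupported mode: {mode!r}")
        self.extract_fn = extract_fn
        self.mode = mode

    def lazy_parse(self, data: bytes) -> Iterator[Document]:
        """Yield documents lazily from the extracted elements."""
        elements = self.extract_fn(data)
        if self.mode == "single":
            text = "\n".join(e["text"] for e in elements)
            yield Document(page_content=text, metadata={"elements": len(elements)})
            return
        for e in elements:
            yield Document(page_content=e["text"], metadata={"path": e["path"]})


def fake_extract(data: bytes) -> List[Dict[str, str]]:
    """Fake backend for offline testing; returns two fixed elements."""
    return [
        {"text": "Title", "path": "//Document/H1"},
        {"text": "Body text.", "path": "//Document/P"},
    ]
```

Documenting each argument's type, accepted values, and failure behavior in this way lets a reader use the parser without reading its source, and injecting the backend keeps a unit test independent of network access and credentials.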

davemaguire · 2024-10-17T22:32:46Z

Hello! I would love to see the Adobe PDF API added to LangChain. What needs to be done to get this into main? Is it just a matter of addressing the following comment?

could we expand the arg descriptions here

I am happy to help get this over the line. We use adobe pdf extract extensively and would love to have this integrated in langchain.
@baskaryan @DavidMoserAI

DavidMoserAI · 2024-10-19T05:50:59Z

@davemaguire I have updated the arg descriptions and am waiting for a response from the moderators.

davemaguire · 2024-10-19T06:39:29Z

Awesome, great work! I'm eager to see this feature in langchain
@efriis @eyurtsev @hwchase17

efriis · 2024-12-14T01:49:17Z

@DavidMoserAI this PR had a lot of issues to fix (wrong SDK used in docs, tests for an old Loader instead of the Parser implementation), so I'm hesitant to merge it without someone testing it. If you could take a look at the docs and also share a screenshot of it parsing an actual PDF using the service, that would be great! Otherwise I will probably close it without an actual test.

If you're interested in maintaining this integration without us in the loop and publishing a higher-quality integration, we'd love to get an integration package out! Future PRs against langchain would just be {docs updates, as well as registering your package in libs/packages.yml, deprecating this community integration in favor of your integration package}

Here's the guide, and if you have questions, feel free to leave them in the comments on those pages so others can see them! https://python.langchain.com/docs/contributing/how_to/integrations/

ci failing still

efriis · 2024-12-16T19:54:39Z

closing for now, and if you decide to pick it up again would recommend publishing externally!

david-1m added 9 commits June 30, 2024 12:30

Added AdobePDFExtractionLoader to document_loaders

809404a

Added AdobePDFExtraction parser to document_loaders.parsers

f35d563

Added AdobePDFExtractionLoader to __init__.py

7834d0f

Added AdobePDFExtractionParser to __init__.py

eb5f696

Changed formatting

45b5f89

Added unit test for AdobePDFExtractionParser

984a13a

Changed name

eb16258

Added documentation for Adobe PDF Extract

e62f6d5

Changed formatting

5d07ddd

dosubot bot added the size:XL, community, Ɑ: doc loader, and 🤖:improvement labels Jun 30, 2024

vercel bot had a problem deploying to Preview June 30, 2024 13:12 Failure

Merge branch 'master' into master

b9cc703

vercel bot had a problem deploying to Preview July 14, 2024 19:19 Failure

baskaryan added 3 commits September 2, 2024 13:13

fmt

9d8a3dc

fmt

9709cee

Merge branch 'master' into DavidMoserAI/master

4e834d6

baskaryan reviewed Sep 2, 2024

baskaryan added 3 commits September 2, 2024 13:27

fmt

8a554a8

fmt

2aabdd2

fmt

0b7a9af

vercel bot temporarily deployed to Preview September 2, 2024 20:48 Inactive

Elaborate arguments

242f3be

vercel bot deployed to Preview September 28, 2024 12:55 View deployment

Merge branch 'master' into master

d2b04f1

vercel bot deployed to Preview November 11, 2024 15:20 View deployment

efriis added 3 commits December 13, 2024 17:23

Merge branch 'master' into DavidMoserAI/master

826c313

x

c6fbe24

x

eb41e9a

efriis previously approved these changes Dec 14, 2024

dosubot bot added the lgtm label Dec 14, 2024

efriis self-assigned this Dec 14, 2024

vercel bot temporarily deployed to Preview December 14, 2024 02:02 Inactive

efriis closed this Dec 16, 2024
