community: Added ADOBE PDF EXTRACT #23686
class AdobePDFExtractParser(BaseBlobParser):
    """Loads a document using the Adobe PDF Services API.

    Args:
Could we expand the arg descriptions here?
Hello! I would love to see the Adobe PDF API added to LangChain. What needs to be done to get this into main? Just address the following comment?
I am happy to help get this over the line. We use Adobe PDF Extract extensively and would love to have this integrated in LangChain.
@davemaguire I have updated the arg descriptions and am waiting for a response from the moderators.
Awesome, great work! I'm eager to see this feature in LangChain.
@DavidMoserAI this PR had a lot of issues to fix (wrong SDK used in docs, tests for an old Loader instead of the Parser implementation), so I'm hesitant to merge it without someone testing it. If you could take a look at the docs and also post a screenshot of it parsing an actual PDF using the service, that would be great! Otherwise we will probably close without an actual test. If you're interested in maintaining this integration without us in the loop and publishing a higher-quality integration, we'd love to get an integration package out! Future PRs against langchain would then just be docs updates, as well as registering your package. Here's the guide, and if you have questions, feel free to leave them in the comments on those pages so others can see them! https://python.langchain.com/docs/contributing/how_to/integrations/
Closing for now. If you decide to pick it up again, I would recommend publishing externally!
Description: Adobe PDF Extract is a service that, in my experience, provides superior performance over other document intelligence services, both in its accuracy and in its variety of features. Parsing documents based on their layout information is crucial for retrieval-augmented generation, especially when trying to achieve production-grade performance. I have used this service successfully myself and would like to contribute my code.
Issue: Someone raised an issue about this a while ago: #8163
Dependencies: Users need to install the Adobe PDF Services library like so: pip install pdfservices-sdk
Twitter handle: @DavidMoserAI
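For readers wondering what "parsing documents based on their layout information" might look like in practice, here is a minimal, dependency-free sketch. It is not the code from this PR. It assumes the extraction service returns a list of elements with `Path`, `Text`, and `Page` keys, where heading elements have paths ending in tags such as `H1` or `Title`; that shape is an assumption for illustration. `Chunk`, `is_heading`, and `chunk_by_heading` are hypothetical names, with `Chunk` standing in for a LangChain `Document`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a LangChain Document, to keep the sketch self-contained."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def is_heading(path: str) -> bool:
    """Return True if the last path segment looks like a heading tag (H1, H2[3], Title)."""
    # Take the last segment and drop any index suffix such as "[2]".
    tag = path.rsplit("/", 1)[-1].split("[", 1)[0]
    return tag == "Title" or (len(tag) == 2 and tag[0] == "H" and tag[1].isdigit())


def chunk_by_heading(elements: List[Dict[str, Any]]) -> Iterator[Chunk]:
    """Group body text under the most recent heading (assumed element shape)."""
    heading: Optional[str] = None
    page: Optional[int] = None
    texts: List[str] = []

    for el in elements:
        text = (el.get("Text") or "").strip()
        if not text:
            continue  # skip elements without text, e.g. figures
        if is_heading(el.get("Path", "")):
            # A new heading closes the previous chunk, if it has any body text.
            if texts:
                yield Chunk(" ".join(texts), {"heading": heading, "page": page})
            heading, page, texts = text, el.get("Page"), []
        else:
            if page is None:
                page = el.get("Page")
            texts.append(text)

    # Emit whatever is left after the last heading.
    if texts:
        yield Chunk(" ".join(texts), {"heading": heading, "page": page})
```

Each chunk carries its heading and starting page as metadata, which is the kind of layout context a retriever can use. A heading with no body text before the next heading is dropped in this sketch; a real parser might prefer to keep it.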