core: fix ChatPromptTemplate doesn't accept PDF data as bytes #28011

Closed
rkwan05 wants to merge 5 commits

@rkwan05 rkwan05 commented Nov 9, 2024

Description:

  • Functionality (chat.py): added an if statement to ChatPromptTemplate's from_template() that catches data passed in the mime_type: {pdf_data} format
    • extracts the text from the PDF and inserts it into the prompt
  • Unit test (test_chat.py): added a test that passes PDF data to ChatPromptTemplate

Issue: #27346
Dependencies:

  • base64: decodes PDF data in chat.py > extract_pdf_text() and encodes PDF data in test_chat.py > test_create_pdf_chat_prompt()
  • io: creates an in-memory binary stream to hold the PDF data in chat.py > extract_pdf_text()
  • pypdf: provides PdfReader, which reads the PDF file in chat.py > extract_pdf_text()
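
The PR's own extract_pdf_text() is not shown in this conversation. As a rough sketch of how the three dependencies above could fit together, assuming the PDF arrives as a base64-encoded string: the helper name decode_pdf_data and the "%PDF-" header check are illustrative additions, not part of the PR.

```python
import base64
import io


def decode_pdf_data(pdf_data: str) -> io.BytesIO:
    """Decode base64 PDF data into an in-memory binary stream.

    Hypothetical helper: the header check is an added safeguard,
    not something the PR is known to do.
    """
    raw = base64.b64decode(pdf_data)
    if not raw.startswith(b"%PDF-"):
        raise ValueError("decoded data does not look like a PDF")
    return io.BytesIO(raw)


def extract_pdf_text(pdf_data: str) -> str:
    """Return the text of every page, joined by newlines."""
    # Imported lazily because pypdf is a third-party package.
    from pypdf import PdfReader

    reader = PdfReader(decode_pdf_data(pdf_data))
    # extract_text() can yield an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Importing pypdf inside the function keeps it out of module import time, but the package would still have to be installed for any caller that reaches this path.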

Lint and test:

Ran make format, make lint, and make test from the root of libs\core. All checks passed successfully.

(
    "human",
    [
        {"type": "media", "mime_type": "application/pdf", "data": pdf_data},
Member

which chat model supports this?

Author

Hi Efriis, I just wanted to make sure that we understood the question correctly! If you're asking which chat models this implementation would be compatible with, it should work with all chat models, because it extracts the text from the PDF and sends that text in the prompt. But please let me know if you were looking for something else!

efriis (Member) commented Dec 3, 2024

Closing because this isn't desirable in the prompt template. In general, the right process here would be to parse the PDFs using something like this, and plumb that text into your prompt.

This implementation would import PyPDF parsing for anyone trying to use a prompt template, which isn't mergeable.
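
A minimal sketch of the process suggested above, assuming the PDF text has already been extracted by a separate parser. The message layout, the variable names pdf_text and question, and the clean_pdf_text helper are all hypothetical; only ChatPromptTemplate.from_messages and invoke come from langchain-core.

```python
import re

# Hypothetical message layout; pdf_text and question are illustrative names.
MESSAGES = [
    ("system", "Answer using only the supplied document."),
    ("human", "Document:\n{pdf_text}\n\nQuestion: {question}"),
]


def clean_pdf_text(text: str, max_chars: int = 20_000) -> str:
    """Collapse runs of whitespace and cap the length before prompting."""
    return re.sub(r"\s+", " ", text).strip()[:max_chars]


def build_prompt_value(pdf_text: str, question: str):
    """Fill the template with text that was parsed outside the prompt."""
    # Imported lazily because langchain-core is a third-party package.
    from langchain_core.prompts import ChatPromptTemplate

    prompt = ChatPromptTemplate.from_messages(MESSAGES)
    return prompt.invoke(
        {"pdf_text": clean_pdf_text(pdf_text), "question": question}
    )
```

Passing the document through a template variable keeps PDF parsing out of the prompt template itself, so users who never touch PDFs do not need a PDF library installed.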

Labels

  • 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
  • Ɑ: core (Related to langchain-core)
  • size:M (This PR changes 30-99 lines, ignoring generated files.)