I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
Commit to Help
I commit to help with one of those options 👆
Example Code
# not relevant
Description
Docling extracts PDF / DOCX / HTML and other document types to a rich representation (incl. layout, tables etc.), which it can export to Markdown or JSON.
As outlined in its technical report, Docling is based on two models developed by IBM Research: a DocLayNet-based layout analysis model and the TableFormer table recognition model.
We would like to contribute a Docling integration to LangChain, namely considering:
a Docling Loader, which can load documents as Markdown or JSON into LangChain Documents (corresponding to whole documents, e.g. articles), and
a Docling Splitter, which can parse the above-mentioned JSON format into LangChain Documents corresponding to the individual document elements identified by Docling (paragraphs, tables, lists, etc.).
By using these integrations, LangChain users will be able to leverage Docling's conversion quality as well as the rich metadata it can extract, such as page number and bounding box.
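To make the loader/splitter distinction concrete, here is a minimal sketch of the two granularities. It is illustrative only and uses neither Docling nor LangChain: `Document` is a stand-in for LangChain's document type, and `SAMPLE_EXPORT`, `load_whole` and `split_elements` are hypothetical names with an assumed, simplified JSON layout. The real Docling export schema may look different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Hypothetical, simplified export. The actual Docling JSON schema may differ.
SAMPLE_EXPORT: Dict[str, Any] = {
    "source": "article.pdf",
    "elements": [
        {"type": "paragraph", "text": "Docling converts documents.",
         "page": 1, "bbox": [72.0, 700.0, 540.0, 720.0]},
        {"type": "table", "text": "| a | b |",
         "page": 2, "bbox": [72.0, 300.0, 540.0, 420.0]},
    ],
}


def load_whole(export: Dict[str, Any]) -> List[Document]:
    """Loader-style output: a single Document for the whole source file."""
    text = "\n\n".join(el["text"] for el in export["elements"])
    return [Document(text, {"source": export["source"]})]


def split_elements(export: Dict[str, Any]) -> List[Document]:
    """Splitter-style output: one Document per element, keeping its metadata."""
    return [
        Document(
            el["text"],
            {
                "source": export["source"],
                "type": el["type"],
                "page": el["page"],
                "bbox": el["bbox"],
            },
        )
        for el in export["elements"]
    ]


whole_docs = load_whole(SAMPLE_EXPORT)
element_docs = split_elements(SAMPLE_EXPORT)
```

In this sketch the splitter output keeps page and bounding box per element, which is the kind of metadata that could later be used to point a retrieved chunk back to its location in the source file.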
When it comes to adding such integrations we have seen there are two options: either within the community package or as a new partner package.
👉 Which of the two options would you recommend for Docling?
System Info