Best option for adding Docling document reader integration: community or partner package? #27641

vagenas · 2024-10-25T12:20:22Z

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

# not relevant

Description

Docling converts PDF, DOCX, HTML, and other document types into a rich representation (including layout, tables, etc.), which it can export to Markdown or JSON.
As outlined in its technical report, Docling is based on two models developed by IBM Research: a DocLayNet-based layout analysis model and the TableFormer table recognition model.

We would like to contribute a Docling integration to LangChain, consisting of:

a Docling Loader, which can load documents as Markdown or JSON into LangChain Documents (corresponding to whole documents, e.g. articles), and
a Docling Splitter, which can parse the above-mentioned JSON format into LangChain Documents corresponding to the individual document elements identified by Docling (paragraphs, tables, lists, etc.).

By using these integrations, LangChain users will be able to leverage Docling's conversion quality as well as the rich metadata it can extract, such as page number and bounding box.
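
To make the two granularities concrete, here is a rough sketch of the intended behavior. It is not the actual implementation: the `Document` dataclass is only a stand-in for LangChain's `Document`, the JSON shape is a simplified, hypothetical one (not Docling's real export schema), and the function names `load_whole_document` and `split_into_elements` are made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Hypothetical, simplified export. NOT Docling's actual JSON schema.
SAMPLE_JSON = json.dumps({
    "source": "article.pdf",
    "elements": [
        {"type": "paragraph", "text": "Docling converts documents.",
         "page": 1, "bbox": [72, 700, 540, 720]},
        {"type": "table", "text": "| a | b |\n|---|---|\n| 1 | 2 |",
         "page": 2, "bbox": [72, 300, 540, 420]},
    ],
})


def load_whole_document(raw_json: str) -> List[Document]:
    """Loader role: one Document per source document (here, the raw JSON)."""
    data = json.loads(raw_json)
    return [Document(page_content=raw_json, metadata={"source": data["source"]})]


def split_into_elements(doc: Document) -> List[Document]:
    """Splitter role: one Document per element, carrying page and bbox metadata."""
    data = json.loads(doc.page_content)
    return [
        Document(
            page_content=el["text"],
            metadata={
                "source": data["source"],
                "type": el["type"],
                "page": el["page"],
                "bbox": el["bbox"],
            },
        )
        for el in data["elements"]
    ]


whole = load_whole_document(SAMPLE_JSON)
parts = split_into_elements(whole[0])
```

In this sketch the loader yields a single Document for the whole article, while the splitter yields one Document per paragraph or table, each keeping the page and bounding box it came from.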

When it comes to adding such integrations, we have seen that there are two options: either within the community package or as a new partner package.

👉 Which of the two options would you recommend for Docling?

System Info

# not relevant

Replies: 0 comments

Select a reply

Best option for adding Docling document reader integration: community or partner package? #27641

vagenas Oct 25, 2024

Checked other resources

Commit to Help

Example Code

Description

System Info

Replies: 0 comments

vagenas
Oct 25, 2024