Question regarding Markdown structure in Jina Reader API #152

medmabcf · 2024-11-06T14:05:41Z

Hi,

I’m trying to understand the specific markdown structure used by the Jina Reader API when converting HTML to markdown. For instance, I’ve observed the following mappings:

- `<h1>` tags are mapped to `==========` underlines
- `<h2>` tags are mapped to `------` underlines

Is this the standard markdown structure followed by the Jina Reader API? Additionally, I’ve noticed that the output can sometimes vary. Is this due to the use of a heuristic method or some other factor?

Thanks!


nomagick · 2024-11-12T06:39:56Z

We use Turndown for the HTML-to-Markdown transformation. Whether `<h1>`/`<h2>` become ATX headings (`#`/`##`) or setext headings (underlined with `===`/`---`) is controlled by Turndown's `headingStyle` option. We have not customized it, so Reader uses the default, which is setext.
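For illustration, the Python sketch below mimics the two heading styles. It is a hypothetical re-implementation (the name `render_heading` is made up), not Turndown's actual code, modelled on how Turndown's heading rule behaves: in setext mode `<h1>` and `<h2>` are underlined with `=` or `-` to the length of the heading text, while `<h3>` to `<h6>` always use ATX `#` prefixes.

```python
# Illustrative re-implementation of a Turndown-style heading rule.
# This is NOT Turndown's code; in Turndown the choice is made with the
# `headingStyle` option ("setext" is the default, "atx" is the alternative).


def render_heading(level: int, content: str, style: str = "setext") -> str:
    """Render an HTML heading (<h1>..<h6>) as Markdown text."""
    if style == "setext" and level < 3:
        # Setext syntax only exists for levels 1 and 2: the text is
        # underlined with '=' (h1) or '-' (h2), one char per text char.
        underline_char = "=" if level == 1 else "-"
        return f"{content}\n{underline_char * len(content)}"
    # ATX style (and any h3-h6, even in setext mode) uses leading hashes.
    return f"{'#' * level} {content}"


if __name__ == "__main__":
    print(render_heading(1, "Jina Reader"))
    print(render_heading(2, "Usage"))
    print(render_heading(2, "Usage", style="atx"))
```

With Turndown itself, getting `##`-style headings would mean constructing the service with `headingStyle: 'atx'`; per the reply above, Reader keeps the default.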

The default output sometimes varies because Reader automatically decides whether to apply Readability for some level of smart trimming of the page content.
If Readability apparently does not work for a page, we fall back to a rule-based approach, which is what the `markdown` return format refers to.
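Since the choice is made per page, two pages can take different paths and come back with differently trimmed Markdown. The sketch below shows the general shape of such a switch under stated assumptions: `choose_source`, `MIN_READABLE_CHARS`, and the length-based check are hypothetical placeholders, not Reader's real heuristic.

```python
# Hypothetical sketch of an automatic switch between Readability-trimmed
# content and a rule-based conversion of the full page. The length check and
# MIN_READABLE_CHARS threshold are assumed placeholders; the thread does not
# describe the heuristic Reader actually uses.
from typing import Optional, Tuple

MIN_READABLE_CHARS = 200  # hypothetical threshold


def choose_source(readability_html: Optional[str], full_page_html: str) -> Tuple[str, str]:
    """Return (mode, html) to hand to the HTML-to-Markdown step."""
    if readability_html and len(readability_html.strip()) >= MIN_READABLE_CHARS:
        # Readability produced plausible output: convert the trimmed article.
        return "readability", readability_html
    # Readability failed or returned too little: fall back to converting
    # the whole page with the rule-based path.
    return "rule-based", full_page_html
```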

If you prefer the `markdown` format, you can set the header `x-respond-with: markdown` or `x-return-format: markdown` to stabilize the return format.
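A minimal Python sketch of pinning the format on a request follows. It assumes the public Reader endpoint is called by prefixing the target URL with `https://r.jina.ai/`, which this thread does not mention; `build_reader_request` and `fetch_markdown` are hypothetical helper names, and only the standard library is used.

```python
# Minimal sketch: request a page through Jina Reader with the return format
# pinned to "markdown" to stabilize the output.
# Assumption: the endpoint is used by prefixing the target URL with
# https://r.jina.ai/ (not stated in this thread).
import urllib.request

READER_PREFIX = "https://r.jina.ai/"  # assumed endpoint prefix


def build_reader_request(target_url: str) -> urllib.request.Request:
    """Build a GET request for Reader with the return format pinned."""
    return urllib.request.Request(
        READER_PREFIX + target_url,
        headers={"X-Return-Format": "markdown"},
    )


def fetch_markdown(target_url: str, timeout: float = 30.0) -> str:
    """Perform the request and return the body, assumed to be UTF-8 text."""
    req = build_reader_request(target_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_markdown("https://example.com")` would then return the page in the pinned format. The reply also names `x-respond-with: markdown` as an alternative header; HTTP header names are case-insensitive, so the capitalization used here is not significant.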
