A library for parsing OpenITI's special mARkdown syntax into a friendly JSON format.
- Parses OpenITI mARkdown headers, paragraphs, verses, biographies, historical events, and more into JSON.
- Extracts metadata and structural elements, preserving their context and hierarchy.
- Supports parsing of complex morphological patterns and riwāyāt units.
- Handles pagination and block quotes within the text.
Using npm:
npm install @openiti/markdown-parser
Using yarn:
yarn add @openiti/markdown-parser
To use mARkdown-parser, import the parseMarkdown function from the package and pass your OpenITI mARkdown text to it. The function returns a JSON object containing the parsed content.
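import { parseMarkdown } from '@openiti/markdown-parser';

// Your OpenITI mARkdown source text (elided here)
const mARkdown = `
// ...
`;

// Parse the text and inspect the resulting structure
const parsed = parseMarkdown(mARkdown);
console.log(parsed);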
The following is an example output of the parser, showing how it structures different elements of the OpenITI mARkdown:
[
{
"type": "title",
"content": "رسالة في التوبة"
},
{
"type": "pageNumber",
"content": {
"volume": "01",
"page": "218"
}
},
{
"type": "paragraph",
"content": "فصل"
},
{
"type": "paragraph",
"content": "قال الإمام العلامة شيخ الإسلام تقي الدين أبو العباس أحمد بن عبدالحليم ابن تيمية رحمه الله"
}
...
]
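Because every entry in the output carries a `type` and a `content`, it can be filtered with ordinary array methods. The sketch below assumes the block shape shown in the example output (a string `content` for text blocks, an object for `pageNumber`). The `Block` type and the `textOfType` helper are hypothetical and defined locally; they are not documented exports of the library.

```typescript
// Hypothetical local type mirroring the example output above;
// the library's own types, if it exports any, may differ.
type Block = { type: string; content: unknown };

// Hypothetical helper: collect the string content of every block of `type`,
// skipping blocks whose content is not a plain string (e.g. pageNumber).
function textOfType(blocks: Block[], type: string): string[] {
  return blocks
    .filter((b) => b.type === type && typeof b.content === "string")
    .map((b) => b.content as string);
}

// Sample data copied from the example output.
const sample: Block[] = [
  { type: "title", content: "رسالة في التوبة" },
  { type: "pageNumber", content: { volume: "01", page: "218" } },
  { type: "paragraph", content: "فصل" },
];

console.log(textOfType(sample, "paragraph"));
```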
parseMarkdown(markdownText)
Parses a string of OpenITI mARkdown into a structured JSON format.
Parameters:
markdownText (string) - The OpenITI mARkdown text to be parsed.
Returns:
ParseResult (Object) - A JSON object representing the parsed content. The ParseResult object includes metadata and content properties.
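The example output earlier shows a bare array of blocks, while the return value is described here as an object with `metadata` and `content`. If you want to check which shape you actually received, a small runtime guard can help. This is a minimal sketch written only against the description above; `isParseResult` and `ParseResultShape` are hypothetical names, not part of the library.

```typescript
// Hypothetical shape following the description of ParseResult:
// a `metadata` object plus a `content` array.
type ParseResultShape = {
  metadata: Record<string, unknown>;
  content: unknown[];
};

// Hypothetical runtime check: true only for a non-null object whose
// `metadata` is a plain (non-array) object and whose `content` is an array.
function isParseResult(value: unknown): value is ParseResultShape {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { metadata?: unknown; content?: unknown };
  return (
    typeof v.metadata === "object" &&
    v.metadata !== null &&
    !Array.isArray(v.metadata) &&
    Array.isArray(v.content)
  );
}

console.log(isParseResult({ metadata: {}, content: [] })); // true
console.log(isParseResult([])); // false: a bare array has no metadata/content
```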
Block
Represents the smallest unit of content, such as a title, header, paragraph, or blockquote.
ParseResult
An object containing metadata and content. metadata is an object of key-value pairs extracted from the mARkdown, while content is an array of Block objects representing the structured content of the document.
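To make the relationship between ParseResult and Block concrete, the sketch below writes both as TypeScript interfaces and walks the content array to count blocks per type. The interfaces are an assumption based on the prose above (in particular, metadata values are assumed to be strings), and `countBlockTypes` is a hypothetical helper; the metadata key in the sample is made up.

```typescript
// Hypothetical local types following the descriptions above.
interface Block {
  type: string;
  content: unknown;
}

interface ParseResult {
  metadata: Record<string, string>; // string values are an assumption
  content: Block[];
}

// Count how many blocks of each type a parsed document contains.
function countBlockTypes(result: ParseResult): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const block of result.content) {
    counts[block.type] = (counts[block.type] ?? 0) + 1;
  }
  return counts;
}

// Illustrative input; "exampleKey" is a placeholder, not a real metadata field.
const doc: ParseResult = {
  metadata: { exampleKey: "exampleValue" },
  content: [
    { type: "title", content: "رسالة في التوبة" },
    { type: "paragraph", content: "فصل" },
    { type: "paragraph", content: "قال الإمام العلامة" },
  ],
};

console.log(countBlockTypes(doc)); // { title: 1, paragraph: 2 }
```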
The library defines several block types to structure the parsed content. Here is a detailed look at each Block type:
| Type | Description |
|---|---|
| title | Represents a title within the text. |
| header-1 | Denotes a level 1 header, the highest level, typically used for major sections. |
| header-2 | Denotes a level 2 header, used for subsections under a header-1. |
| header-3 | Denotes a level 3 header, used for sub-subsections under a header-2. |
| header-4 | Denotes a level 4 header, indicating further subdivision under a header-3. |
| header-5 | The lowest-level header, indicating the most granular sectioning under a header-4. |
| paragraph | Represents a paragraph of text. |
| blockquote | Indicates a block of text quoted from another source. |
| category | A categorization label, used for organizing content into categories. |
| verse | Represents a verse, typically of poetry or from the Quran. The content is an array in which each item is a hemistich. |
| pageNumber | Denotes the page number. The content is an object with volume and page strings. |
| year_of_birth | Indicates a person's year of birth, in Hijri. |
| year_of_death | Indicates a person's year of death, in Hijri. |
| year | A general-purpose year, used in various contexts, in Hijri. |
| age | Represents the age of a person, in Hijri years. |
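Since the header types encode their level in the name (header-1 through header-5), an outline or table of contents can be derived directly from the content array. This is a minimal sketch that assumes header blocks carry their text as a string `content`; `buildOutline`, `renderOutline`, and `OutlineEntry` are hypothetical names, and the sample headings are placeholders.

```typescript
// Hypothetical local type; assumes header content is a plain string.
type Block = { type: string; content: unknown };

interface OutlineEntry {
  level: number; // 1 (highest) to 5 (lowest), taken from the type name
  text: string;
}

// Keep only header-1 ... header-5 blocks, in document order.
function buildOutline(blocks: Block[]): OutlineEntry[] {
  const outline: OutlineEntry[] = [];
  for (const block of blocks) {
    const match = /^header-([1-5])$/.exec(block.type);
    if (match && typeof block.content === "string") {
      outline.push({ level: Number(match[1]), text: block.content });
    }
  }
  return outline;
}

// Indent each entry by two spaces per level below header-1.
function renderOutline(outline: OutlineEntry[]): string {
  return outline.map((e) => "  ".repeat(e.level - 1) + e.text).join("\n");
}

const blocks: Block[] = [
  { type: "header-1", content: "Book" },
  { type: "paragraph", content: "فصل" },
  { type: "header-2", content: "Chapter" },
  { type: "header-3", content: "Section" },
];

console.log(renderOutline(buildOutline(blocks)));
```

Deriving the level from the type string keeps the sketch independent of any level field the parser may or may not provide.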
Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.
This project is licensed under the MIT License.
This library is built to support the work done by the OpenITI team and the larger community working on Arabic and Islamicate texts. For more information on OpenITI mARkdown conventions, visit Maxim Romanov's mARkdown guide.