The content of each directory is described below.
Each txt file is 1-5 MB in size.
- Quranic Data
- Saheefeh Sajjadiah
- Kitab Tawheed (osool)
This dataset contains 50,000 posts from an Iranian social network (Cafejomle). The posts are split across three files to keep each file small. Each post has three properties: user_id, date, and post text.
In this category, we have multiple files that contain data from news websites. Each file has a prefix that identifies the website the data come from.
Files that start with the "fars-news" prefix contain data gathered from the farsnews website.
Each item in these JSON files has the following structure:
Attribute | Description | Type |
---|---|---|
title | news title | string |
abstract | abstract (or description) of the news item | string |
paragraphs | list of the text segments as divided on the website | list(string) |
cat | news category (if present) | string |
subcat | news subcategory (if present) | string |
tags | list of tags associated with the news item on the website | list(string) |
link | news URL | string |
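
As a minimal sketch of how an item with the structure above might be consumed, the snippet below parses a JSON array of items and joins each item's `title`, `abstract`, and `paragraphs` into a single text. It assumes that a file holds a JSON array of objects, which this README does not state; the sample record, its values, and the helper name `item_text` are hypothetical.

```python
import json

# Hypothetical sample shaped like the table above. Real files may be laid out
# differently (for example, one JSON object per line).
SAMPLE = """
[
  {
    "title": "Sample title",
    "abstract": "Sample abstract",
    "paragraphs": ["First paragraph.", "Second paragraph."],
    "cat": "economy",
    "subcat": null,
    "tags": ["tag1", "tag2"],
    "link": "https://example.com/news/1"
  }
]
"""


def item_text(item):
    """Join title, abstract and paragraphs into one newline-separated string."""
    paragraphs = item.get("paragraphs") or []
    parts = [item.get("title"), item.get("abstract"), *paragraphs]
    # Skip missing or empty fields so partial records do not raise errors.
    return "\n".join(p for p in parts if p)


items = json.loads(SAMPLE)
texts = [item_text(item) for item in items]
print(texts[0])
```

To read a real file, the same function could be applied to the result of `json.load` on an opened file, provided the file layout matches the assumption above.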
Files that start with the "donya-e-eqtesad" prefix contain data gathered from the donya-e-eqtesad newspaper.
Each item in these JSON files has the following structure:
Attribute | Description | Type |
---|---|---|
newspaper_code | newspaper code | string |
category | news category | string |
title | news title | string |
tags | list of tags associated with the news item on the website | list(string) |
abstract | abstract (or description) of the news item | string |
paragraphs | list of the text segments as divided on the website | list(string) |
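
The `category` attribute makes it easy to get a quick overview of what a file covers. The sketch below counts items per category using `collections.Counter`. The records, their values, and the helper name `category_counts` are hypothetical and serve only to illustrate the structure in the table above.

```python
from collections import Counter

# Hypothetical records shaped like the table above; values are made up.
ITEMS = [
    {"newspaper_code": "1001", "category": "economy", "title": "A",
     "tags": ["x"], "abstract": "a1", "paragraphs": ["p1"]},
    {"newspaper_code": "1001", "category": "economy", "title": "B",
     "tags": [], "abstract": "a2", "paragraphs": ["p2", "p3"]},
    {"newspaper_code": "1002", "category": "markets", "title": "C",
     "tags": ["y"], "abstract": "a3", "paragraphs": ["p4"]},
]


def category_counts(items):
    """Count items per category; items without a category count as 'unknown'."""
    return Counter(item.get("category") or "unknown" for item in items)


counts = category_counts(ITEMS)
for category, n in counts.most_common():
    print(category, n)
```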
In this category, we have multiple files that contain data from sports websites. Each file has a prefix that identifies the website the data come from.
Files that start with the "varzesh3" prefix contain data gathered from the varzesh3 website.
Each item in these JSON files has the following structure:
Attribute | Description | Type |
---|---|---|
code | news code | int |
title | news title | string |
abstract | abstract (or description) of the news item | string |
paragraphs | list of the text segments as divided on the website | list(string) |
tags | list of tags associated with the news item on the website | list(string) |
link | news URL | string |
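
Because each file name starts with a prefix that identifies its source website, files can be grouped by source before loading. The following is a rough sketch of that idea. The three prefixes are the ones named in this README; the file names and the helper names `source_of` and `group_by_source` are hypothetical.

```python
from collections import defaultdict

# Prefixes named in this README.
PREFIXES = ("fars-news", "donya-e-eqtesad", "varzesh3")


def source_of(filename):
    """Return the known prefix the file name starts with, or None."""
    for prefix in PREFIXES:
        if filename.startswith(prefix):
            return prefix
    return None


def group_by_source(filenames):
    """Map each source prefix (or None) to the list of matching file names."""
    groups = defaultdict(list)
    for name in filenames:
        groups[source_of(name)].append(name)
    return dict(groups)


# Hypothetical file names; the actual naming after the prefix may differ.
FILES = ["fars-news-1.json", "fars-news-2.json", "varzesh3-1.json", "notes.txt"]
groups = group_by_source(FILES)
print(groups)
```

Files that match no known prefix are collected under the `None` key, so they can be inspected rather than silently dropped.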
- Hafez/Saadi poems
- All poets
- The Little Prince