---
layout: default
---
<div class="container">
<div class="starter-template row">
<h1><a href="{{ site.baseurl }}" style="color:black;">Kurdish Folkloric Lyrics Corpus</a></h1>
<p class="lead">A collection of oral narratives and songs of generations...</p>
<p align="justify">
Kurdish poetry and prose narratives were historically transmitted orally rather than in written form. As an essential medium of oral narration and literature, Kurdish lyrics are a vital resource for different types of studies, including <b>Digital Humanities</b>, <b>Computational Folkloristics</b> and <b>Computational Linguistics</b>. As an initial study of its kind for the Kurdish language, this paper presents our efforts in transcribing and collecting Kurdish folk lyrics as a corpus that covers various Kurdish musical genres, in particular <i>Beyt</i>, <i>Goranî</i>, <i>Bend</i>, and <i>Heyran</i>. We believe that this corpus contributes to Kurdish language processing in several ways: it compensates for the lack of a long history of written text by incorporating oral literature, presents an unexplored realm in Kurdish language processing, and helps initiate Kurdish computational folkloristics. Our corpus contains 49,582 tokens in the Sorani dialect of Kurdish.
</p>
<p align="justify">
This corpus is the result of the manual transcription of audiovisual material. You can read more about this project in <a href="https://sinaahmadi.github.io/docs/articles/ahmadi2020folklyrics.pdf" target="_blank">our paper</a>. The corpus is continually being expanded. Anyone can help enrich it by:
<ul>
<li>Adding more <b>folkloric</b> lyrics in other dialects and languages, namely Kurmanji, Southern Kurdish and Gorani</li>
<li>Translating the lyrics into other languages, particularly English</li>
<li>Reporting errors or mistranscriptions</li>
</ul>
</p>
<p align="justify">
If you use any part of the data, please cite <a href="https://sinaahmadi.github.io/docs/articles/ahmadi2020folklyrics.pdf" target="_blank">our paper</a> using <a href="https://sinaahmadi.github.io/bibliography/ahmadi2020folklyrics.txt" target="_blank">this bib file</a> or the following:
<blockquote class="blockquote">
Ahmadi, Sina, Hossein Hassani, and Kamaladdin Abedi. "A Corpus of the Sorani Kurdish Folkloric Lyrics". In Proceedings of the 1st Joint Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL) Workshop at the 12th International Conference on Language Resources and Evaluation (LREC), 2020.
</blockquote>
The corpus is available in XML (TEI) and JSON at <a href="https://github.com/KurdishBLARK/KurdishLyricsCorpus" target="_blank">https://github.com/KurdishBLARK/KurdishLyricsCorpus</a>.
</p>
</div>
<hr>
<h3>Search the items</h3>
{% include interactive-table.html %}
<hr>
<div class="row">
<p><small><b>
This corpus is publicly available for non-commercial use under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank">CC BY-NC-SA 4.0 license</a></b>. © Copyright 2020 <a href="https://kurdishblark.github.io/" target="_blank">Kurdish-BLARK</a></small></p>
<p>
<small>
This interface was created by <a href="https://sinaahmadi.github.io/" target="_blank">Sina Ahmadi</a> based on <a href="https://github.com/rypan/jekyll-db" target="_blank">Jekyll-DB</a> (<a href="http://jekyllrb.com/" target="_blank">Jekyll</a> + <a href="http://listjs.com/">ListJS</a> + <a href="http://getbootstrap.com/" target="_blank">Bootstrap</a>).
</small>
</p>
</div>
</div><!-- /.container -->
</div>