The code and dataset for the tasks are released here. To use the dataset, please follow the policy of the workshop organizers.
Authors: Nafisa Tabassum∗, Sumaiya Rahman Aodhora∗, Rowshon Akter, Jawad Hossain, Shawly Ahsan and Mohammed Moshiul Hoque
Venue: Shared task system description paper at the DravidianLangTech workshop, co-located with EACL 2024 (DravidianLangTech@EACL2024).
Paper Link: https://aclanthology.org/2024.dravidianlangtech-1.30/
The alarming rise of fake news on social media poses a significant threat to public discourse and decision-making. While automatic detection of fake news offers a promising solution, research in low-resource languages like Malayalam often falls behind due to limited data and tools. This paper presents the participation of team Punny_Punctuators in the Fake News Detection in Dravidian Languages shared task at DravidianLangTech@EACL 2024, addressing this gap. The shared task focuses on two sub-tasks: 1. classifying social media texts as original or fake, and 2. categorizing fake news into five categories. We experimented with various machine learning (ML), deep learning (DL) and transformer-based models, as well as processing techniques such as transliteration. Malayalam-BERT achieved the best performance on both sub-tasks, with macro F1-scores of 0.87 and 0.17, respectively, and secured 2nd place in subtask-1.
- Developed several machine learning (ML), deep learning (DL), and transformer-based models to identify fake news in the Malayalam language.
- Evaluated the models with a variety of metrics to determine the best approach for fake news classification.
The number of instances used to train, validate and test the models is summarized in Table 1.
Figure 1 presents the schematic diagram of our system, which has three major phases: preprocessing, feature extraction and classification.
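The three phases in Figure 1 can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation: the cleaning rules in `preprocess`, the bag-of-words features, and the multinomial Naive Bayes classifier (a simple stand-in for the ML, DL and Malayalam-BERT models evaluated in the paper) are assumptions chosen for illustration, and every function, class and toy example below is hypothetical.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict

URL_RE = re.compile(r"https?://\S+")

# Phase 1: preprocessing (hypothetical cleaning rules)
def preprocess(text: str) -> str:
    """Lower-case, drop URLs and punctuation, collapse whitespace.

    Punctuation is removed by Unicode category (P*) rather than with a
    regex like [^\\w\\s], because Python's \\w does not match Malayalam
    vowel signs and viramas (categories Mc/Mn) and would split words.
    """
    text = URL_RE.sub(" ", text.lower())
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch
                   for ch in text)
    return " ".join(text.split())

# Phase 2: feature extraction as bag-of-words token counts
def extract_features(text: str) -> Counter:
    return Counter(text.split())

# Phase 3: classification with multinomial Naive Bayes
class NaiveBayes:
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = extract_features(preprocess(doc))
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        feats = extract_features(preprocess(doc))
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + log likelihood with add-one (Laplace) smoothing
            score = math.log(count / n_docs)
            for word, freq in feats.items():
                p = (self.word_counts[label][word] + 1) / (self.totals[label] + v)
                score += freq * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy usage with made-up examples (not from the shared-task data)
docs = ["breaking!!! miracle cure found", "miracle cure shocks doctors",
        "government announces budget", "minister announces new budget plan"]
labels = ["fake", "fake", "original", "original"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("new miracle cure"))  # → fake
```

Swapping the Naive Bayes stage for a fine-tuned transformer such as Malayalam-BERT changes only phase 3 (and typically replaces the bag-of-words features with the model's own tokenizer); the overall preprocess → features → classify structure stays the same.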
Table 3 presents the evaluation results of the tasks on the test set.
Figure 2 presents the confusion matrices of the best model for each subtask.
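Since the results are reported as macro F1-scores and visualized as confusion matrices, a short standard-library Python sketch of how both can be derived from gold and predicted labels may help. It is not the official shared-task scorer; the function names and toy labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are gold labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1-scores."""
    cm = confusion_matrix(y_true, y_pred, labels)
    k = len(labels)
    f1s = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp  # predicted i, gold differs
        fn = sum(cm[i]) - tp                       # gold i, predicted differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / k

gold = ["fake", "fake", "original", "original"]
pred = ["fake", "original", "original", "original"]
print(confusion_matrix(gold, pred, ["fake", "original"]))  # → [[1, 1], [0, 2]]
print(round(macro_f1(gold, pred, ["fake", "original"]), 4))  # → 0.7333
```

Because macro averaging gives every class equal weight, poor performance on a few of the five fake-news categories in subtask-2 pulls the score down even if the remaining categories are classified well.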
This work aimed to detect and classify fake news from Malayalam social media text. We have thoroughly investigated several machine learning (ML), deep learning (DL) and transformer-based models for Malayalam fake news identification and classification. The Malayalam-BERT model has proven to be more effective than the others, as evidenced by its highest macro F1-Score of 0.87 for subtask-1 and 0.17 for subtask-2.
In subtask-1, the model performed strongly, securing 2nd position with a macro F1-score of 0.87.
This work was supported and funded by the CUET NLP Lab. We also thank Prof. Dr. Mohammed Moshiul Hoque for his valuable guidance.
If you use our work, please consider citing our paper:
@inproceedings{tabassum2024punny_punctuators,
title={{Punny\_Punctuators@DravidianLangTech-EACL2024}: Transformer-based Approach for Detection and Classification of Fake News in {Malayalam} Social Media Text},
author={Tabassum, Nafisa and Aodhora, Sumaiya Rahman and Akter, Rowshon and Hossain, Jawad and Ahsan, Shawly and Hoque, Mohammed Moshiul},
booktitle={Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages},
pages={180--186},
year={2024}
}