Towards Efficient and Explainable Hate Speech Detection via Model Distillation

Overview 🌟

This project presents Llama-3-8B-Distil-MetaHate, a distilled Llama 3 8B model designed for hate speech explanation and classification. By using Chain-of-Thought methodologies, we aim to improve the interpretability and operational efficiency of hate speech detection systems.

This work was accepted at the 47th European Conference on Information Retrieval (ECIR 2025). Paper soon! 📄

Model

The model is available on Hugging Face.

Contents 📚

data/ - Directory containing the datasets used in the research.
distillation/ - Directory with the code for the model distillation process.
prompts/ - Directory holding the prompts utilized for training and evaluation.

Data Access 🔍

The data/ directory contains the IDs and labels used in our research. The complete dataset can be accessed here. Upon request, we can also provide additional data and the explanations generated by the models; please reach out for more information.
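
Because data/ holds only IDs and labels, working with the actual posts means joining those IDs against the complete dataset once you have obtained it. The following is a minimal sketch of that join, not the project's own tooling. The function name `attach_text`, the column names `id`, `label`, and `text`, and the tab delimiter are all assumptions for illustration; check the real files and adjust them.

```python
import csv
import io


def attach_text(ids_file, full_file, id_col="id", text_col="text", delimiter="\t"):
    """Join ID/label rows with the text from the full dataset.

    Column names and the delimiter are assumptions; adapt them to the real files.
    Returns (joined_rows, missing_ids).
    """
    # Index the full dataset by ID so each lookup is a dictionary access.
    text_by_id = {
        row[id_col]: row[text_col]
        for row in csv.DictReader(full_file, delimiter=delimiter)
    }
    joined, missing = [], []
    for row in csv.DictReader(ids_file, delimiter=delimiter):
        if row[id_col] in text_by_id:
            joined.append({**row, text_col: text_by_id[row[id_col]]})
        else:
            # Keep track of IDs that have no match instead of dropping them silently.
            missing.append(row[id_col])
    return joined, missing


# Tiny in-memory stand-ins for the two files (made-up rows, not real data).
ids_file = io.StringIO("id\tlabel\n1\t0\n2\t1\n3\t0\n")
full_file = io.StringIO("id\ttext\n1\texample post A\n2\texample post B\n")
joined, missing = attach_text(ids_file, full_file)
print(joined)
print(missing)
```

With real files, pass handles opened with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` objects. Returning the unmatched IDs separately makes it easy to spot a mismatch between the ID list and the dataset version you downloaded.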

Citation 📑

If you use this model or any part of the code included in this repository, please cite the following reference:

@misc{piot2024efficientexplainablehatespeech,
      title={Towards Efficient and Explainable Hate Speech Detection via Model Distillation}, 
      author={Paloma Piot and Javier Parapar},
      year={2024},
      eprint={2412.13698},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13698}, 
}

If you use the MetaHate dataset, please cite the following reference:

@article{Piot_Martín-Rodilla_Parapar_2024,
  title={MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection},
  volume={18},
  url={https://ojs.aaai.org/index.php/ICWSM/article/view/31445},
  DOI={10.1609/icwsm.v18i1.31445},
  number={1},
  journal={Proceedings of the International AAAI Conference on Web and Social Media},
  author={Piot, Paloma and Martín-Rodilla, Patricia and Parapar, Javier},
  year={2024},
  month={May},
  pages={2025-2039}
}

Disclaimer ⚠️

This repository includes content that may contain hate speech, offensive language, or other forms of inappropriate and objectionable material. The content present in the dataset or code is not created or endorsed by the authors or contributors of this project. It is collected from various sources and does not necessarily reflect the views or opinions of the project maintainers. This repository is intended for research, analysis, or educational purposes only. The authors do not endorse or promote any harmful, discriminatory, or offensive behavior conveyed in the dataset.

Users are advised to exercise caution and sensitivity when interacting with or interpreting the repository. If you choose to use the datasets or models, it is recommended to handle the content responsibly and in compliance with ethical guidelines and applicable laws. The project maintainers disclaim any responsibility for the content within the repository and cannot be held liable for how it is used or interpreted by others.

Acknowledgements 🙏

The authors gratefully acknowledge funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. The authors also appreciate the financial support supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund). The authors are also grateful for the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).

License 📜

This project is licensed under the META LLAMA 3 COMMUNITY LICENSE AGREEMENT. Please see the LICENSE file for details.

Contact 📬

For further questions, inquiries, or discussions related to this project, please feel free to reach out via email:

Email: paloma.piot@udc.es

If you encounter any issues or have specific questions about the code, we recommend opening an issue on GitHub for better visibility and collaboration.
