MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection

This work was accepted at the 18th International AAAI Conference on Web and Social Media (ICWSM 2024). The paper is available at https://ojs.aaai.org/index.php/ICWSM/article/view/31445.

Citation

If you use this dataset or any part of the code included in this repository, please cite the following reference:

@article{Piot_Martín-Rodilla_Parapar_2024,
  title={MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection},
  volume={18},
  url={https://ojs.aaai.org/index.php/ICWSM/article/view/31445},
  DOI={10.1609/icwsm.v18i1.31445},
  number={1},
  journal={Proceedings of the International AAAI Conference on Web and Social Media},
  author={Piot, Paloma and Martín-Rodilla, Patricia and Parapar, Javier},
  year={2024},
  month={May},
  pages={2025-2039}
}

Disclaimer

This dataset includes content that may contain hate speech, offensive language, or other forms of inappropriate and objectionable material. The content present in the dataset is not created or endorsed by the authors or contributors of this project. It is collected from various sources and does not necessarily reflect the views or opinions of the project maintainers.

This dataset is intended for research, analysis, or educational purposes only. The authors do not endorse or promote any harmful, discriminatory, or offensive behaviour conveyed in the dataset.

Users are advised to exercise caution and sensitivity when interacting with or interpreting the dataset. If you choose to use the dataset, it is recommended to handle the content responsibly and in compliance with ethical guidelines and applicable laws.

The project maintainers disclaim any responsibility for the content within the dataset and cannot be held liable for how it is used or interpreted by others.

Overview

Details of MetaHate and how to get the data are explained here: https://irlab.org/metahate.html

Usage

Each directory in this repository contains Jupyter Notebooks with detailed explanations and instructions on how to use the associated code. Below is a brief overview of the contents of each directory:

  • /analysis/: The lexical and psycholinguistic analyses of the dataset.
  • /baselines/: The baseline experiments.
  • /data/: The MetaHate dataset (only a sample is publicly available).
  • DATASHEET.md: Datasheet describing the dataset
  • LICENSE: License for the usage of the dataset and code

Navigate to the specific directory of interest, and you will find Jupyter Notebooks with step-by-step explanations, code snippets, and documentation to guide you through the usage of the code. Execute the notebooks in a Jupyter environment to interact with and explore the provided code.

Please refer to the individual notebooks for more detailed instructions and information related to each specific functionality or analysis.
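If you want to take a quick look at the public sample in /data/ outside the notebooks, a short script is enough. The sketch below is not part of the repository. It assumes the sample is a tab-separated file with a header row containing a `label` column and a `text` column. The delimiter, the column names, and the file path are all assumptions, so check DATASHEET.md and the /data/ directory for the actual layout. The helper names (`parse_rows`, `load_sample`, `label_distribution`, `texts_for_label`) are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable

# ASSUMPTION: the sample is tab-separated with a header row that includes
# "label" and "text" columns. Verify against DATASHEET.md before relying on it.
LABEL_COLUMN = "label"
TEXT_COLUMN = "text"


def parse_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse tab-separated lines (header first) into one dict per example."""
    # Default csv quoting applies; pass quoting=csv.QUOTE_NONE if the
    # file does not quote its fields.
    return list(csv.DictReader(lines, delimiter="\t"))


def load_sample(path: str) -> list[dict[str, str]]:
    """Read a local TSV file (hypothetical path) into row dicts."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(f)


def label_distribution(rows: Iterable[dict[str, str]]) -> Counter:
    """Count how many examples carry each label value."""
    return Counter(row[LABEL_COLUMN] for row in rows)


def texts_for_label(rows: Iterable[dict[str, str]], label: str) -> list[str]:
    """Return the texts of all examples whose label equals `label`."""
    return [row[TEXT_COLUMN] for row in rows if row[LABEL_COLUMN] == label]
```

For example, `label_distribution(load_sample("data/<sample file>"))` gives a quick check of class balance. Replace the placeholder with the actual file name. Labels are kept as strings because the sketch makes no assumption about how they are encoded.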

Documentation

The documentation for this project is provided within the Jupyter Notebooks in each respective directory. These notebooks contain detailed explanations, comments, and documentation for the code, analyses, and methodologies used.

  • Navigate to the specific directory of interest to find the corresponding Jupyter Notebook.
  • Each notebook is annotated with explanations, code snippets, and inline comments to guide you through the content.
  • Execute the notebooks in a Jupyter environment to interact with and explore the provided code.

If you have any specific questions or require additional clarification, please refer to the "Contact" section to reach out via email or open an issue on GitHub.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

The Apache License 2.0 is an open-source license that allows you to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software under the terms of the license.

For more details, please refer to the Apache License 2.0.

Resources

Experiments were conducted on private infrastructure with a carbon efficiency of 0.432 kg CO2 eq/kWh. A total of 98 hours of computation was performed on RTX A6000 hardware (TDP of 300 W).

Total emissions are estimated at 12.7 kg CO2 eq, of which 0 percent was directly offset.

Estimations were conducted using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
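The emissions figure follows from multiplying energy use by carbon intensity. The sketch below reproduces the arithmetic from the numbers stated above. It assumes the GPU draws its full 300 W TDP for all 98 hours. This is a simplifying assumption, not a measurement. The function name `estimate_emissions` is illustrative.

```python
# Inputs as reported in the Resources section.
HOURS = 98.0              # total hours of computation
TDP_KW = 300 / 1000       # RTX A6000 TDP: 300 W = 0.3 kW
CARBON_INTENSITY = 0.432  # kg CO2 eq per kWh


def estimate_emissions(hours: float, power_kw: float, kg_per_kwh: float) -> float:
    """Energy (kWh) times carbon intensity (kg CO2 eq/kWh).

    ASSUMPTION: the hardware runs at its full TDP for the whole period.
    """
    energy_kwh = hours * power_kw
    return energy_kwh * kg_per_kwh


emissions = estimate_emissions(HOURS, TDP_KW, CARBON_INTENSITY)
print(f"{HOURS * TDP_KW:.1f} kWh -> {emissions:.1f} kg CO2 eq")
# prints "29.4 kWh -> 12.7 kg CO2 eq"
```

Worked out by hand, 98 h × 0.3 kW = 29.4 kWh, and 29.4 kWh × 0.432 kg CO2 eq/kWh ≈ 12.7 kg CO2 eq. This matches the reported total.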

Acknowledgements

The authors acknowledge funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. The authors also acknowledge the financial support provided by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System, and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund). The authors also acknowledge funding from project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).

How to Contribute

We welcome contributions from the community to improve and enhance this project. If you have questions, feedback, or suggestions, please consider opening an issue on GitHub. Follow the steps below to contribute:

  1. Open an Issue:

    • If you have a question or suggestion, or encounter an issue, open a new GitHub issue.
    • Clearly describe the nature of the issue or your suggestion.
    • Provide any relevant context, such as code snippets, error messages, or screenshots.
  2. Wait for Response:

    • Once you've opened an issue, the project maintainers will review and respond to your inquiry.
    • Feel free to participate in the discussion on the issue thread.
  3. Collaborate and Contribute:

    • If you're interested in contributing code, bug fixes, or new features, please check for open issues labelled as "help wanted" or "good first issue."
    • Discuss your intention to contribute in the related GitHub issue.
    • Fork the repository, make your changes, and submit a pull request.

By contributing to this project, you agree to abide by the Code of Conduct. Thank you for helping make this project better!

Contact

For further questions, inquiries, or discussions related to this project, please feel free to reach out via email:

We appreciate your interest and are happy to assist with any additional information or clarification you may need. Please allow some time for a response, and we'll get back to you as soon as possible.

If you encounter any issues or have specific questions about the code, we recommend opening an issue on GitHub for better visibility and collaboration.

Thank you for your interest and involvement in this project!
