The aim of this task is to identify propagandistic content in multigenre text (tweets and paragraphs from news articles). Please see the Task Description below.
Please follow the website https://araieval.gitlab.io/ for the latest information.
Table of contents:
- List of Versions
- Contents of the Directory
- Task Description
- Dataset
- Scorer and Official Evaluation Metrics
- Baselines
- Format checker
- Submission
- Licensing
- Credits
## List of Versions

- [14/03/2024] Training and dev data released for both subtasks
## Contents of the Directory

- Main folder: data
  This directory contains the data files for the subtasks.
- Main folder: baselines
  Contains scripts provided as baseline models for the task.
- Main folder: format_checker
  Contains scripts to check the format of the submission file.
- Main folder: scorer
  Contains scripts to score the output of a model when gold labels are provided (i.e., on the dev set).
- README.md
  This file!
## Task Description

Given a multigenre text snippet (a news paragraph or a tweet), the task is to detect the propaganda techniques used in the text, together with the exact span(s) in which each propaganda technique appears. This is a sequence tagging task.
## Dataset

All datasets are JSONL files in which each line is a single JSON object. The text encoding is UTF-8. The input and result files have the same format for all subtasks.

Each object in the input files has the following fields:
{
id -> identifier of the example,
text -> text,
labels -> list of JSON objects, each containing a technique and its span information,
type -> type of text: tweet or news paragraph
}
For example:

{
"id": "7365",
"text": "تحذيرات من حرب جديدة في حال فشل الانتخابات القادمة",
"labels": [
{
"start": 0,
"end": 50,
"technique": "Appeal_to_Fear-Prejudice",
"text": "تحذيرات من حرب جديدة في حال فشل الانتخابات القادمة"
},
{
"start": 11,
"end": 14,
"technique": "Loaded_Language",
"text": "حرب"
}
],
"type": "tweet"
}
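As a reference, here is a minimal Python sketch for loading such a JSONL file (it assumes the dev file path used later in this README; adapt it to your local paths):

```python
import json

def load_jsonl(path):
    """Load a JSONL file in which each non-empty line is one JSON object."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples

# Example usage with the dev file referenced in this README.
examples = load_jsonl("data/araieval24_task1_dev.jsonl")
for ex in examples[:3]:
    techniques = [label["technique"] for label in ex["labels"]]
    print(ex["id"], ex["type"], techniques)
```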
Your prediction (submission) file must follow the same JSONL structure, where each object has the following fields:
{
id -> identifier of the example,
labels -> list of JSON objects containing techniques and spans, or [] if no technique is predicted
}

For example:

{
"id": "7365",
"labels": [
{
"start": 0,
"end": 50,
"technique": "Appeal_to_Fear-Prejudice",
"text": "تحذيرات من حرب جديدة في حال فشل الانتخابات القادمة"
},
{
"start": 11,
"end": 14,
"technique": "Loaded_Language",
"text": "حرب"
}
]
}
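For illustration only (this is not an official script), the sketch below writes a format-compliant predictions file; the `predict_spans` function is a hypothetical placeholder that returns no spans, so every example gets an empty labels list:

```python
import json

def predict_spans(text):
    """Hypothetical placeholder: replace with your model.
    Should return a list of {"start", "end", "technique", "text"} objects."""
    return []

def write_predictions(input_path, output_path):
    """Read the input JSONL file and write one prediction object per line."""
    with open(input_path, encoding="utf-8") as fin, \
         open(output_path, "w", encoding="utf-8") as fout:
        for line in fin:
            example = json.loads(line)
            prediction = {"id": example["id"], "labels": predict_spans(example["text"])}
            # ensure_ascii=False keeps the Arabic text readable in the output file.
            fout.write(json.dumps(prediction, ensure_ascii=False) + "\n")

write_predictions("data/araieval24_task1_dev.jsonl", "task1_dev_output.jsonl")
```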
## Scorer and Official Evaluation Metrics

The scorer for the subtasks is located in the scorer module of the project. It reports the official evaluation metric, as well as additional metrics, for a prediction file. The scorer also invokes the format checker to verify that the output file is properly formatted.
You can install all prerequisites with:
pip install -r requirements.txt
Launch the scorer for the subtask as follows:

python scorer/task1.py --gold_file=<path_gold_file> --predicted_file=<predictions_file>

For example:

python scorer/task1.py --predicted_file task1_dev_output.jsonl --gold_file data/task1_dev.jsonl
The official evaluation metric for the task is modified micro-F1. However, the scorer also reports macro-F1, Precision, and Recall.
The same script can be used for both subtasks.
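The exact definition of the modified micro-F1 is given by scorer/task1.py. For rough intuition only, a simplified span-level micro-F1 that counts exact (start, end, technique) matches could look like the sketch below; this is an assumption for illustration, not the official metric (the official scorer may, for instance, give partial credit for overlapping spans):

```python
def span_micro_f1(gold_docs, pred_docs):
    """Simplified micro-F1 over exact (start, end, technique) matches.
    gold_docs / pred_docs map example id -> list of label dicts."""
    tp = fp = fn = 0
    for doc_id, gold_labels in gold_docs.items():
        gold = {(l["start"], l["end"], l["technique"]) for l in gold_labels}
        pred = {(l["start"], l["end"], l["technique"]) for l in pred_docs.get(doc_id, [])}
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```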
## Baselines

To run the random baseline for the task, use the following command:

python baselines/task1.py --dev_file_path data/araieval24_task1_dev.jsonl --output_file_path task1_random_baseline.jsonl
If you submit the predictions of this baseline on the development set to the shared task website, you will get a modified micro-F1 score of 0.0260.
## Format checker

The format checker for both subtasks is located in the format_checker module of the project. The format checker verifies that your generated results file complies with the expected format.
Before running the format checker, please install all prerequisites with:
pip install -r requirements.txt
To launch it, please run the following command:
python format_checker/task1.py -p paths_to_your_results_files -g paths_to_your_gold_file

For example:

python format_checker/task1.py -p task1_random_baseline.jsonl -g data/araieval24_task1_dev.jsonl

paths_to_your_results_files: can be a single path or a space-separated list of paths
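Before submitting, a quick structural self-check along the lines below can catch obvious problems early; it is only a rough sketch and does not replace format_checker/task1.py:

```python
import json

def quick_check(pred_path):
    """Rough check: every line parses as JSON, has an 'id', and a list of 'labels'."""
    with open(pred_path, encoding="utf-8") as f:
        for i, line in enumerate(f, 1):
            obj = json.loads(line)
            assert "id" in obj, f"line {i}: missing 'id'"
            assert isinstance(obj.get("labels"), list), f"line {i}: 'labels' must be a list"
            for label in obj["labels"]:
                assert {"start", "end", "technique"} <= label.keys(), f"line {i}: incomplete label"
    print("Basic structure looks OK.")

quick_check("task1_random_baseline.jsonl")
```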
## Submission

The process consists of two phases:
- System Development Phase: This phase involves working on the dev set.
- Final Evaluation Phase (will start on 27 April 2024): This phase involves working on the test set, which will be released during the evaluation cycle.
For each phase, please adhere to the following guidelines:
- Each team should create and maintain a single account for submissions. Please ensure all runs are submitted through this account. Submissions from multiple accounts by the same team may result in your system not being ranked in the overview paper.
- The most recent file submitted to the leaderboard will be considered your final submission.
- The output file must be named task1_any_suffix.jsonl (for example, task1_team_name.jsonl). Failure to follow this naming convention will result in an error on the leaderboard.
- You are required to compress the .jsonl file into a .zip file (for example, zip task1.zip task1.jsonl) and submit it via the Codalab page.
- Please include your team name and a description of your method with each submission.
- You are permitted to submit a maximum of 200 submissions per day for each subtask.
Please submit your results on the respective subtask tab: https://codalab.lisn.upsaclay.fr/competitions/18111
## Licensing

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This allows for sharing and adapting the work, provided that attribution is given, the use is non-commercial, and any derivative works are shared under the same terms. For more information, please visit https://creativecommons.org/licenses/by-nc-sa/4.0/.
## Credits

Please find the credits on the task website: https://araieval.gitlab.io/