4th Shared Task on SlavNER

Recognition, Normalization, Classification and Cross-lingual linking of Named Entities in Slavic Languages

Important Dates

  • See here

    Description

    The 4th edition of the SlavNER Shared Task focuses on the analysis of Named Entities in multilingual Web documents in Slavic languages.

    Due to rich inflection, free word order, derivation, and other phenomena present in the Slavic languages, work on Named Entities is a challenging task. Fostering research & development on the problems of Named Entities — detecting mentions of names, lemmatization (normalization), classification, and cross-lingual matching — is crucial for cross-lingual information access and wider use of NLP in Slavic languages.

    The 4th edition of the shared task covers three languages:

    • Czech,
    • Polish,
    • Russian,

    and five types of named entities:

    • persons,
    • locations,
    • organizations,
    • events,
    • products.

    The Shared Task focuses on cross-lingual, document-level extraction of named entities — the systems should recognize, classify, and extract all named entity mentions in a document; detecting the position of each named entity mention is not required. Named-entity mentions should be lemmatized, and mentions referring to the same real-world object should be linked across documents and languages. The input text collection consists of sets of documents retrieved from the Web, each set being about a certain entity or event. The corpus was obtained by crawling the Web and parsing the HTML of documents.
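
    For a rough idea of what this means in practice, the sketch below (Python) collects document-level output: each unique mention form is reported once, together with its lemma, its type, and a cross-lingual entity identifier, and no character offsets are given. The tab-separated layout and the document-id header line are assumptions made for illustration only; the authoritative response format is defined in the task description PDF linked below.

        # Illustrative sketch only -- the exact response format is specified in the
        # task description PDF; the field layout below is an assumption.
        def write_response(doc_id, mentions, out_path):
            """mentions: iterable of (surface_form, lemma, category, entity_id) tuples
            gathered anywhere in the document; character positions are not needed."""
            seen = {}
            for surface, lemma, category, entity_id in mentions:
                # document-level output: each unique surface form is reported once
                seen.setdefault(surface.lower(), (surface, lemma, category, entity_id))
            with open(out_path, "w", encoding="utf-8") as out:
                out.write(doc_id + "\n")  # assumed document-id header line
                for surface, lemma, category, entity_id in seen.values():
                    out.write("\t".join((surface, lemma, category, entity_id)) + "\n")

    For example, the inflected Polish form "Warszawie" would be reported with the lemma "Warszawa", the type locations, and the same cross-lingual identifier as other mentions of the city in Czech or Russian documents.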

    IMPORTANT: it is NOT mandatory to participate in the full task; for example, monolingual responses without lemmatization of the extracted named entities can also be evaluated.

    See the details about the 1st edition (2017), the 2nd edition (2019), and the 3rd edition (2021) of this shared task.

    Participation

    Teams that intend to participate should register by sending an email to bsnlp@cs.helsinki.fi that includes the following information:

    • name of team,
    • names of team members,
    • contact person,
    • contact email.

    Detailed task description and system response guidelines

    PDF: Detailed definition of the Shared Task, including system response guidelines and relevant input/output formats.

    Data

    Training data

    The training data for this edition consist of the training and test data from the 2021 edition: the entire collection of links to the train/test data.

    Please see the previous editions as well: 2017 and 2019.
    Please refer to the task description papers from the previous editions, 2017 and 2019, for more details.

    Participants are encouraged to exploit various external named-entity resources for the languages of the Shared Task, which can be found at the SIGSLAV Web Page.

    Test Data

    The test data set will consist of a set of documents related to a specific topic (about an entity or event), which is different from the topics in the existing training data sets from 2017, 2019 and 2021.

    Tools

    Consistency Check

    A Java program that checks the consistency of the annotation files (including format, valid entity types, id assignment, etc.) can be found HERE.
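
    Independent of the provided tool, a minimal pre-submission sanity check could look like the Python sketch below; the field layout and the entity-type labels are assumptions and should be adjusted to the format defined in the task description PDF.

        # Minimal sanity check for a response file -- NOT the official tool.
        # Assumes one tab-separated annotation per line after a document-id header.
        VALID_TYPES = {"PER", "LOC", "ORG", "EVT", "PRO"}  # assumed type labels

        def check_response(path):
            errors = []
            with open(path, encoding="utf-8") as f:
                lines = [line.rstrip("\n") for line in f]
            if not lines or "\t" in lines[0]:
                errors.append("missing or malformed document-id header line")
            seen_forms = set()
            for number, line in enumerate(lines[1:], start=2):
                fields = line.split("\t")
                if len(fields) != 4:
                    errors.append(f"line {number}: expected 4 tab-separated fields, got {len(fields)}")
                    continue
                mention, lemma, category, entity_id = fields
                if category not in VALID_TYPES:
                    errors.append(f"line {number}: unknown entity type '{category}'")
                if not entity_id.strip():
                    errors.append(f"line {number}: empty cross-lingual id")
                if mention.lower() in seen_forms:
                    errors.append(f"line {number}: duplicate mention form '{mention}'")
                seen_forms.add(mention.lower())
            return errors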

    Evaluation Metrics

    A Java program that was used for evaluation can be found HERE. Please read the file readme.txt inside the archive for more details. Do not hesitate to contact us in case you encounter any problems.

    Evaluation is carried out on the system responses returned by the participants for the test corpora.

    Named entity recognition (exact, case-insensitive matching) and lemmatization tasks are evaluated in terms of precision, recall, and F1 scores. In particular, for named entity recognition, two types of evaluation are carried out:

    Relaxed evaluation: an entity mentioned in a given document is considered to be extracted correctly if the system response includes at least one annotation of a named mention of this entity (regardless of whether the extracted mention is in its base form);

    Strict evaluation: the system response should include exactly one annotation for each unique form of a named mention of an entity that is referred to in a given document, i.e., capturing and listing all variants of an entity is required.
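
    As a rough illustration of the difference (the Java scorer above is authoritative), both settings reduce to precision, recall, and F1 over sets of lower-cased items: in the strict setting the items are all unique surface forms of the mentions, while in the relaxed setting a gold entity counts as extracted as soon as any one of its mention forms appears in the system response.

        # Illustrative only -- not the official scorer.
        def prf(gold, predicted):
            """Precision, recall, F1 over two sets of lower-cased items."""
            true_positives = len(gold & predicted)
            precision = true_positives / len(predicted) if predicted else 0.0
            recall = true_positives / len(gold) if gold else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            return precision, recall, f1

        # Toy data: all gold mention forms of one entity vs. a system response.
        gold_forms = {"warszawa", "warszawie", "warszawy"}  # all inflected variants
        system_forms = {"warszawa", "warszawy"}             # what the system returned

        # Strict: every unique (lower-cased) form must be listed -- recall suffers here.
        print(prf(gold_forms, system_forms))

        # Relaxed: the entity counts as extracted because at least one of its forms
        # appears in the response, regardless of which inflected variant it is.
        entity_extracted = bool(gold_forms & system_forms)  # True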

    The document-level and cross-language entity matching tasks are evaluated using the Link-Based Entity-Aware metric (LEA) introduced in (Moosavi and Strube, 2016).
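
    In brief, LEA scores each entity by the fraction of its coreference links that the other side reproduces, weighted by the entity's size. A sketch of the metric following the cited paper, with K the key (gold) entities and R the response entities:

        \mathrm{LEA\ recall} = \frac{\sum_{k_i \in K} |k_i| \cdot \frac{\sum_{r_j \in R} \mathrm{link}(k_i \cap r_j)}{\mathrm{link}(k_i)}}{\sum_{k_i \in K} |k_i|},
        \qquad \mathrm{link}(e) = \frac{|e|\,(|e|-1)}{2}

    Precision is the symmetric quantity computed over the response entities, and F1 is the harmonic mean of the two.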

    Please note that the evaluation is case-insensitive. That is, all named mentions in the system responses and the test corpora are lower-cased.

    Publication and Workshop

    Participants in the shared task are invited to submit a paper to the SlavNLP 2023 workshop. Submitting a paper is not mandatory for participating in the Shared Task. Papers must follow the workshop submission instructions and will undergo regular peer review. Their acceptance will not depend on the results obtained in the shared task, but on the quality of the paper: clarity of presentation, etc. Authors of accepted papers will be informed about the evaluation results of their systems prior to the submission deadline. Accepted papers will appear in the ACL Anthology and will be presented at a session of the Slavic NLP 2023 Workshop specially dedicated to the Shared Task.

    The deadline for the shared task paper is the same as for the workshop papers, see the Important dates page.