Advancing Propaganda Detection with Multilingual Dataset and Fine-Tuned Language Model

Saturday 29 March 2025

The quest for transparency in propaganda detection has taken a significant step forward with the introduction of a new multilingual dataset and fine-tuned language model designed specifically for this task. The dataset, which combines news articles and tweets in both Arabic and English, provides a much-needed resource for researchers seeking to develop more effective methods for identifying and explaining propaganda.

Propaganda detection has become increasingly important in today’s digital landscape, where misinformation and disinformation can spread quickly online. However, detecting propaganda is a complex task that requires not only technical expertise but also linguistic and cultural understanding. The new dataset aims to address this challenge by providing a large-scale collection of labeled examples of propagandistic text in both Arabic and English.

The dataset consists of approximately 21,000 news paragraphs and tweets in Arabic, and around 6,000 news articles in English, all annotated with labels indicating whether the content is propagandistic or not. To enhance the quality of the annotations, a team of human evaluators manually reviewed each example to ensure that the label accurately reflects the content.

In addition to the dataset, the researchers have also developed a fine-tuned language model designed specifically for propaganda detection and explanation generation. The model, which is based on the popular Llama 3.1 8B Instruct architecture, has been trained on a large corpus of text data and can generate explanations for why a piece of text is propagandistic.

The researchers’ approach to propaganda detection involves using natural language processing techniques to analyze the content of the text and identify features that are characteristic of propagandistic writing. The model is then trained to predict whether a given piece of text is propagandistic or not, based on these features.

One of the key innovations of this work is the use of explanations for propaganda detection. By generating explanations for why a piece of text is propagandistic, the model provides more transparency and accountability in its decision-making process. This can be particularly important in cases where the model’s predictions may be disputed or challenged by human evaluators.

The researchers’ approach to propaganda detection has several potential applications in the field of natural language processing. For example, it could be used to develop more effective methods for identifying and mitigating the spread of misinformation online. It could also be used to improve the accuracy of automated propaganda detection systems, which are increasingly being used by governments and other organizations to monitor and counter disinformation.

Cite this article: “Advancing Propaganda Detection with Multilingual Dataset and Fine-Tuned Language Model”, The Science Archive, 2025.

Propaganda, Detection, Language Model, Multilingual, Dataset, Arabic, English, Misinformation, Disinformation, Natural Language Processing

Reference: Maram Hasanain, Md Arid Hasan, Mohamed Bayan Kmainasi, Elisa Sartori, Ali Ezzat Shahroor, Giovanni Da San Martino, Firoj Alam, “Reasoning About Persuasion: Can LLMs Enable Explainable Propaganda Detection?” (2025).

Leave a ReplyCancel Reply

Related Posts

Neural USD: A Novel Approach to Object-Centric Image Editing

Integrating Information Extraction with Target Databases for Efficient Data Analysis

Breaking Barriers in Distributed Graph Algorithms: A New Algorithm for Efficiently Coloring Graphs with Bounded Neighborhood Independence

Realistic Urban Traffic Simulation for Autonomous Vehicles

Unraveling Chaos: A New Approach to Forecasting Complex Systems

ArtiLatent: A Breakthrough Framework for Realistic 3D Object Generation from Single Images