Saturday 29 March 2025
Transparency in propaganda detection has taken a step forward with the introduction of a new multilingual dataset and a fine-tuned language model designed specifically for this task. The dataset, which combines news text and tweets in Arabic and English, gives researchers a resource for developing more effective methods of identifying and explaining propaganda.
Propaganda detection has become increasingly important in today’s digital landscape, where misinformation and disinformation can spread quickly online. However, detecting propaganda is a complex task that requires not only technical expertise but also linguistic and cultural understanding. The new dataset aims to address this challenge by providing a large-scale collection of labeled examples of propagandistic text in both Arabic and English.
The dataset consists of approximately 21,000 news paragraphs and tweets in Arabic, and around 6,000 news articles in English, all annotated with labels indicating whether the content is propagandistic or not. To enhance the quality of the annotations, a team of human evaluators manually reviewed each example to ensure that the label accurately reflects the content.
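To make the shape of such a collection concrete, here is a minimal Python sketch of how one annotated example might be represented and tallied. The field names (text, language, source, is_propaganda) and the sample rows are illustrative assumptions, not the dataset's published schema or contents.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout: these field names are assumptions made for
# illustration and are not taken from the dataset itself.
@dataclass(frozen=True)
class Example:
    text: str
    language: str        # "ar" or "en"
    source: str          # "news" or "tweet"
    is_propaganda: bool  # binary label described in the article


def label_counts(examples):
    """Count examples per (language, label) pair."""
    return Counter((ex.language, ex.is_propaganda) for ex in examples)


# Placeholder rows, not real data.
samples = [
    Example("Placeholder news paragraph.", "ar", "news", False),
    Example("Placeholder tweet.", "ar", "tweet", True),
    Example("Placeholder article text.", "en", "news", True),
]
counts = label_counts(samples)
```

A tally like this is a quick way to check how balanced the labels are in each language before training anything on them.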
In addition to the dataset, the researchers have developed a fine-tuned language model designed specifically for propaganda detection and explanation generation. The model, which is built on Llama 3.1 8B Instruct, has been fine-tuned for this task and can generate explanations for why a piece of text is propagandistic.
The researchers’ approach to propaganda detection involves using natural language processing techniques to analyze the content of the text and identify features that are characteristic of propagandistic writing. The model is then trained to predict whether a given piece of text is propagandistic or not, based on these features.
One of the key innovations of this work is the use of explanations for propaganda detection. By generating explanations for why a piece of text is propagandistic, the model provides more transparency and accountability in its decision-making process. This can be particularly important in cases where the model’s predictions may be disputed or challenged by human evaluators.
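As a rough illustration of how a label-plus-explanation output could be requested and consumed, the Python sketch below builds an instruction prompt and parses a two-field reply. The prompt wording, the "Label:" / "Explanation:" reply format, and the function names are all assumptions made for this example; the article does not describe the researchers' actual prompt or output format, and the call to the model itself is deliberately left out.

```python
import re
from typing import NamedTuple


class Verdict(NamedTuple):
    is_propaganda: bool
    explanation: str


def build_prompt(text: str) -> str:
    """Build a hypothetical instruction prompt (wording is an assumption)."""
    return (
        "Decide whether the following text is propagandistic.\n"
        "Answer in exactly two lines:\n"
        "Label: propagandistic or not propagandistic\n"
        "Explanation: one or two sentences\n\n"
        f"Text: {text}"
    )


# The label is read from a single line; the explanation runs to the end.
_LABEL = re.compile(r"^Label:\s*(.+)$", re.IGNORECASE | re.MULTILINE)
_EXPLANATION = re.compile(
    r"^Explanation:\s*(.+)$", re.IGNORECASE | re.MULTILINE | re.DOTALL
)


def parse_response(response: str) -> Verdict:
    """Parse a reply in the assumed two-field format, or raise ValueError."""
    label = _LABEL.search(response)
    explanation = _EXPLANATION.search(response)
    if label is None or explanation is None:
        raise ValueError("response does not follow the expected format")
    normalized = label.group(1).strip().lower().rstrip(".")
    if normalized == "propagandistic":
        flag = True
    elif normalized == "not propagandistic":
        flag = False
    else:
        raise ValueError(f"unrecognized label: {normalized!r}")
    return Verdict(flag, explanation.group(1).strip())


# Placeholder reply standing in for a model output.
reply = (
    "Label: propagandistic\n"
    "Explanation: The text uses loaded language to provoke fear."
)
verdict = parse_response(reply)
```

Rejecting replies that do not match the expected format, rather than guessing a label, keeps a disputed prediction traceable to an explicit explanation.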
The work has several potential applications in the field of natural language processing. For example, it could support more effective methods for identifying and limiting the spread of misinformation online. It could also improve the accuracy of automated propaganda detection systems, which governments and other organizations increasingly use to monitor and counter disinformation.
Cite this article: “Advancing Propaganda Detection with Multilingual Dataset and Fine-Tuned Language Model”, The Science Archive, 2025.
Propaganda, Detection, Language Model, Multilingual, Dataset, Arabic, English, Misinformation, Disinformation, Natural Language Processing







