Cross-Lingual Abusive Language Detection using Pre-Trained Audio Models

Saturday 01 February 2025


The quest for a more efficient and accurate way to detect abusive language in audio clips has led researchers to explore the realm of few-shot cross-lingual learning. A recent study delves into this topic, leveraging pre-trained audio models like Wav2Vec and Whisper to develop a system that can identify abusive content in multiple languages with minimal training data.


The authors employed the ADIMA dataset, which consists of 11,775 audio clips in 10 Indian languages, annotated for binary abuse detection. They utilized the Model-Agnostic Meta-Learning (MAML) framework, a popular approach for few-shot learning, to adapt pre-trained models to new languages and detect abusive speech.
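MAML works with two nested optimization loops: an inner loop that adapts a copy of the model on a few labelled examples from a new task (here, a new language), and an outer loop that updates the shared initialization so that such adaptation works well. The sketch below is a minimal first-order MAML (FOMAML) meta-update on a toy linear model, not the authors' audio pipeline; all function names and hyperparameters are illustrative.

```python
import numpy as np

def loss_and_grad(w, X, y):
    # Mean squared error of a linear model and its gradient w.r.t. w.
    err = X @ w - y
    return np.mean(err ** 2), 2 * X.T @ err / len(y)

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.01, inner_steps=3):
    """One first-order MAML meta-update over a batch of tasks.

    Each task is (X_support, y_support, X_query, y_query): the support
    set plays the role of the few labelled clips in a new language.
    """
    meta_grad = np.zeros_like(w)
    for X_sup, y_sup, X_qry, y_qry in tasks:
        # Inner loop: adapt a copy of the weights on the support set.
        w_task = w.copy()
        for _ in range(inner_steps):
            _, g = loss_and_grad(w_task, X_sup, y_sup)
            w_task -= inner_lr * g
        # Outer loop: accumulate the query-set gradient at the adapted
        # weights (the first-order approximation to the MAML gradient).
        _, g_qry = loss_and_grad(w_task, X_qry, y_qry)
        meta_grad += g_qry
    return w - outer_lr * meta_grad / len(tasks)
```

Full MAML differentiates through the inner loop (a second-order computation); the first-order variant above drops those terms and is a common, cheaper approximation.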


The study focused on two normalization techniques: temporal mean and L2-norm, which are commonly used in audio processing tasks. The results showed that the L2-norm outperformed the temporal mean in most cases, indicating its effectiveness in extracting relevant features from the audio data.
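One plausible reading of these two techniques, assuming frame-level embeddings from the pre-trained encoder: temporal mean pooling averages the frames into a single clip-level vector, and L2 normalization additionally scales that vector to unit length so that classification depends on direction rather than magnitude. The helper names below are illustrative, not the authors' code.

```python
import numpy as np

def temporal_mean(frames):
    """Pool a (T, D) sequence of frame embeddings into one (D,) clip vector."""
    return frames.mean(axis=0)

def l2_normalized(frames):
    """Mean-pool, then scale the clip vector to unit L2 length."""
    v = frames.mean(axis=0)
    norm = np.linalg.norm(v)
    # Guard against an all-zero embedding to avoid division by zero.
    return v / norm if norm > 0 else v
```

Unit-normalizing embeddings is a standard trick when downstream distances or dot products should not be dominated by loudness or clip-length effects, which may explain the L2-norm's edge in the reported results.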


One of the key findings was the strong performance of the Whisper model, which achieved an aggregate macro F1 score of 83.48% across all languages. This is competitive with the baseline ADIMA scores, which ranged from 75.27% to 84.69%, despite using only a handful of labelled examples per target language. The Wav2Vec model also performed well, with an aggregate macro F1 score of 78.47%.
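For readers unfamiliar with the metric: macro F1 computes the F1 score for each class separately and averages them without weighting by class frequency, which keeps the rarer "abusive" class from being drowned out. A self-contained computation for the binary case:

```python
import numpy as np

def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    scores = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores.append(f1)
    return float(np.mean(scores))
```

This matches `sklearn.metrics.f1_score(..., average="macro")` for binary labels.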


The study’s findings have implications for real-world applications, particularly in regions where language barriers can hinder effective communication and lead to misunderstandings. By developing a system that can accurately detect abusive language across multiple languages, the researchers aim to improve online safety and reduce the spread of harmful content.


In addition to its practical significance, this research contributes to the broader field of few-shot learning, demonstrating the potential of pre-trained models in adapting to new tasks with minimal data. As the demand for efficient and accurate natural language processing continues to grow, studies like this will play a crucial role in advancing our understanding of language and improving our ability to work with it.


The authors’ approach also highlights the importance of exploring different normalization techniques in audio processing. By comparing the performance of temporal mean and L2-norm, they provide valuable insights into the strengths and weaknesses of each method, which can inform future research in this area.


Overall, this study represents an important step forward in developing a robust system for detecting abusive language across multiple languages.


Cite this article: “Cross-Lingual Abusive Language Detection using Pre-Trained Audio Models”, The Science Archive, 2025.


Few-Shot Learning, Cross-Lingual, Abusive Language Detection, Audio Models, Wav2Vec, Whisper, ADIMA Dataset, MAML Framework, Normalization Techniques, Temporal Mean, L2-Norm


Reference: Aditya Narayan Sankaran, Reza Farahbakhsh, Noel Crespi, “Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning” (2024).

