Combining Pre-Trained Language Models to Detect AI-Generated Content Across Languages

Wednesday 22 January 2025


The quest for a more accurate AI-generated content detector has taken a significant step forward, as researchers have developed an ensemble approach that combines multiple pre-trained language models to detect AI-generated text across English and multilingual datasets.


The team’s system, which was presented at the COLING 2025 Workshop on Detecting AI-Generated Content, employed a weighted soft-voting strategy to combine predictions from three different models for each subtask. Each model’s weight was proportional to the inverse of its perplexity, so models with lower perplexity, and thus higher confidence in the text, contributed more to the final vote.
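The voting scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration of inverse-perplexity weighted soft voting, not the authors' code; the perplexity values and class probabilities below are made up for the example.

```python
import numpy as np

def inverse_perplexity_weights(perplexities):
    """Weights proportional to 1/perplexity, normalized to sum to 1."""
    inv = 1.0 / np.asarray(perplexities, dtype=float)
    return inv / inv.sum()

def weighted_soft_vote(probs, weights):
    """Combine per-model class probabilities (n_models x n_classes)
    into a single probability vector via a weighted average."""
    return np.average(np.asarray(probs, dtype=float), axis=0, weights=weights)

# Illustrative numbers: three models with different perplexities
weights = inverse_perplexity_weights([12.0, 8.0, 20.0])
probs = [[0.6, 0.4],    # model 1: leans "human"
         [0.3, 0.7],    # model 2: leans "AI" (lowest perplexity, largest weight)
         [0.55, 0.45]]  # model 3: leans "human"
combined = weighted_soft_vote(probs, weights)
prediction = int(np.argmax(combined))
```

Because the lowest-perplexity model carries the largest weight, its vote can tip the ensemble even when the other two models disagree with it.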


The researchers used six Transformer-based models: RoBERTa-base, OpenAI’s RoBERTa-based GPT-2 output detector (RoBERTa-base-OpenAI-Detector), and BERT-base-cased for English, as well as RemBERT, XLM-RoBERTa-base, and BERT-base-multilingual-cased for the multilingual track. They found that combining these models effectively enhanced both language-specific and cross-lingual accuracy.


One of the key challenges faced by the team was data imbalance between well-represented languages like English and Chinese and underrepresented ones such as Urdu, Arabic, and Russian. To address this, they downsampled the overrepresented languages so that each language contributed a more comparable number of examples, which improved performance across languages.
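Downsampling of this kind is straightforward with Pandas, which the team lists among their tools. The sketch below caps every language at the size of the smallest group; the language labels and counts are illustrative, and the paper's exact balancing procedure may differ.

```python
import pandas as pd

# Toy corpus with a heavy skew toward English and Chinese (counts are made up)
df = pd.DataFrame({
    "lang": ["en"] * 1000 + ["zh"] * 800 + ["ur"] * 50 + ["ar"] * 60 + ["ru"] * 40,
})
df["text"] = "sample text"

# Cap every language at the size of the smallest language group
cap = df["lang"].value_counts().min()
balanced = df.groupby("lang", group_keys=False).sample(n=cap, random_state=42)
```

After this step every language contributes the same number of examples, which reduces the tendency of the classifier to overfit the dominant languages.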


The ensemble achieved a Macro F1-score of 0.7458 on the English track and 0.7513 on the multilingual track. These scores outperformed the individual models and the baselines, demonstrating the effectiveness of combining diverse models.
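For readers unfamiliar with the metric: Macro F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. A minimal from-scratch sketch (the paper itself used the Evaluate library):

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes=2):
    """Unweighted mean of per-class F1 scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))
```

Averaging over classes rather than over examples is what makes the metric robust to the human/AI label imbalance common in detection datasets.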


The team’s approach also highlights the importance of addressing data imbalance in machine learning tasks. By scaling down samples from overrepresented languages, they were able to improve generalization across languages and reduce biases in predictions.


While there is still much work to be done in developing more accurate AI-generated content detectors, this research represents an important step forward in the quest for better detection methods. As the field continues to evolve, it will be interesting to see how these approaches are refined and adapted to tackle new challenges in natural language processing.


The researchers built the system with Python 3.10.14, Pandas 2.2.2, NumPy 1.26.4, PyTorch 2.4.0, Transformers 4.44.2, and Evaluate 0.4.3.


Cite this article: “Combining Pre-Trained Language Models to Detect AI-Generated Content Across Languages”, The Science Archive, 2025.


AI-Generated Content, Ensemble Approach, Language Models, Transformer-Based Models, RoBERTa, BERT, OpenAI, Data Imbalance, Machine Learning, Natural Language Processing, Weighted Soft-Voting Strategy


Reference: Md Kamrujjaman Mobin, Md Saiful Islam, “LuxVeri at GenAI Detection Task 1: Inverse Perplexity Weighted Ensemble for Robust Detection of AI-Generated Text across English and Multilingual Contexts” (2025).