AI Breakthrough: GPT-4 Outperforms Others in Detecting Hate Speech Online

Friday 28 February 2025


For years, researchers have been trying to develop a foolproof way to detect hate speech online. The problem is that computers are still terrible at understanding human emotions and context, making it difficult for them to accurately identify harmful comments. But now, a new study suggests that a special kind of artificial intelligence called GPT-4 may be the solution we’ve been waiting for.


The study used a dataset of over 1,500 annotated text samples from German online newspapers to test the performance of three different AI models: GPT-4, OpenAI’s Moderation API, and Jigsaw’s Perspective API. The results were impressive – GPT-4 outperformed the other two models in detecting hate speech, with an accuracy rate of 87.7%.


But what makes GPT-4 so special? Unlike traditional AI models that rely on rule-based systems or machine learning algorithms, GPT-4 uses a type of AI called a transformer. This allows it to learn from large amounts of data and generate text that is similar in style and tone to the original text.


In this study, the researchers used three different approaches with GPT-4: zero-shot learning, one-shot learning, and few-shot learning. Zero-shot learning involves training the AI on a dataset without any explicit labels or annotations. One-shot learning involves training the AI on a single example of hate speech, while few-shot learning involves training it on just a few examples.


The results showed that GPT-4 performed best when using the one-shot learning approach, with an accuracy rate of 87.7%. This suggests that the AI is able to learn quickly and accurately from just one example of hate speech.


But what about the other two models? The Moderation API, which uses a machine learning algorithm to detect hate speech, performed poorly in this study, with an accuracy rate of only 81.3%. The Perspective API, which uses a combination of machine learning and rule-based systems, did slightly better, but still struggled to accurately identify hate speech.


The researchers also reannotated the dataset after the initial results were published, which led to some surprising changes in the performance of the models. After reannotation, GPT-4’s accuracy rate increased even further, to 92.3%. This suggests that the AI is able to learn from its mistakes and improve over time.


Overall, this study suggests that GPT-4 may be a game-changer in the fight against online hate speech.


Cite this article: “AI Breakthrough: GPT-4 Outperforms Others in Detecting Hate Speech Online”, The Science Archive, 2025.


Gpt-4, Artificial Intelligence, Hate Speech, Online Newspapers, German, Accuracy Rate, Transformer, Machine Learning Algorithms, Rule-Based Systems, Reannotation


Reference: Manuel Weber, Moritz Huber, Maximilian Auch, Alexander Döschl, Max-Emanuel Keller, Peter Mandl, “Digital Guardians: Can GPT-4, Perspective API, and Moderation API reliably detect hate speech in reader comments of German online newspapers?” (2025).


Leave a Reply