Poisoning Multimodal Retrieval-Augmented Generation Models with Clean-Label Attacks: A New Threat to Vision-Language Systems

Tuesday 08 April 2025


As we increasingly rely on large language models (LLMs) to generate text, answer questions, and even create art, a new threat has emerged: poisoning the external knowledge these systems look up when they answer, rather than the data they were trained on.


Researchers have developed an attack called Poisoned-MRAG, in which an attacker injects a small number of malicious image-text pairs into the knowledge database of a multimodal retrieval-augmented generation (RAG) system. Such a system retrieves relevant entries from a large database and passes them to a vision-language model, which generates an answer based on them. The attacker’s goal is to make the model return an attacker-chosen answer to a target query.


The attack works by adding crafted image-text pairs to the knowledge database, not by retraining the model. The pairs are built so that the retriever ranks them highly for the target query and so that their content steers the model toward the attacker’s answer. The paper proposes two strategies, dirty-label and clean-label, suited to different levels of attacker knowledge. When a user asks the target query, the poisoned entries are retrieved and the generated answer reflects the attacker’s goal.
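
The retrieval side of such an attack can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a tiny hypothetical knowledge base with hand-written three-dimensional embeddings and plain cosine-similarity retrieval, and all names and values are invented for illustration. It shows how a single injected entry placed close to the query embedding can displace the legitimate top result.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, knowledge_base, k):
    """Return the k entries most similar to the query embedding."""
    ranked = sorted(
        knowledge_base,
        key=lambda entry: cosine(query_vec, entry["vec"]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings; a real system would compute these with a
# multimodal encoder over both the image and the text.
clean_kb = [
    {"text": "The Eiffel Tower is in Paris.", "vec": [0.9, 0.1, 0.0]},
    {"text": "The Colosseum is in Rome.", "vec": [0.1, 0.9, 0.0]},
    {"text": "Big Ben is in London.", "vec": [0.0, 0.1, 0.9]},
]
query_vec = [1.0, 0.0, 0.0]  # stands in for "Where is this tower?" plus an image

# The injected entry is assumed to sit closer to the query than any clean entry.
malicious_entry = {"text": "The Eiffel Tower is in Berlin.", "vec": [0.99, 0.01, 0.0]}
poisoned_kb = clean_kb + [malicious_entry]

top_clean = retrieve(query_vec, clean_kb, k=1)
top_poisoned = retrieve(query_vec, poisoned_kb, k=1)

print("clean top-1:   ", top_clean[0]["text"])
print("poisoned top-1:", top_poisoned[0]["text"])
```

In this toy setup the model itself is never modified. Only the evidence handed to it changes, which is why the quality of the knowledge database matters as much as the quality of the model.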


The attack specifically targets multimodal RAG, in which both images and text are used to retrieve evidence and generate answers. Systems of this kind are used in applications such as chatbots and virtual assistants that answer questions about images, and are being explored for higher-stakes settings such as autonomous vehicles.


The researchers tested their attack on several vision-language models, including Claude-3-haiku, using knowledge databases such as InfoSeek. They report that injecting just five malicious image-text pairs into the InfoSeek database, which holds 481,782 pairs, was enough to reach an attack success rate of up to 98%.


Why does this matter? It shows that systems built on LLMs can be manipulated through the data they retrieve, not only through the model itself. If such systems are used in critical applications such as autonomous vehicles or medical diagnosis, the consequences of a successful attack could be severe.


Furthermore, the fact that this attack is so effective suggests that even the most advanced LLMs may not be immune to manipulation. This has significant implications for the development and deployment of AI systems in general.


So what can be done to prevent these attacks? The paper evaluates four defenses: paraphrasing, duplicate removal, structure-driven mitigation, and purification. It reports that each has limited effectiveness or comes with trade-offs, so stricter vetting of what enters the knowledge database and better methods for detecting poisoned entries remain open problems.
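
One simple detection idea, flagging near-duplicate entries in the knowledge database, can be sketched as follows. This is an illustrative assumption and not the paper's implementation: the embeddings, the `flag_near_duplicates` helper, and the 0.999 threshold are all hypothetical. The intuition is that an attacker who injects several entries aimed at the same query may leave a cluster of almost identical embeddings behind.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flag_near_duplicates(entries, threshold=0.999):
    """Return indices of entries whose embedding nearly matches another entry's.

    Pairwise comparison is O(n^2); this is only meant for a toy-sized example.
    """
    flagged = set()
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            if cosine(entries[i]["vec"], entries[j]["vec"]) >= threshold:
                flagged.add(i)
                flagged.add(j)
    return sorted(flagged)

# Hypothetical knowledge base: two unrelated benign entries followed by
# three injected entries that are almost identical to one another.
entries = [
    {"text": "benign entry A", "vec": [1.0, 0.0, 0.0]},
    {"text": "benign entry B", "vec": [0.0, 1.0, 0.0]},
    {"text": "injected entry 1", "vec": [0.7, 0.7, 0.1]},
    {"text": "injected entry 2", "vec": [0.7, 0.7, 0.1]},
    {"text": "injected entry 3", "vec": [0.71, 0.7, 0.1]},
]

suspicious = flag_near_duplicates(entries)
print("flagged indices:", suspicious)
```

A filter like this is easy to evade by varying the injected entries, and it can also flag legitimate entries that happen to be similar, which is one reason simple defenses of this kind offer only partial protection.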


Ultimately, the discovery of Poisoned-MRAG highlights the need for greater vigilance and attention to security in the development and deployment of AI systems. As these models become increasingly ubiquitous, it is essential that we take steps to ensure their integrity and prevent manipulation by malicious actors.


Cite this article: “Poisoning Multimodal Retrieval-Augmented Generation Models with Clean-Label Attacks: A New Threat to Vision-Language Systems”, The Science Archive, 2025.


Large Language Models, Poisoned-MRAG, Multimodal Retrieval-Augmented Generation, Attack, Malicious Data, Knowledge Database, Manipulation, Vulnerability, Security, AI Systems, Machine Learning


Reference: Yinuo Liu, Zenghui Yuan, Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong, “Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation” (2025).

