Robust Training with Expert Routing: A Novel Approach to Adversarial Attacks on Deep Learning Models

Sunday 23 March 2025


Deep learning models have made tremendous strides in recent years, but they’re not invincible. Adversarial attacks, specifically designed to trick these models into making incorrect predictions, have become a major concern for AI researchers and developers. To combat this issue, a team of scientists has proposed a novel approach that combines the strengths of two existing methods: mixture of experts (MoE) and adversarial training.


The MoE architecture is particularly well-suited to complex tasks like image classification, since it allows multiple experts to specialize in different aspects of the problem. However, this added complexity also makes it more vulnerable to adversarial attacks. Adversarial training, on the other hand, augments a model's training data with adversarially perturbed inputs, teaching it to classify examples correctly even when they have been deliberately distorted.
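To make the idea concrete, here is a minimal sketch of how an adversarial example is generated for training, using the fast gradient sign method (FGSM) on a toy logistic-regression "model". FGSM is one common choice for adversarial training in general; the article does not say which attack the paper uses, and all names and the epsilon value below are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps=0.1):
    """Nudge x in the direction that most increases the loss (FGSM)."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w          # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.0
x, y = rng.normal(size=3), 1.0

# During adversarial training, the model would be updated on x_adv
# (with the true label y) alongside the clean example x.
x_adv = fgsm_perturb(x, y, w, b, eps=0.1)
```

Because the perturbation is `eps * sign(gradient)`, the adversarial input stays within a small L-infinity ball around the original, so it looks nearly identical to a human while shifting the model's loss.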


The proposed method, dubbed RT-ER (Robust Training with Expert Routing), combines the benefits of both approaches. It starts by defining a loss function that encourages the MoE model to produce accurate predictions on both clean and adversarial examples. The key innovation is the use of an expert routing mechanism, which allows the model to dynamically adjust its reliance on each expert based on the input data.
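A loss that rewards accuracy on both clean and adversarial examples is typically a weighted sum of the two per-example losses. The article does not give the paper's exact formulation, so the following is a hedged sketch; the weighting coefficient `alpha` and the function names are illustrative assumptions.

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -np.log(probs[label])

def robust_loss(probs_clean, probs_adv, label, alpha=0.5):
    """Weighted sum of the clean-input loss and the adversarial-input loss."""
    return (alpha * cross_entropy(probs_clean, label)
            + (1 - alpha) * cross_entropy(probs_adv, label))

# Toy predicted class probabilities for one 3-class example.
clean = np.array([0.7, 0.2, 0.1])
adv   = np.array([0.4, 0.4, 0.2])
loss = robust_loss(clean, adv, label=0, alpha=0.5)
```

Setting `alpha` closer to 1 prioritizes clean accuracy; closer to 0 prioritizes robustness, which is the trade-off the researchers explore later in the article.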


In other words, RT-ER enables the MoE model to adaptively select the most relevant experts for a given input, rather than relying solely on a single expert or fixed combination of experts. This adaptability is crucial in the face of adversarial attacks, as it allows the model to recognize and reject distorted inputs more effectively.
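The routing mechanism described above is typically implemented as a small gating network that scores every expert for a given input, producing per-expert weights that sum to one; the model's output is the gate-weighted mix of the expert outputs. A minimal sketch follows, with all shapes, names, and random parameters being illustrative rather than taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
n_experts, d_in, d_out = 4, 8, 3

W_gate = rng.normal(size=(d_in, n_experts))          # gating network
W_experts = rng.normal(size=(n_experts, d_in, d_out))  # one linear expert each

x = rng.normal(size=d_in)
gates = softmax(x @ W_gate)       # one weight per expert, summing to 1
expert_outs = np.einsum('eio,i->eo', W_experts, x)   # every expert's output
y = gates @ expert_outs           # gate-weighted combination
```

Because the gate weights depend on `x`, a different input can lean on a different subset of experts, which is the adaptivity the article credits with rejecting distorted inputs more effectively.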


The researchers evaluated RT-ER on standard image-classification benchmarks, including CIFAR-10 and TinyImageNet. The results showed that RT-ER significantly outperformed traditional MoE models on both datasets, achieving higher accuracy and robustness under adversarial attacks.


One of the most impressive aspects of RT-ER is that its robustness does not come at the expense of the model's original capabilities: it remains accurate on clean data while becoming more resistant to attacks.


The team also explored the impact of varying the number of experts and the trade-off between standard accuracy and robustness. The results demonstrated that increasing the number of experts can improve the model’s overall performance, but at the cost of increased computational complexity.


Moreover, the researchers introduced a novel joint training strategy for dual-models, which combines a standard MoE with a robust MoE trained using RT-ER.
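The article gives no details of how the two models are combined at inference time, so the following is a speculative sketch of one simple dual-model scheme: mixing the class probabilities of a standard model and a robust model through a hypothetical weight `beta`. The paper's actual strategy may differ.

```python
import numpy as np

def dual_predict(probs_standard, probs_robust, beta=0.5):
    """Blend a standard model's and a robust model's class probabilities."""
    return beta * probs_standard + (1 - beta) * probs_robust

# Toy 3-class probabilities from the two models for one input.
p_std = np.array([0.8, 0.15, 0.05])   # accurate on clean data
p_rob = np.array([0.6, 0.3, 0.1])     # robust under attack
p = dual_predict(p_std, p_rob, beta=0.5)
```

Under this reading, `beta` exposes the same standard-accuracy-versus-robustness trade-off the researchers study when varying the number of experts.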


Cite this article: “Robust Training with Expert Routing: A Novel Approach to Adversarial Attacks on Deep Learning Models”, The Science Archive, 2025.


Adversarial Attacks, Mixture Of Experts, Deep Learning, Robustness, Adversarial Training, Expert Routing, Neural Networks, Image Classification, Natural Language Processing, Machine Learning


Reference: Xu Zhang, Kaidi Xu, Ziqing Hu, Ren Wang, “Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach” (2025).

