Friday 31 January 2025
A novel approach has been developed for enhancing the robustness of prompt-based learning in vision-language models. The technique, dubbed NLPrompt, leverages optimal transport theory to categorize data into clean and noisy subsets, allowing for tailored loss functions to be applied to each group.
In traditional prompt-based learning, a single loss function is used across all data points, regardless of their quality. This can lead to suboptimal performance when the dataset contains noisy labels or outliers. NLPrompt addresses this issue by partitioning the data into two groups: clean and noisy. The clean subset includes high-quality labeled examples, while the noisy subset comprises low-quality or mislabeled data.
To categorize the data, NLPrompt employs optimal transport theory to compute a similarity matrix between text and image features. This matrix is then used to assign each sample to either the clean or noisy subset, depending on whether its optimal-transport assignment agrees with its given label. The prompts are then fine-tuned on the partitioned data with different loss functions: cross-entropy (CE) on the clean subset and mean absolute error (MAE) on the noisy subset.
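The partitioning step can be illustrated with a minimal sketch. The toy example below is an assumption about the general recipe, not the paper's exact algorithm: entropy-regularized optimal transport (Sinkhorn iterations) is run on a cosine-distance cost between image features and class-prompt features, and a sample is marked clean when the transport plan's preferred class agrees with its given label.

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iter=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations."""
    K = np.exp(-cost / reg)                      # Gibbs kernel
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m        # uniform marginals
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]           # transport plan (n x m)

# Toy features (hypothetical): 6 image embeddings, 3 class-prompt embeddings.
rng = np.random.default_rng(0)
img = rng.normal(size=(6, 8)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(3, 8)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
cost = 1.0 - img @ txt.T                         # low cost = high similarity

plan = sinkhorn(cost)
ot_pred = plan.argmax(axis=1)                    # class each sample is transported to
labels = np.array([0, 1, 2, 0, 1, 2])            # given (possibly noisy) labels
clean_mask = ot_pred == labels                   # agreement -> treat as clean
```

Samples where `clean_mask` is True would then be trained with CE, the rest with MAE.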
The use of MAE for the noisy subset is particularly noteworthy. Unlike CE, which grows without bound as the predicted probability of the (possibly wrong) label approaches zero, MAE is bounded, so no single mislabeled example can dominate the training signal. This makes the model far less prone to overfitting incorrect labels, yielding more robust learning in the presence of noise.
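A small numerical check makes the contrast concrete. For a prediction that is confident in class 0 but labeled (perhaps wrongly) as class 1, CE assigns a large penalty while MAE stays within its fixed bound of 2:

```python
import numpy as np

def ce(p, y):
    """Cross-entropy of predicted probabilities p against integer label y."""
    return -np.log(p[y])

def mae(p, y):
    """Mean absolute error between p and the one-hot encoding of y (bounded by 2)."""
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return np.abs(p - onehot).sum()

p = np.array([0.98, 0.01, 0.01])   # model is confident in class 0
loss_wrong_ce = ce(p, 1)           # large: -log(0.01), about 4.6
loss_wrong_mae = mae(p, 1)         # bounded: 0.98 + 0.99 + 0.01 = 1.98
```

However extreme the mismatch, the MAE penalty never exceeds 2, whereas CE can be made arbitrarily large, which is why a few badly mislabeled samples can derail CE training but not MAE training.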
Experimental results demonstrate that NLPrompt outperforms traditional prompt-based learning methods across a range of datasets and noise levels. The gains are largest on heavily corrupted data, where matching the loss function to each subset pays off most.
Theoretical analysis provides further insight into the benefits of NLPrompt. By examining the dynamics of feature learning during training, researchers found that the MAE loss function promotes a stronger separation between task-relevant and task-irrelevant features compared to CE. This increased separation can lead to better generalization performance in the presence of noise.
The development of NLPrompt has significant implications for the field of vision-language research, where robustness is increasingly important due to the prevalence of noisy labels or outliers in real-world datasets. By leveraging optimal transport theory and adapting loss functions to each data subset, NLPrompt offers a powerful new approach for enhancing the performance and reliability of prompt-based learning models.
Cite this article: “Enhancing Robustness in Prompt-Based Learning with NLPrompt”, The Science Archive, 2025.
Prompt-Based Learning, Vision-Language Models, Optimal Transport Theory, Noisy Labels, Outliers, Loss Functions, Cross-Entropy, Mean Absolute Error, Robustness, Feature Learning







