Tuesday 29 July 2025
Researchers have long known that large language models (LLMs) can generate impressive responses, but a new study suggests that their ability to do so may be limited by the quality of the external information they are given to work with, not just by what they learned during training.
The team behind the research used a framework called CRUX to evaluate how well LLMs can generate text grounded in a set of retrieved passages, a setup commonly known as retrieval-augmented generation (RAG). In other words, they wanted to see how well these models could use external knowledge sources to inform their responses.
To test this, the researchers created a series of prompts and evaluated how well LLMs could answer them using different sets of retrieved passages. They found that when the passages were high-quality and relevant, the LLMs were able to generate accurate and informative responses. However, when the passages were low-quality or irrelevant, the LLMs struggled to produce anything meaningful.
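To make that setup concrete, here is a minimal sketch of such a controlled comparison in Python. Everything in it is illustrative rather than taken from the paper: generate_answer is a hypothetical stand-in for a real model call, and token-level F1 is a generic scorer, not CRUX's actual metric.

```python
# Illustrative sketch of a controlled RAG evaluation (not the paper's code).
# generate_answer() is a hypothetical stand-in for a real LLM call, and
# token-level F1 is a generic scorer, not CRUX's actual metric.

def generate_answer(question: str, passages: list[str]) -> str:
    """Stand-in for an LLM call. A real implementation would build a prompt
    from the question and passages and query a model; echoing the first
    passage is enough to show how answer quality tracks context quality."""
    return passages[0]

def token_f1(prediction: str, reference: str) -> float:
    """Crude token-overlap F1 between a generated answer and a reference."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

question = "In what year did the probe launch?"
reference = "The probe launched in 1997."

# The same question is answered under two context conditions.
conditions = {
    "relevant": ["The probe launched in 1997 after a decade of planning."],
    "distractor": ["The agency's budget grew steadily through the 1990s."],
}

for name, passages in conditions.items():
    answer = generate_answer(question, passages)
    print(f"{name:10s} F1 = {token_f1(answer, reference):.2f}")
```

Running this toy example shows the relevant condition scoring well above the distractor condition, which is the same pattern the study reports at scale.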
This isn’t particularly surprising: a model’s answer can only be as good as the context it is given. But what’s interesting is that the researchers also found that even when the passages were high-quality, there was still significant variation in how well different LLMs performed.
Some models produced accurate responses despite being given low-quality passages, while others struggled even with high-quality input. This suggests there is more to grounded performance than the quality of the retrieved context alone: factors like model architecture and training methods may also play a significant role.
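One way to see that pattern in data (with made-up numbers, purely for illustration) is to aggregate scores per model under the same context condition and look at the spread:

```python
# Illustrative aggregation (made-up scores, not the study's data): the spread
# of scores across models under the *same* context condition is what points
# to factors beyond passage quality, such as architecture and training.
from statistics import mean, stdev

# scores[model][condition] -> list of per-question scores
scores = {
    "model_a": {"relevant": [0.82, 0.79, 0.88], "distractor": [0.41, 0.35, 0.30]},
    "model_b": {"relevant": [0.64, 0.55, 0.60], "distractor": [0.52, 0.49, 0.47]},
}

for condition in ("relevant", "distractor"):
    per_model = [mean(s[condition]) for s in scores.values()]
    print(f"{condition}: mean={mean(per_model):.2f}, spread={stdev(per_model):.2f}")
```

A large spread within the "relevant" condition is exactly the signal that passage quality alone cannot explain the differences between models.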
The researchers also experimented with different techniques for retrieving and filtering passages, and found that these choices can have a significant impact on LLM performance. For example, a more targeted approach to retrieving relevant passages can improve accuracy, while relying too heavily on general-purpose search results may surface lower-quality context.
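A toy version of such a filtering step might look like the following. The Jaccard overlap here is a deliberately simple stand-in for whatever relevance model a production system would use, such as embedding similarity or a trained reranker, and every name in the snippet is illustrative:

```python
# Minimal sketch of passage filtering before generation. Jaccard token overlap
# is a deliberately simple stand-in for a real relevance model; the
# top-k-plus-threshold logic is the point of the example.

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between a query and a passage."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def filter_passages(query: str, passages: list[str],
                    top_k: int = 3, min_score: float = 0.1) -> list[str]:
    """Keep only the top_k passages whose relevance clears min_score."""
    ranked = sorted(passages, key=lambda p: jaccard(query, p), reverse=True)
    return [p for p in ranked[:top_k] if jaccard(query, p) >= min_score]

query = "effects of sleep deprivation on memory"
passages = [
    "Sleep deprivation impairs memory consolidation in several studies.",
    "The stock market closed higher on Tuesday.",
    "Memory formation during sleep involves the hippocampus.",
]
print(filter_passages(query, passages, top_k=2))
```

The off-topic passage falls below the threshold and is dropped, leaving the model with a smaller but cleaner context, which is the kind of targeted retrieval the study found beneficial.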
Overall, the study highlights the importance of carefully evaluating the quality and relevance of external knowledge sources before feeding them to a language model. By doing so, researchers and developers may be able to elicit more accurate and informative responses, and ultimately build more useful AI systems.
The team’s findings also have implications for the wider field of natural language processing. As LLMs take on a growing role in areas like customer service, content generation, and language translation, it’s crucial to understand their strengths and limitations. With that understanding, we can develop more effective strategies for training and deploying these models, and build AI systems that better serve their users.
Cite this article: “Limitations of Large Language Models Revealed in Study on External Knowledge Sources”, The Science Archive, 2025.
Large Language Models, Retrieval-Augmented Generation, CRUX Framework, External Knowledge Sources, Passage Retrieval, Filtering Techniques, Model Architecture, Training Methods, Natural Language Processing, AI Systems.