Thursday 23 January 2025
The hype surrounding large language models (LLMs) has led many to believe that they can solve complex problems with ease, including optimizing heuristics for combinatorial optimization. However, a recent study suggests that this may not be entirely accurate.
Researchers from various institutions have been experimenting with LLMs to automatically design and evolve new heuristics for bin packing, a classic problem in computer science. Bin packing asks how to pack items of varying sizes into as few fixed-capacity bins as possible; using fewer bins means less wasted space. The study aimed to test whether these LLM-generated heuristics could outperform traditional hand-designed algorithms.
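To make the problem concrete, here is a minimal sketch of one classic hand-designed heuristic of the kind the LLM-generated rules compete against: first-fit, which places each item into the first open bin with enough room and opens a new bin only when none fits. The item sizes and capacity below are illustrative, not taken from the study.

```python
def first_fit(items, capacity=10):
    """Pack items into fixed-capacity bins with the classic first-fit
    heuristic: each item goes into the first bin that can still hold
    it; a new bin is opened only when no existing bin has room."""
    bins = []  # remaining free space in each open bin
    for size in items:
        for i, free in enumerate(bins):
            if size <= free:
                bins[i] = free - size
                break
        else:
            bins.append(capacity - size)  # open a new bin
    return len(bins)

print(first_fit([5, 7, 5, 2, 4, 2, 5]))  # → 4
```

LLM-driven approaches typically evolve a scoring function that replaces the fixed "first bin that fits" rule, which is where the specialization the study observed can creep in.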
The researchers compiled a comprehensive benchmark suite consisting of 12 datasets with varying characteristics. They then trained five LLM-based heuristics on each dataset and compared their performance against six traditional hand-designed heuristics.
The results were surprising. While the LLM-generated heuristics showed impressive performance on specific datasets, they failed to generalize well across different distributions of item sizes. In fact, two of the LLM heuristics performed significantly worse than the best traditional heuristic in terms of average excess bins (AEB), a metric that measures the average gap between the number of bins a heuristic actually uses and the optimal number.
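As a rough sketch of how an AEB-style metric can be computed (the study's exact procedure for obtaining the optimum is not given here, so the lower bound ceil(total item size / capacity) stands in for it, and the simple next-fit heuristic is used only as a stand-in):

```python
import math

def next_fit(items, capacity=10):
    """Stand-in heuristic (next-fit): keep one bin open; if an item
    does not fit, close that bin and open a new one."""
    bins, free = 1, capacity
    for size in items:
        if size <= free:
            free -= size
        else:
            bins += 1
            free = capacity - size
    return bins

def average_excess_bins(instances, heuristic, capacity=10):
    """AEB: mean gap between the bins a heuristic uses and a lower
    bound on the optimum, here ceil(total item size / capacity)."""
    gaps = [heuristic(items, capacity) - math.ceil(sum(items) / capacity)
            for items in instances]
    return sum(gaps) / len(gaps)

instances = [[6, 5, 4, 5], [3, 8, 3, 6]]
print(average_excess_bins(instances, next_fit))  # → 1.0
```

An AEB of zero means the heuristic matched the (lower bound on the) optimum on every instance; larger values mean more wasted bins on average.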
The study also found that the LLM-generated heuristics were highly specialized, meaning they performed well on specific datasets but poorly on others. This is in contrast to traditional heuristics, which tend to be more generalizable across different problem instances.
Furthermore, the researchers discovered that some of the LLM-generated heuristics had exceptional performance on their training data but failed to adapt when presented with new, unseen instances. This suggests that the models may have overfitted to the specific dataset used for training.
The findings imply that while LLMs can be useful tools for automating heuristic design, they are not a silver bullet for complex optimization problems. Instead, researchers should consider combining traditional heuristics with machine learning techniques, or exploring alternative approaches that generalize better across different problem instances.
Ultimately, the study highlights the importance of rigorous benchmarking and evaluation when using LLMs to solve real-world problems. By understanding the strengths and limitations of these models, researchers can develop more effective strategies for solving complex optimization challenges.
Cite this article: “LLMs Fall Short in Solving Complex Optimization Problems”, The Science Archive, 2025.
Large Language Models, Combinatorial Optimization, Bin Packing, Heuristic Design, Machine Learning, Traditional Heuristics, Generalizability, Overfitting, Benchmarking, Optimization Problems