Saturday 29 March 2025
Researchers at Hubei Key Laboratory of Transportation Internet of Things have designed a novel approach to evaluating the numerical reasoning capabilities of large language models (LLMs). In an effort to bridge the gap between LLMs’ impressive natural language processing abilities and their struggles with mathematical problems, the team created the Agent Trading Arena, a virtual environment where LLMs can engage in simulated stock trading.
The arena is designed to mimic real-world economic systems, with agents making decisions based on visual and textual inputs. The system tracks each agent’s performance, updating its strategy based on feedback from the environment. This feedback loop enables the agent to refine its approach over time, adapting to changing market conditions.
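The paper does not publish its agent internals, but the feedback loop described above can be sketched in miniature: an agent acts on the market, observes the change in its portfolio value, and nudges its strategy accordingly. The `Agent` class, its `bias` parameter, and the reward rule below are all illustrative assumptions, not the researchers' actual design.

```python
# Minimal sketch of an agent feedback loop: decide -> act -> observe
# reward -> update strategy. All names and the update rule are
# hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Agent:
    cash: float = 10_000.0
    shares: int = 0
    bias: float = 0.0  # learned tilt toward buying (+) or selling (-)

    def decide(self, price: float, prev_price: float) -> str:
        momentum = price - prev_price + self.bias
        if momentum > 0 and self.cash >= price:
            return "buy"
        if momentum < 0 and self.shares > 0:
            return "sell"
        return "hold"

    def update(self, reward: float) -> None:
        # Feedback from the environment: reinforce what paid off.
        self.bias += 0.1 * reward

def step(agent: Agent, price: float, prev_price: float) -> float:
    """Run one market step and return the reward the agent observed."""
    value_before = agent.cash + agent.shares * prev_price
    action = agent.decide(price, prev_price)
    if action == "buy":
        agent.cash -= price
        agent.shares += 1
    elif action == "sell":
        agent.cash += price
        agent.shares -= 1
    value_after = agent.cash + agent.shares * price
    reward = value_after - value_before  # change in portfolio value
    agent.update(reward)
    return reward
```

Iterating `step` over a price series gives the agent repeated feedback, so its behavior drifts toward whatever has recently been profitable, a toy version of the adaptation the arena is built to study.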
To assess the LLMs’ abilities, the researchers selected seven stocks from the NASDAQ exchange and created a dataset spanning July 3, 2023, to October 29, 2024. The baseline methods, MACD (Moving Average Convergence Divergence), StockFormer, and TimesNet, were trained over broader time ranges, as these models require more historical data.
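Of the baselines, MACD is a classical technical indicator rather than a learned model: it is the difference between a fast and a slow exponential moving average of the closing price, paired with a signal line. A minimal implementation with the standard 12/26/9 parameters, using pandas (the function name is our own):

```python
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """Return the MACD line, signal line, and histogram for a price series."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()   # fast EMA
    ema_slow = close.ewm(span=slow, adjust=False).mean()   # slow EMA
    macd_line = ema_fast - ema_slow                        # MACD line
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    histogram = macd_line - signal_line                    # divergence
    return macd_line, signal_line, histogram
```

A common trading rule derived from it is to buy when the MACD line crosses above the signal line and sell on the opposite crossing, which is presumably how it serves as a baseline strategy here.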
The testing period for all models ran from September 3, 2024, to October 29, 2024. Holding out this common window let the researchers compare the models on equal footing, without data leakage from the training periods influencing the results.
One of the key findings was that LLMs, including GPT-4o, struggled with algebraic reasoning when presented with plain-text stock data. However, when provided with visual representations such as scatter plots or K-line charts, the models demonstrated stronger numerical reasoning abilities.
The researchers attribute this improvement to visual representations making numerical patterns easier for the models to grasp. Incorporating a reflection module, which helps the agents analyze and interpret past decisions, improved performance further.
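The K-line (candlestick) charts mentioned above are drawn from OHLC data: each candle records the open, high, low, and close price over an interval. A minimal sketch of aggregating a timestamped price series into such candles, assuming pandas (the function name is illustrative):

```python
import pandas as pd

def to_candles(prices: pd.Series, freq: str = "1D") -> pd.DataFrame:
    """Aggregate a timestamped price series into OHLC candles,
    the data underlying a K-line (candlestick) chart."""
    # resample().ohlc() computes open/high/low/close per interval
    return prices.resample(freq).ohlc().dropna()
```

Each row of the resulting DataFrame corresponds to one candle on the chart, which a plotting library can then render as the familiar body-and-wick shapes the models were shown.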
The study’s results suggest that LLMs are capable of learning from experience and adapting their strategies over time. While they may not excel at mathematical problems without additional training, they can still demonstrate impressive numerical reasoning capabilities when presented with visual inputs.
The Agent Trading Arena offers a unique platform for evaluating the abilities of LLMs in complex, dynamic environments. As researchers continue to refine this system, it will provide valuable insights into the strengths and limitations of these powerful language models.
In the future, the arena could be used to explore other applications, such as training agents for more complex tasks or simulating real-world scenarios like financial trading.
Cite this article: “Evaluating Large Language Models Numerical Reasoning Abilities in Simulated Stock Trading Environments”, The Science Archive, 2025.
Large Language Models, Numerical Reasoning, Stock Trading, Agent Trading Arena, Virtual Environment, Natural Language Processing, Mathematical Problems, Visual Representations, Scatter Plots, K-Line Charts