NS-Gym: A Toolkit for Evaluating Decision-Making Algorithms in Dynamic Environments

Sunday 09 March 2025


Researchers have developed a toolkit for simulating decision-making in environments that change over time. The toolkit, called NS-Gym, is designed to help scientists evaluate how well different algorithms perform in non-stationary Markov decision processes (MDPs), which can be used to model real-world problems such as autonomous driving, medical diagnosis, and financial portfolio optimization.


Non-stationary MDPs pose a significant challenge because they involve environments that change over time due to external factors. For example, in an autonomous driving scenario, the road conditions, weather, and traffic patterns can all change suddenly, requiring the vehicle’s decision-making system to adapt quickly. Similarly, in medical diagnosis, patient symptoms can evolve over time, making it necessary for doctors to adjust their treatment strategies accordingly.
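In standard textbook notation, the difference from an ordinary MDP can be written compactly. The symbols below are conventional and are not taken from the article: a stationary MDP has one fixed transition function P and reward function R, while a non-stationary MDP allows both to depend on the time step t.

```latex
\begin{align*}
\text{Stationary MDP:} \quad & s_{t+1} \sim P(\cdot \mid s_t, a_t), & r_t &= R(s_t, a_t) \\
\text{Non-stationary MDP:} \quad & s_{t+1} \sim P_t(\cdot \mid s_t, a_t), & r_t &= R_t(s_t, a_t)
\end{align*}
```

A policy tuned for P_0 may therefore be a poor choice once the environment has drifted to some later P_t.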


To address this challenge, NS-Gym provides a standardized framework for simulating non-stationary MDPs and evaluating the performance of different algorithms in these environments. The toolkit includes six benchmark problems, each designed to test the ability of an algorithm to adapt to changing conditions. These problems range from navigating a cliff-walking maze to managing a portfolio of stocks and bonds.


The researchers used NS-Gym to evaluate the performance of six different decision-making algorithms in non-stationary MDPs. The algorithms included deep Q-networks, policy gradient methods, and model-based reinforcement learning techniques. The results showed that each algorithm performed well in certain environments but struggled in others, highlighting the importance of selecting the right algorithm for a particular problem.


One notable finding was that some algorithms adapted better than others to changing environmental conditions. For example, PA-MCTS (Policy-Augmented Monte Carlo Tree Search) performed well in environments where the probability of moving in the intended direction changed over time. In contrast, DDQN (Double Deep Q-Network) struggled in these environments.
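To make "the probability of moving in the intended direction changed over time" concrete, here is a minimal sketch of a grid move with a time-varying success probability. It does not use the NS-Gym API: the function names, the linear decay schedule, and the 4 x 12 grid size are all assumptions chosen for illustration.

```python
import random

# Hypothetical illustration only; none of these names come from NS-Gym.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def intended_prob(t, start=1.0, decay=0.05, floor=0.4):
    """Probability that the chosen action is actually executed at step t.

    Assumed schedule: linear decay from `start`, clipped at `floor`.
    """
    return max(floor, start - decay * t)


def step(pos, action, t, rng, rows=4, cols=12):
    """Apply one move on a rows x cols grid.

    With probability intended_prob(t) the chosen action is executed;
    otherwise one of the other three actions is picked uniformly.
    """
    if rng.random() >= intended_prob(t):
        action = rng.choice([a for a in ACTIONS if a != action])
    dr, dc = ACTIONS[action]
    # Clamp to the grid so the agent cannot leave the board.
    r = min(max(pos[0] + dr, 0), rows - 1)
    c = min(max(pos[1] + dc, 0), cols - 1)
    return (r, c)


if __name__ == "__main__":
    rng = random.Random(0)
    pos = (3, 0)
    for t in range(10):
        pos = step(pos, "right", t, rng)
    print(pos, intended_prob(10))
```

At early steps the agent moves almost deterministically; as t grows, the same action becomes less reliable, which is the kind of drift a fixed policy is not trained for.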


The researchers also found that providing agents with notifications about changes in the environment can significantly improve their performance. For example, in the CliffWalking problem, agents that received notifications about changes in the probability of moving in the intended direction were able to find the goal state more quickly than those that did not receive notifications.
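The value of a notification can be seen in a toy sketch. It is not the NS-Gym interface: the function name, the single change point, and the assumption that the notification carries the new probability value are all hypothetical. The agent here only tracks its belief about the probability of moving in the intended direction.

```python
def run(notify, horizon=10, change_at=5, p_before=0.9, p_after=0.5):
    """Return the agent's believed success probability at each step.

    Hypothetical setup: the true probability switches from p_before to
    p_after at step `change_at`. A notified agent is told the new value
    at that step; an un-notified agent keeps its stale belief.
    """
    true_p = p_before
    belief = p_before
    beliefs = []
    for t in range(horizon):
        changed = (t == change_at)
        if changed:
            true_p = p_after
        if notify and changed:
            # Assumption: the notification includes the new parameter value.
            belief = true_p
        beliefs.append(belief)
    return beliefs


if __name__ == "__main__":
    print("notified:    ", run(notify=True))
    print("not notified:", run(notify=False))
```

An agent that plans with the stale belief will keep choosing routes that were safe under the old dynamics, whereas the notified agent can replan from the moment the change occurs.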


Overall, NS-Gym provides a valuable tool for researchers and developers who need to evaluate the performance of decision-making algorithms in complex and dynamic environments.


Cite this article: “NS-Gym: A Toolkit for Evaluating Decision-Making Algorithms in Dynamic Environments”, The Science Archive, 2025.


Markov Decision Processes, Non-Stationary MDPs, Autonomous Driving, Medical Diagnosis, Financial Portfolio Optimization, Reinforcement Learning, Deep Q-Networks, Policy Gradient Methods, Model-Based Reinforcement Learning, Dynamic Environments


Reference: Nathaniel S. Keplinger, Baiting Luo, Iliyas Bektas, Yunuo Zhang, Kyle Hollins Wray, Aron Laszka, Abhishek Dubey, Ayan Mukhopadhyay, “NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes” (2025).

