Sunday 23 February 2025
As data-intensive machine learning workloads continue to grow, researchers are facing a significant challenge: how to optimize performance on GPU-SSD systems. Traditional approaches have reached their limits, and new strategies are needed to overcome bottlenecks caused by CPU-mediated data access patterns.
One promising solution is the MQMS simulator, which maintains awareness of internal SSD states and operations. By introducing dynamic address allocation and fine-grained address mapping, MQMS can schedule requests and place data so as to maximize the SSD's internal parallelism and handle small I/O requests efficiently.
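To make the idea of dynamic, fine-grained mapping concrete, here is a minimal Python sketch (not MQMS's actual code; the class and method names are invented for illustration). The point is that a logical page is bound to a physical page only at write time, so the allocator is free to steer each write to any plane:

```python
# Hypothetical sketch of dynamic, fine-grained address mapping:
# logical pages are bound to physical pages at write time, so the
# allocator can steer each write to any plane it chooses.

class DynamicMapper:
    def __init__(self, channels, dies, planes, pages_per_plane):
        self.map = {}            # logical page -> (channel, die, plane, page)
        self.next_free = {}      # per-plane write pointer
        self.pages_per_plane = pages_per_plane
        for c in range(channels):
            for d in range(dies):
                for p in range(planes):
                    self.next_free[(c, d, p)] = 0

    def write(self, lpn, target_plane):
        """Allocate the next free page on target_plane for logical page lpn."""
        page = self.next_free[target_plane]
        assert page < self.pages_per_plane, "plane full (GC not modelled)"
        self.next_free[target_plane] = page + 1
        self.map[lpn] = (*target_plane, page)
        return self.map[lpn]

    def read(self, lpn):
        """Look up the physical location of a previously written logical page."""
        return self.map[lpn]
```

A static mapping would instead derive the physical location from the logical address alone, which is exactly what prevents a scheduler from rebalancing writes across planes.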
The simulator’s performance characteristics were evaluated using representative traces from large-scale machine learning workloads, including BERT, GPT-2, and ResNet-50. The results showed that MQMS achieved orders-of-magnitude improvements in I/O request throughput, device response time, and end-to-end simulation time compared to existing simulators.
The key to MQMS’s success lies in its ability to exploit plane-level parallelism, which is particularly effective for workloads with frequent small requests. By distributing these requests across multiple planes without the overhead of read-modify-write operations, MQMS reduces latency and improves overall system performance.
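As a rough illustration of how small requests can be spread across planes, the sketch below assigns each request to a plane in round-robin order (a simplification invented here, not MQMS's actual scheduling policy), so that a burst of small writes can proceed on many planes in parallel instead of queuing on one:

```python
from itertools import cycle

# Simplified sketch: spread a burst of small writes round-robin over
# all planes so they can be serviced in parallel, instead of packing
# them into a single plane's queue.

def distribute(requests, planes):
    """Assign each small request to a plane in round-robin order."""
    assignment = {}
    plane_iter = cycle(planes)
    for req in requests:
        assignment[req] = next(plane_iter)
    return assignment

planes = [(c, p) for c in range(4) for p in range(2)]  # 4 channels x 2 planes
reqs = [f"w{i}" for i in range(16)]
sched = distribute(reqs, planes)
# with 8 planes and 16 requests, each plane receives exactly 2 requests
```

Because each write lands on a fresh page of its assigned plane, no plane has to perform a read-modify-write to merge a small request into an existing page.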
Analysing combinations of scheduling and allocation policies also revealed optimal configurations for particular workloads. For example, large-chunk scheduling paired with WCDP page allocation proved a winning combination for backpropagation workloads, while round-robin scheduling with CDWP allocation performed better for hotspot workloads.
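Scheme names like WCDP and CDWP describe an ordering over the SSD's Channel, Way, Die, and Plane dimensions. The toy decoder below illustrates how such an ordering can determine where consecutive pages land; it assumes, purely for illustration, that the first letter names the fastest-varying dimension, and the geometry sizes are made up:

```python
# Hypothetical decoder for four-letter page-allocation schemes such
# as WCDP or CDWP. Assumption (not stated in the article): the first
# letter is the dimension that varies fastest as consecutive pages
# are allocated. Geometry below is an arbitrary example.

SIZES = {"C": 4, "W": 2, "D": 2, "P": 2}  # channels, ways, dies, planes

def allocate(page_index, scheme):
    """Map a sequential page index to {C, W, D, P} coordinates."""
    coords = {}
    for dim in scheme:                    # fastest-varying dimension first
        coords[dim] = page_index % SIZES[dim]
        page_index //= SIZES[dim]
    return coords

# Under CDWP, pages 0..3 land on channels 0..3 of the same die/way/plane;
# under WCDP, pages 0 and 1 land on ways 0 and 1 of channel 0 first.
```

Different orderings favour different access patterns, which is why no single scheduling-plus-allocation pairing wins for every workload.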
The implications of MQMS are significant, as it has the potential to shorten system-level execution time by up to 21% in certain scenarios. This could have far-reaching consequences for industries that rely heavily on machine learning, such as healthcare, finance, and robotics.
While there is still much work to be done to fully realize the benefits of MQMS, this simulator represents a significant step forward in the quest to optimize performance on GPU-SSD systems. By leveraging the potential of plane-level parallelism and intelligent scheduling, researchers can unlock new levels of efficiency and productivity in data-intensive machine learning applications.
Cite this article: “Optimizing Performance on GPU-SSD Systems with MQMS Simulator”, The Science Archive, 2025.
GPU-SSD, Machine Learning, Performance Optimization, Simulation, SSD States, Address Allocation, Parallelism, I/O Requests, Policy Maximization, Data-Intensive Workloads