Microsoft's TAPAS System Aims to Make Large Language Models More Efficient and Sustainable

Sunday 02 March 2025


Researchers at Microsoft have developed a new system that aims to make large language models (LLMs) more efficient and sustainable for cloud-based data centers. The system, called TAPAS, uses thermal and power-aware scheduling to optimize LLM inference workloads in these centers.


One of the major challenges facing LLMs is their massive energy consumption. These models require powerful GPUs to process and generate text, which can result in significant heat generation and electricity usage. In fact, a single NVIDIA DGX A100 server, which houses eight A100 GPUs, can draw up to 6.5 kilowatts of power, making it difficult for data centers to maintain optimal operating temperatures.


To address this issue, TAPAS uses a novel approach that considers both thermal and power constraints when scheduling LLM inference workloads. The system takes into account the unique characteristics of each workload, including its performance requirements, temperature sensitivity, and power consumption. By doing so, TAPAS can dynamically adjust the allocation of resources to ensure efficient and sustainable operation.


The researchers behind TAPAS have developed a sophisticated algorithm that continuously monitors the data center’s thermal and power conditions, as well as the performance of each LLM inference workload. This information is then used to make informed decisions about resource allocation, ensuring that workloads are distributed across available GPUs in a way that minimizes energy consumption and heat generation.
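The idea of placing work where both thermal and power headroom exist can be illustrated with a minimal sketch. This is not the TAPAS algorithm itself: the `Server` fields, the `pick_server` function, and the scoring rule (choose the server whose tighter constraint has the most room left) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    """Hypothetical snapshot of one GPU server's monitored state."""
    name: str
    inlet_temp_c: float   # current temperature reading
    temp_limit_c: float   # assumed thermal limit
    power_w: float        # current power draw
    power_cap_w: float    # assumed power budget

    def thermal_headroom(self) -> float:
        return self.temp_limit_c - self.inlet_temp_c

    def power_headroom(self) -> float:
        return self.power_cap_w - self.power_w


def pick_server(servers: List[Server],
                extra_power_w: float,
                extra_temp_c: float) -> Optional[Server]:
    """Return the feasible server with the most remaining headroom, or None.

    A server is feasible only if the new workload fits under BOTH the
    power cap and the thermal limit. Among feasible servers, the score is
    the smaller of the two normalised margins, so the choice is driven by
    whichever constraint is tighter.
    """
    best, best_score = None, None
    for s in servers:
        power_left = s.power_headroom() - extra_power_w
        temp_left = s.thermal_headroom() - extra_temp_c
        if power_left < 0 or temp_left < 0:
            continue  # would violate a constraint
        score = min(power_left / s.power_cap_w, temp_left / s.temp_limit_c)
        if best_score is None or score > best_score:
            best, best_score = s, score
    return best


servers = [
    Server("a", 30.0, 35.0, 6000.0, 6500.0),  # too little power headroom
    Server("b", 25.0, 35.0, 4000.0, 6500.0),  # fits both constraints
    Server("c", 34.0, 35.0, 1000.0, 6500.0),  # too little thermal headroom
]
chosen = pick_server(servers, extra_power_w=1000.0, extra_temp_c=2.0)
print(chosen.name if chosen else "no feasible server")
```

A real scheduler would also weigh the performance requirements of each workload, which this sketch leaves out.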


One key innovation of TAPAS is its ability to handle emergencies, such as cooling system failures or power outages. In these situations, the system can quickly reconfigure the allocation of resources to maintain optimal operating conditions and minimize downtime.
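One way to picture such a reconfiguration is a priority-based rebalance after the available power budget drops. The sketch below is only an illustration under stated assumptions: the `rebalance_after_failure` function, the workload tuples, and the greedy keep-highest-priority-first rule are hypothetical and are not taken from the TAPAS paper.

```python
from typing import List, Tuple


def rebalance_after_failure(
    workloads: List[Tuple[str, int, float]],
    budget_w: float,
) -> Tuple[List[str], List[str]]:
    """Split workloads into those kept in place and those to move elsewhere.

    workloads: (name, priority, power_w) tuples; higher priority wins.
    budget_w:  reduced power budget after the (hypothetical) failure.
    """
    kept, moved = [], []
    used = 0.0
    # Visit workloads from highest to lowest priority.
    for name, _priority, power_w in sorted(workloads, key=lambda w: -w[1]):
        if used + power_w <= budget_w:
            kept.append(name)
            used += power_w
        else:
            moved.append(name)  # does not fit; candidate for migration
    return kept, moved


workloads = [("chat", 3, 3000.0), ("batch", 1, 2500.0), ("search", 2, 2000.0)]
kept, moved = rebalance_after_failure(workloads, budget_w=5000.0)
print("kept:", kept, "moved:", moved)
```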


The implications of TAPAS are significant for the development of cloud-based LLMs. By reducing energy consumption and heat generation, data centers can reduce their environmental impact and operate more sustainably. Additionally, TAPAS can help improve the overall performance and reliability of LLM inference workloads, making them more suitable for a wide range of applications.


The researchers behind TAPAS are now working to integrate the system into existing cloud infrastructure, with plans to deploy it in Microsoft’s own data centers. As the demand for large language models continues to grow, innovations like TAPAS will play a crucial role in enabling their widespread adoption while minimizing their environmental footprint.


Cite this article: “Microsoft's TAPAS System Aims to Make Large Language Models More Efficient and Sustainable”, The Science Archive, 2025.


Large Language Models, Thermal Awareness, Power-Aware Scheduling, Data Centers, Cloud Infrastructure, Energy Consumption, Heat Generation, NVIDIA A100 GPU, Microsoft, TAPAS.


Reference: Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, Ricardo Bianchini, “TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms” (2025).

