Automating Efficient GPU Kernels with Language Models

Tuesday 25 March 2025


For decades, computer scientists have been trying to automate the writing of efficient GPU kernels. These small but critical pieces of code determine how fast machine learning models run on graphics processing units (GPUs), yet writing them by hand is time-consuming and demands significant expertise.


Recently, researchers have turned to language models (LMs) to see whether they can automate this process. In a new paper, researchers at Stanford University and Princeton University take a significant step toward using LMs to write fast and correct GPU kernels.


The team developed a framework called KernelBench, which evaluates how well LMs can generate efficient kernels for a suite of machine learning workloads. They tested a range of state-of-the-art language models and test-time methods to see how each performed.
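To make this concrete, a KernelBench-style task is essentially a small PyTorch program: a reference module that the language model must re-implement with faster GPU code while producing the same outputs. The sketch below is only illustrative, assuming PyTorch on a CUDA-capable machine; the class and helper names are examples, not the benchmark's actual files.

import torch
import torch.nn as nn

# Illustrative KernelBench-style task: a reference PyTorch module that a
# language model is asked to re-implement with faster custom GPU kernels.
# The names below are examples, not the benchmark's real API.
class Model(nn.Module):
    """Reference implementation: the LM must produce a functionally
    equivalent module whose forward pass runs faster on the GPU."""
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # A matrix multiply followed by a bias-add and activation is a
        # typical fusion target for a hand-written kernel.
        return torch.relu(x @ y + 1.0)

def get_inputs():
    # Random inputs used to compare the generated kernel's output against
    # the reference module's output (requires a CUDA-capable GPU).
    return [torch.randn(4096, 4096, device="cuda"),
            torch.randn(4096, 4096, device="cuda")]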


The results were impressive. The best-performing model wrote kernels that matched the performance of hand-written code in over 70% of cases, and the generated kernels ran correctly across a range of hardware platforms, including NVIDIA GPUs with different micro-architectures and memory configurations.
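Under the hood, judging a generated kernel comes down to two questions: does it produce the same outputs as the PyTorch reference, and how much faster does it run? The snippet below sketches that kind of check using standard PyTorch timing utilities; it is a simplified stand-in for the benchmark's actual evaluation harness, not the authors' code.

import torch

def check_kernel(reference, candidate, inputs, atol=1e-3, rtol=1e-3, iters=100):
    # First verify numerical correctness against the reference module.
    with torch.no_grad():
        ref_out = reference(*inputs)
        cand_out = candidate(*inputs)
    if not torch.allclose(ref_out, cand_out, atol=atol, rtol=rtol):
        return {"correct": False, "speedup": 0.0}

    def time_fn(fn):
        # CUDA events give device-side timings; warm up first so the
        # measurement excludes one-time compilation and caching costs.
        for _ in range(10):
            fn(*inputs)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            fn(*inputs)
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters  # milliseconds per call

    # A speedup above 1.0 means the generated kernel beat the reference.
    return {"correct": True, "speedup": time_fn(reference) / time_fn(candidate)}

A result that is correct with a speedup at or above 1.0 is roughly what "matching the performance of hand-written code" means in this context.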


The researchers also explored whether giving LMs hardware-specific information, such as the type of GPU being targeted, could improve their output. They found that this approach led to significant gains in both the speed and correctness of the generated kernels.
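In practice, that hardware-specific information can be as simple as a short description of the target GPU prepended to the model's prompt. The sketch below gathers such a description with PyTorch's device-query API; the prompt wording is an illustrative assumption, not text from the paper.

import torch

def hardware_context() -> str:
    # Summarize the local GPU so the description can be prepended to the
    # code-generation prompt; the phrasing here is only an example.
    props = torch.cuda.get_device_properties(0)
    return (
        f"Target GPU: {props.name}\n"
        f"Compute capability: {props.major}.{props.minor}\n"
        f"Streaming multiprocessors: {props.multi_processor_count}\n"
        f"Global memory: {props.total_memory / 2**30:.0f} GiB\n"
        "Write a CUDA kernel tuned for this hardware."
    )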


One of the most interesting aspects of this research is its potential impact on the field of machine learning. By automating the writing of efficient GPU kernels, LMs could reduce the time and effort required to develop new machine learning models, shortening development cycles and lowering the cost of training and running them.


The KernelBench framework provides a powerful tool for evaluating the performance of LMs in generating GPU kernels. The researchers hope that their work will inspire further research into this area and help to accelerate progress in machine learning.


To visualize the generated kernels, the team developed an interface that allows users to easily examine kernel content and compare across various techniques and configurations. This provides a valuable tool for understanding how different LMs perform on different tasks.


Overall, this research represents a significant step forward in the development of automated GPU kernel generation. The potential benefits are substantial, and it will be exciting to see where this technology takes us in the future.


Cite this article: “Automating Efficient GPU Kernels with Language Models”, The Science Archive, 2025.


GPU Kernels, Machine Learning, Language Models, Automating Code Generation, Kernel Optimization, NVIDIA GPUs, Micro-Architectures, Memory Configurations, Hardware-Specific Information, Neural Networks


Reference: Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini, “KernelBench: Can LLMs Write Efficient GPU Kernels?” (2025).

