Correcting Speech Recognition Errors with Large Language Models

Sunday 02 February 2025


The quest for perfect speech recognition has been ongoing for decades, with researchers and developers working tirelessly to improve the accuracy of automated systems. One major hurdle in achieving this goal is the problem of errors in automatic speech recognition (ASR) output, which can be caused by a variety of factors such as background noise, accents, or misheard words.


To tackle this issue, a team of researchers has developed a novel approach that leverages large language models (LLMs) to correct ASR errors. The concept is simple: train an LLM on a dataset of erroneous transcripts paired with their manual corrections, then use it to clean up the output of a speech recognizer.
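One way to picture the inference side of this idea is prompt-based correction: wrap the raw recognizer output in an instruction and ask the LLM for a fixed version. The prompt wording and the example sentence below are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of prompt-based ASR error correction.
# The instruction text is a hypothetical example, not the paper's prompt.

def build_correction_prompt(asr_hypothesis: str) -> str:
    """Wrap an ASR hypothesis in an instruction asking an LLM to fix it."""
    return (
        "The following sentence is the output of a Chinese speech "
        "recognition system and may contain errors. Return only the "
        "corrected sentence.\n"
        f"ASR output: {asr_hypothesis}\n"
        "Corrected:"
    )

# '天汽' is a typical homophone confusion for '天气' (weather)
prompt = build_correction_prompt("今天天汽真好")
print(prompt)
```

The prompt would then be sent to whichever LLM is being evaluated; the model's reply is taken as the corrected transcript.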


The researchers started by creating a benchmark dataset specifically designed for Chinese ASR error correction, which they call ASR-EC. This dataset consists of thousands of audio clips, each with its corresponding transcribed text, as well as manually corrected versions of the transcripts. By training an LLM on this dataset, the team aimed to develop a model that could learn to identify and correct errors in ASR output.
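A dataset like this is naturally stored as one record per clip, pairing the recognizer's hypothesis with the human-corrected reference. The field names below are assumptions for illustration; the paper does not publish a schema in this article's summary.

```python
import json

# Hypothetical ASR-EC-style record: exact field names are an assumption.
# The key idea is pairing raw ASR output with its manual correction.
record = {
    "audio_id": "clip_000123",
    "asr_hypothesis": "他明天会来北京",   # raw recognizer output ('he')
    "reference": "她明天会来北京",        # manually corrected ('she')
}

# JSON Lines is a common on-disk format for such pairs.
line = json.dumps(record, ensure_ascii=False)
parsed = json.loads(line)
print(parsed["reference"])
```

Storing the audio identifier alongside each text pair keeps the transcripts traceable back to the original clips.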


The results are impressive: the LLM-based approach substantially reduces the errors left behind by the underlying recognizer, with the authors reporting a correction rate of over 90%. Furthermore, the researchers found that fine-tuning the LLM on their ASR-EC dataset resulted in even better performance, demonstrating the effectiveness of their approach.
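The standard way to quantify such improvements for Chinese ASR is character error rate (CER): edit distance between hypothesis and reference, divided by reference length. Whether the paper reports CER or a sentence-level correction rate is an assumption here; the sketch below just shows how the metric is computed.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length.
    A common metric for Chinese ASR, where words are not space-delimited."""
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))  # dp[j] = distance for prefixes so far
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds the diagonal cell
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (reference[i - 1] != hypothesis[j - 1]))
            prev = cur
    return dp[n] / max(m, 1)

# One substituted character out of six: CER of roughly 0.167.
print(cer("今天天气真好", "今天天汽真好"))
```

Comparing CER before and after LLM correction shows how much of the recognizer's error the model actually repairs.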


But how does it work? In essence, the LLM is trained on pairs of erroneous and corrected transcripts, learning the patterns that separate recognition mistakes from the intended text. Because the model operates on the transcript rather than the raw audio, it can generalize across the kinds of errors produced by different accents and speaking styles.
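For the fine-tuning route, those transcript pairs are typically reshaped into supervised instruction-tuning examples. The instruction text and field names below are assumptions for illustration, not the authors' exact format.

```python
# Sketch of turning (hypothesis, correction) pairs into supervised
# fine-tuning examples. Instruction wording and keys are hypothetical.
def to_sft_example(asr_hypothesis: str, correction: str) -> dict:
    return {
        "instruction": "Correct the errors in this ASR transcript.",
        "input": asr_hypothesis,
        "output": correction,
    }

# Two typical homophone errors: 天汽 → 天气, 一平酒 → 一瓶酒
pairs = [
    ("今天天汽真好", "今天天气真好"),
    ("他想买一平酒", "他想买一瓶酒"),
]
examples = [to_sft_example(h, c) for h, c in pairs]
print(len(examples))           # 2
print(examples[0]["output"])   # 今天天气真好
```

Each example teaches the model one concrete mapping from a flawed transcript to its intended text; at scale, these mappings become the error patterns the model can apply to unseen input.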


The implications of this research are significant: with an accurate ASR system capable of correcting its own errors, the potential applications are vast. Imagine being able to interact with your smart home devices using natural language commands without worrying about misheard words or background noise. Or picture a future where doctors can quickly review medical records and correct any errors with ease.


Of course, there are still challenges ahead. For instance, the ASR-EC dataset is currently limited to Chinese, and extending it to other languages will require significant additional effort. Furthermore, the LLM-based approach may not be suitable for all applications, particularly those that require extremely high accuracy or real-time processing.


Despite these limitations, the research has opened up new avenues of exploration in the field of ASR error correction.


Cite this article: “Correcting Speech Recognition Errors with Large Language Models”, The Science Archive, 2025.


Automatic Speech Recognition, Error Correction, Large Language Models, Chinese ASR, Benchmark Dataset, Transcribed Text, Manually Corrected Transcripts, Pattern Recognition, Real-Time Processing, Smart Home Devices


Reference: Victor Junqiu Wei, Weicheng Wang, Di Jiang, Yuanfeng Song, Lu Wang, “ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction” (2024).

