Standardizing Evaluation of Optical Music Recognition Systems with Sheet Music Benchmark and Normalized Edit Distance Metric

Thursday 17 July 2025

The quest for a standardized way to evaluate the performance of optical music recognition (OMR) systems has been ongoing for decades. These systems are designed to automatically extract musical information from scanned images, manuscripts, or printed documents and convert it into a structured digital format.

Until now, researchers have relied on a patchwork of metrics and benchmarks to assess the success of their OMR approaches. However, this has led to inconsistencies and difficulties in comparing the results across different studies. The lack of a unified evaluation framework has hindered progress in the field and made it challenging for developers to gauge the effectiveness of their systems.

A team of researchers from the University of Alicante in Spain has taken a significant step towards addressing this issue by introducing the Sheet Music Benchmark (SMB) dataset and the OMR Normalized Edit Distance (OMR-NED) metric. SMB is a collection of 685 pages of sheet music covering a diverse range of musical textures, including monophony, pianoform, quartet, and others.

The researchers used the SMB dataset to train and assess state-of-the-art OMR methods, demonstrating the benchmark’s complexity and its effectiveness in evaluating systems that target Common Western Modern Notation. The results show that the benchmark can expose even subtle errors in transcribed notation, which makes it a useful tool for developers seeking to improve their OMR systems.

The OMR-NED metric is a novel evaluation framework designed specifically for OMR research. It decomposes traditional edit distance calculations into distinct error categories, providing a fine-grained analysis of system performance. This allows researchers to identify and address specific weaknesses in their approaches, ultimately leading to more accurate and reliable OMR systems.
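To make the idea of a category-wise edit distance concrete, here is a minimal Python sketch. It is not the authors' implementation of OMR-NED. The token format ("category:value"), the helper names, the rule that charges a substitution or deletion to the reference token's category, and the normalization by the longer sequence length are all assumptions made for illustration.

```python
from collections import Counter


def categorize(token):
    # Hypothetical convention: the category is the text before the first ":".
    return token.split(":", 1)[0]


def edit_breakdown(ref, hyp):
    """Return (total_edits, Counter of edits per category) for two token lists."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = Levenshtein distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion of a reference token
                dp[i][j - 1] + 1,         # insertion of a hypothesis token
                dp[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back through the table and charge each edit to one category.
    errors = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        diag_cost = 1 if (i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]) else 0
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + diag_cost:
            if diag_cost:
                # Assumption: a substitution counts against the reference category.
                errors[categorize(ref[i - 1])] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            errors[categorize(ref[i - 1])] += 1  # deleted reference token
            i -= 1
        else:
            errors[categorize(hyp[j - 1])] += 1  # inserted hypothesis token
            j -= 1
    return dp[n][m], errors


def normalized_edit_distance(ref, hyp):
    """Return (overall score, per-category scores), both divided by the longer length."""
    total, errors = edit_breakdown(ref, hyp)
    denom = max(len(ref), len(hyp)) or 1  # avoid dividing by zero on empty input
    return total / denom, {cat: count / denom for cat, count in errors.items()}


if __name__ == "__main__":
    reference = ["clef:G2", "note:C4", "note:E4", "rest:quarter"]
    hypothesis = ["clef:G2", "note:C4", "note:F4", "note:G4"]
    score, per_category = normalized_edit_distance(reference, hypothesis)
    print(score, per_category)
```

In this toy example the overall score is 0.5, split evenly between a wrong note and a rest that was transcribed as a note. The per-category values always sum to the overall score, which is what lets a breakdown of this kind point to the symbol types a system handles worst.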

One of the key advantages of the SMB dataset is its ability to accommodate different levels of evaluation, from layout analysis to full-page transcription tasks. This versatility makes it an attractive option for researchers seeking to evaluate their OMR systems at multiple stages of the music recognition process.

The introduction of SMB and OMR-NED has significant implications for the field of OMR research. Together they provide a standardized framework for evaluating system performance, enabling developers to compare results across different studies and identify areas for improvement. They also give researchers a shared resource for developing more accurate and reliable OMR systems.

As OMR continues to play an increasingly important role in music information retrieval and computational musicology, the need for standardized evaluation metrics and benchmarks will only continue to grow.

Cite this article: “Standardizing Evaluation of Optical Music Recognition Systems with Sheet Music Benchmark and Normalized Edit Distance Metric”, The Science Archive, 2025.

Optical Music Recognition, Sheet Music Benchmark, OMR Normalized Edit Distance, Music Information Retrieval, Computational Musicology, Automated Music Transcription, Music Notation, Evaluation Metrics, Benchmarking, Standardization

Reference: Juan C. Martinez-Sevilla, Joan Cerveto-Serrano, Noelia Luna, Greg Chapman, Craig Sapp, David Rizo, Jorge Calvo-Zaragoza, “Sheet Music Benchmark: Standardized Optical Music Recognition Evaluation” (2025).
