Saturday 15 March 2025
Researchers have long sought to understand how audio embeddings, the numerical vector representations that models compute from sound, respond to audio effects that alter a signal's characteristics. These effects, such as gain adjustments or low-pass filtering, are common in music production and can significantly change how an audio signal is perceived.
A recent study published in a prominent journal has shed new light on this topic by investigating the sensitivity of pre-trained audio embeddings to four common audio effects: gain, low-pass filtering, reverberation, and bitcrushing. The researchers used three popular audio embedding models – OpenL3, PANNs, and CLAP – and applied these effects to a dataset of music excerpts from 11 different instrument classes.
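To make the idea of sweeping an effect's strength concrete, here is a minimal Python sketch of three of the four effects applied to a synthetic tone. It is an illustration only, not the study's processing chain: the function names, the 16 kHz sample rate, the one-pole filter design, and the sweep values are all assumptions. Reverberation is left out because it would need an impulse response.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate in Hz, chosen for illustration


def sine_wave(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a mono sine tone with samples in [-1, 1]."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def apply_gain(signal, gain_db):
    """Scale every sample by a gain given in decibels (0 dB is a no-op)."""
    factor = 10 ** (gain_db / 20)
    return [factor * x for x in signal]


def low_pass(signal, cutoff_hz, sample_rate=SAMPLE_RATE):
    """One-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y += a * (x - y)
        out.append(y)
    return out


def bitcrush(signal, bits):
    """Clip to [-1, 1], then quantize to 2**bits evenly spaced levels."""
    step = 2.0 / (2 ** bits - 1)
    clipped = (max(-1.0, min(1.0, x)) for x in signal)
    return [round((x + 1.0) / step) * step - 1.0 for x in clipped]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


# Sweep one parameter per effect; each setting yields one deformed copy
# that would then be passed to an embedding model (not shown here).
tone = sine_wave(freq_hz=4000, duration_s=0.1)
gain_sweep = {db: apply_gain(tone, db) for db in (-12, -6, 0, 6)}
lowpass_sweep = {fc: low_pass(tone, fc) for fc in (500, 2000, 8000)}
bitcrush_sweep = {b: bitcrush(tone, b) for b in (2, 4, 8, 16)}
```

Embedding each deformed copy and tracing how the resulting vector moves as the parameter changes gives one trajectory per excerpt and effect.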
The results showed that the embeddings are highly sensitive to the strength of each effect, with some even exhibiting discontinuous responses. However, when analyzing the deformation trajectories carved out by the parameter sweeps, the researchers found that there is no single direction or low-dimensional subspace in the embedding space that captures the deformation induced by these effects.
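One simple way to probe whether a single direction captures a deformation is to compare the displacement vectors (deformed embedding minus clean embedding) across samples. The sketch below is a hypothetical diagnostic, not the analysis used in the study, and the three-dimensional "embeddings" are made-up numbers rather than model outputs. A mean pairwise cosine similarity near 1 would indicate a shared direction, while a value near 0 would indicate that each sample moves its own way.

```python
import math


def displacement(clean, deformed):
    """Vector from a clean embedding to its deformed counterpart."""
    return [d - c for c, d in zip(clean, deformed)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors to form a pair")
    sims = [
        cosine_similarity(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]
    return sum(sims) / len(sims)


# Toy 3-D "embeddings" (made-up numbers for illustration only).
clean = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Case A: every sample is displaced along the same axis.
shared = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
# Case B: every sample is displaced along a different axis.
varied = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0]]

shared_score = mean_pairwise_cosine(
    [displacement(c, d) for c, d in zip(clean, shared)]
)
varied_score = mean_pairwise_cosine(
    [displacement(c, d) for c, d in zip(clean, varied)]
)
```

The study's reported result corresponds to the second case: the displacements do not line up along one common direction.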
This finding has significant implications for the use of audio embeddings in downstream tasks such as music classification and event detection. It suggests that simply projecting out a single deformation direction or subspace may not be sufficient to improve the robustness of the embeddings to these effects.
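The following Python sketch illustrates what "projecting out a single deformation direction" can look like and why it may fall short. It rests on assumptions: the direction is estimated as the mean displacement between paired clean and deformed embeddings, the helper names are hypothetical, and the data are toy vectors. It should not be read as the study's exact procedure.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_displacement(clean, deformed):
    """Average of (deformed - clean) over paired embeddings."""
    n, dim = len(clean), len(clean[0])
    return [
        sum(d[i] - c[i] for c, d in zip(clean, deformed)) / n
        for i in range(dim)
    ]


def project_out(vec, direction):
    """Remove the component of vec that lies along direction."""
    norm_sq = dot(direction, direction)
    if norm_sq == 0:
        return list(vec)
    scale = dot(vec, direction) / norm_sq
    return [x - scale * d for x, d in zip(vec, direction)]


def residual_after_projection(clean, deformed):
    """Mean distance between clean and deformed embeddings after the
    mean-displacement direction has been projected out of both."""
    direction = mean_displacement(clean, deformed)
    total = 0.0
    for c, d in zip(clean, deformed):
        pc, pd = project_out(c, direction), project_out(d, direction)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(pc, pd)))
    return total / len(clean)


# Toy embeddings (made-up numbers for illustration only).
clean = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
uniform = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]  # same shift for every sample
varied = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0]]   # different shift per sample

residual_uniform = residual_after_projection(clean, uniform)
residual_varied = residual_after_projection(clean, varied)
```

When every sample shifts the same way, the projection removes the deformation entirely. When the shifts differ per sample, a clear residual remains, which is the situation the study describes.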
The study also explored various methods for reducing the sensitivity of the embeddings, including global CCA projection, LDA projection, average displacement projection, and sample-wise CCA SVD. While some of these methods showed promise in improving classification performance, others had little or no impact.
One interesting observation was that the normalized PCA projection variant had a neutral effect on classification performance for PANNs and CLAP when used with all four audio effects. This suggests that the embeddings may be more robust to certain types of deformation than previously thought.
The study’s findings have important implications for the development of new audio embedding models and their applications in music information retrieval tasks. By better understanding how these embeddings respond to various audio effects, researchers can design more effective methods for reducing their sensitivity and improving overall performance.
Pre-trained audio embeddings have become increasingly popular in recent years because they extract meaningful representations from complex audio signals. Unlocking their full potential, however, requires acknowledging and addressing the sensitivity described above.
Ultimately, the study highlights the need for a deeper understanding of how audio embeddings move through embedding space as the strength of an effect changes.
Cite this article: “Sensitivity of Pre-Trained Audio Embeddings to Audio Effects”, The Science Archive, 2025.
Audio Embeddings, Music Information Retrieval, Pre-Trained Models, Gain Adjustment, Low-Pass Filtering, Reverberation, Bitcrushing, Sensitivity Analysis, Embedding Deformation, Classification Performance







