Friday 07 March 2025
The search for efficient and effective unsupervised learning algorithms has occupied machine learning researchers for years. A recent paper proposes a new approach to clustering, called Autoencoded UMAP-Enhanced Clustering (AUEC). It combines two powerful dimensionality reduction techniques: Uniform Manifold Approximation and Projection (UMAP) and autoencoders.
The authors present AUEC as a three-stage framework that combines UMAP and autoencoders to improve clustering performance. The first stage uses UMAP to embed the high-dimensional data into a lower-dimensional space in which clusters are more easily separated. Clusterability is measured by the spectral gap between eigenvalues of the graph Laplacian: the larger the gap, the more clearly the data splits into distinct clusters.
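To make the spectral-gap idea concrete, here is a minimal Python sketch. It assumes a Gaussian-affinity graph with bandwidth `sigma` and the symmetric normalized Laplacian. The article does not describe the paper's exact graph construction, and the names `normalized_laplacian` and `estimate_k_by_eigengap` are hypothetical. The sketch does not reproduce UMAP itself; the umap-learn package provides `umap.UMAP` for that. It only shows how the standard eigengap heuristic reads clusterability from the Laplacian eigenvalues of points that have already been embedded.

```python
import numpy as np

def normalized_laplacian(X, sigma=1.0):
    # Assumed construction: Gaussian (RBF) affinities between all pairs of points
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def estimate_k_by_eigengap(X, max_k=10, sigma=1.0):
    # Eigenvalues in ascending order: lambda_1 <= lambda_2 <= ...
    eigvals = np.linalg.eigvalsh(normalized_laplacian(X, sigma))
    # gaps[i] = lambda_{i+2} - lambda_{i+1} (1-indexed); a large gap after
    # lambda_k suggests k well-separated clusters
    gaps = np.diff(eigvals[: max_k + 1])
    return int(np.argmax(gaps)) + 1, gaps
```

Two well-separated groups of points yield two near-zero eigenvalues followed by a much larger third one. The largest gap, and therefore the estimate, then falls at k = 2.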
The second stage trains an autoencoder on the embedded data. The autoencoder learns a non-linear mapping from its input to a lower-dimensional representation, and its training objective includes a clustering-promoting loss term. This term is designed to maximize the relative spectral gap, which enhances clusterability.
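The combined objective can be sketched as a reconstruction term minus a weighted relative spectral gap of the latent codes. This is a minimal NumPy illustration, not the authors' loss. Several pieces are assumptions chosen for readability:

- the definition of the relative gap as (λ_{k+1} − λ_k)/λ_{k+1};
- the Gaussian affinities;
- the `weight` parameter;
- the hypothetical tied linear encoder/decoder built by `linear_autoencoder`.

A real implementation would train a deep autoencoder with automatic differentiation. This sketch only evaluates the loss for fixed weights.

```python
import numpy as np

def normalized_laplacian(Z, sigma=1.0):
    # Assumed graph: Gaussian affinities on the latent codes, no self-loops
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    d = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(Z)) - d[:, None] * W * d[None, :]

def relative_spectral_gap(Z, k, sigma=1.0):
    # Hypothetical definition: (lambda_{k+1} - lambda_k) / lambda_{k+1},
    # with eigenvalues sorted ascending (lam[k] is lambda_{k+1} when 1-indexed)
    lam = np.linalg.eigvalsh(normalized_laplacian(Z, sigma))
    return (lam[k] - lam[k - 1]) / lam[k]

def combined_loss(X, encode, decode, k, weight=1.0, sigma=1.0):
    Z = encode(X)
    reconstruction = np.mean((decode(Z) - X) ** 2)
    # Subtracting the gap means that minimizing the loss favors clusterable codes
    return reconstruction - weight * relative_spectral_gap(Z, k, sigma)

def linear_autoencoder(u):
    # Hypothetical tied-weight linear autoencoder with a 1-D bottleneck along u
    u = np.asarray(u, dtype=float) / np.linalg.norm(u)
    encode = lambda X: X @ u[:, None]
    decode = lambda Z: Z @ u[None, :]
    return encode, decode
```

Consider two groups separated along the first axis. Projecting onto that axis gives both a lower reconstruction error and a larger relative gap than projecting onto a noise axis. The combined loss therefore prefers the first encoder.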
The third stage selects the number of clusters and then performs k-means clustering on the embedded data. To choose the number of clusters, the authors propose a modified version of DBSCAN (Density-Based Spatial Clustering of Applications with Noise). This version merges smaller clusters and outliers into larger neighboring clusters based on proximity.
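The selection-then-k-means stage can be sketched as follows. The article does not describe the authors' specific modification of DBSCAN, so this is a hypothetical stand-in, and the parameters `eps`, `min_pts` and `min_size` are illustrative assumptions. It works in three steps:

1. Run plain DBSCAN.
2. Apply a merge step (`merge_small_clusters`, with an assumed `min_size` threshold) that moves noise points and small clusters into the large cluster containing their nearest point.
3. Take the number of surviving clusters as k, and seed k-means (Lloyd's algorithm) with their centroids.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Plain DBSCAN: labels 0..m-1 for clusters, -1 for noise."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in dists]  # includes the point itself
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(len(X), -1)
    next_label = 0
    for i in range(len(X)):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:  # grow the cluster through density-connected core points
            j = stack.pop()
            if not core[j]:
                continue  # border points join the cluster but do not expand it
            for nb in neighbors[j]:
                if labels[nb] == -1:
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels

def merge_small_clusters(X, labels, min_size):
    """Hypothetical merge step: noise and clusters below min_size join the
    large cluster that contains their nearest point."""
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    large = ids[counts >= min_size]
    if large.size == 0:
        raise ValueError("no cluster reaches min_size")
    keep = np.isin(labels, large)
    merged = labels.copy()
    for i in np.flatnonzero(~keep):
        nearest = np.argmin(np.linalg.norm(X[keep] - X[i], axis=1))
        merged[i] = labels[keep][nearest]
    _, merged = np.unique(merged, return_inverse=True)  # relabel to 0..k-1
    return merged

def kmeans(X, centers, n_iter=100):
    """Lloyd's algorithm started from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        assign = np.argmin(dist, axis=1)
        new_centers = np.array([
            X[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
            for c in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign, centers

def select_k_and_cluster(X, eps, min_pts, min_size):
    merged = merge_small_clusters(X, dbscan(X, eps, min_pts), min_size)
    k = int(merged.max()) + 1
    init = np.array([X[merged == c].mean(axis=0) for c in range(k)])
    labels, _ = kmeans(X, init)
    return k, labels
```

Seeding k-means with the merged clusters' centroids is one simple design choice. It keeps the final partition close to the density-based estimate while letting k-means refine the boundaries.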
Experiments on the MNIST handwritten-digit dataset show that AUEC identifies meaningful clusters and compares favorably with state-of-the-art methods. The authors also compare their approach against other unsupervised learning algorithms, reporting advantages in clustering accuracy and normalized mutual information (NMI).
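The two reported metrics can be computed as follows:

- Clustering accuracy matches each predicted cluster to at most one true class so that the number of correctly matched points is maximized.
- NMI divides the mutual information between true and predicted labels by the arithmetic mean of their entropies. This is the default normalization in scikit-learn's `normalized_mutual_info_score`; whether the paper uses the same normalization is an assumption here.

This pure-Python sketch finds the best matching by brute force, which is only practical for a handful of clusters. For MNIST's ten classes, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) is the usual choice.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Best one-to-one matching of predicted clusters to true classes
    (brute force over permutations; suitable only for small k)."""
    counts = Counter(zip(y_pred, y_true))
    preds, trues = sorted(set(y_pred)), sorted(set(y_true))
    if len(preds) <= len(trues):
        best = max(sum(counts[(p, t)] for p, t in zip(preds, perm))
                   for perm in permutations(trues, len(preds)))
    else:
        best = max(sum(counts[(p, t)] for p, t in zip(perm, trues))
                   for perm in permutations(preds, len(trues)))
    return best / len(y_true)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nab / n) * math.log(n * nab / (ca[x] * cb[y]))
               for (x, y), nab in cab.items())

def nmi(y_true, y_pred):
    # Arithmetic-mean normalization; defined as 1.0 when both labelings are constant
    h = (entropy(y_true) + entropy(y_pred)) / 2
    return mutual_information(y_true, y_pred) / h if h > 0 else 1.0
```

Both metrics ignore how cluster IDs are numbered, so a prediction that merely swaps labels still scores 1.0.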
The use of UMAP in AUEC enables it to effectively capture the underlying manifold structure of the data, even when it is complex or contains noise. The autoencoder component helps to preserve local structures within each cluster, leading to more accurate clustering results.
One of the key advantages of AUEC is its ability to adapt to different data distributions and complexities. By incorporating a clustering-promoting loss function into the autoencoder’s objective, the algorithm can learn to focus on relevant features that distinguish between clusters, even in the presence of noise or outliers.
While AUEC shows promise as an effective unsupervised learning algorithm, there are still limitations to its application.
Cite this article: “Autoencoded UMAP-Enhanced Clustering: A Novel Approach to Unsupervised Learning”, The Science Archive, 2025.
Clustering, Unsupervised Learning, Dimensionality Reduction, Autoencoders, UMAP, Manifold Approximation, Projection, Clustering-Promoting Loss Function, K-Means Clustering, DBSCAN