Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
Yoshua Bengio, Jean-François Paiement and Pascal Vincent
Département d'Informatique et Recherche Opérationnelle
Université de Montréal
Montréal, Québec, Canada, H3C 3J7
{bengioy,paiemeje,vincentp}@iro.umontreal.ca
Technical Report 1238, Département d'Informatique et Recherche Opérationnelle
July 25, 2003
Abstract
Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for the given training points, with no straightforward extension to out-of-sample examples short of recomputing eigenvectors. This paper provides such extensions for Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, and Multi-Dimensional Scaling (all of which produce lower-dimensional embeddings for dimensionality reduction), as well as for Spectral Clustering (which performs non-Gaussian clustering). These extensions stem from a unified framework in which each algorithm is seen as learning the eigenfunctions of a kernel. LLE and Isomap pose special challenges because their kernels depend on the training data. Numerical experiments on real data show that the resulting generalizations have a level of error comparable to the sensitivity of the embedding algorithms to the choice of training data.
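The out-of-sample extension described in the abstract rests on the Nyström formula: once the training kernel matrix has been eigendecomposed, a new point's embedding coordinates are obtained from its kernel evaluations against the training set, without recomputing eigenvectors. Below is a minimal sketch of this idea. The function name `nystrom_embed` and the Gaussian kernel in the usage note are illustrative assumptions; the paper derives algorithm-specific, data-dependent kernels for LLE, Isomap, etc.

```python
import numpy as np

def nystrom_embed(K_train, k_new, dim):
    """Embed a new point via the Nystrom formula.

    K_train : (n, n) symmetric kernel matrix on the training points.
    k_new   : (n,) kernel evaluations K(x, x_i) of the new point x
              against each training point x_i.
    dim     : number of embedding coordinates to return.
    """
    n = K_train.shape[0]
    # Eigendecomposition of the symmetric training kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(K_train)
    # Keep the 'dim' leading eigenpairs (eigh returns ascending order).
    order = np.argsort(eigvals)[::-1][:dim]
    lam, v = eigvals[order], eigvecs[:, order]
    # Nystrom extension: f_k(x) = (sqrt(n) / lam_k) * sum_i v_{ik} K(x, x_i).
    # For a training point x_j this reduces to sqrt(n) * v_{jk}, i.e. the
    # extension agrees with the training embedding on the training set.
    return np.sqrt(n) / lam * (k_new @ v)
```

Applying `nystrom_embed` with `k_new = K_train[j]` recovers the training embedding of point `j` exactly, which is the consistency property the unified framework relies on.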
1 Introduction
In the last few years, many unsupervised learning algorithms have been proposed that share the use of an eigendecomposition to obtain a lower-dimensional embedding of the data, characterizing a non-linear manifold near which the data lie: Local Linear Embedding (LLE) (Roweis and Saul, 2000), Isomap (Tenenbaum, de Silva and Langford, 2000) and Laplacian Eigenmaps (Belkin and Niyogi, 2003). There are also many variants of Spectral Clustering (Weiss, 1999; Ng, Jordan and Weiss, 2002), in which such an embedding is an intermediate step before obtaining a clustering of the data that can