
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
Yoshua Bengio, Jean-François Paiement and Pascal Vincent
Département d'Informatique et Recherche Opérationnelle
Université de Montréal
Montréal, Québec, Canada, H3C 3J7
{bengioy,paiemeje,vincentp}@iro.umontreal.ca
Technical Report 1238, Département d'Informatique et Recherche Opérationnelle
July 25, 2003

Abstract

Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides algorithms for such an extension for Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps and Multi-Dimensional Scaling (all algorithms which provide lower-dimensional embeddings for dimensionality reduction) as well as for Spectral Clustering (which performs non-Gaussian clustering). These extensions stem from a unified framework in which these algorithms are seen as learning eigenfunctions of a kernel. LLE and Isomap pose special challenges as the kernel is training-data dependent. Numerical experiments on real data show that the generalizations performed have a level of error comparable to the variability of the embedding algorithms with respect to the choice of training data.



Introduction

In the last few years, many unsupervised learning algorithms have been proposed which share the use of an eigendecomposition for obtaining a lower-dimensional embedding of the data that characterizes a non-linear manifold near which the data would lie: Local Linear Embedding (LLE) (Roweis and Saul, 2000), Isomap (Tenenbaum, de Silva and Langford, 2000) and Laplacian Eigenmaps (Belkin and Niyogi, 2003). There are also many variants of Spectral Clustering (Weiss, 1999; Ng, Jordan and Weiss, 2002), in which such an embedding is an intermediate step before obtaining a clustering of the data that can capture flat, elongated and even curved clusters. The two tasks (manifold learning and clustering) are linked because the clusters that spectral clustering manages to capture can be arbitrarily curved manifolds (as long as there is enough data to locally capture the curvature of the manifold).


Common Framework

In this paper we consider five types of unsupervised learning algorithms that can be cast in the same framework, based on the computation of an embedding for the training points obtained from the principal eigenvectors of a symmetric matrix.

Algorithm 1

1. Start from a data set $D = \{x_1, \ldots, x_n\}$ with $n$ points in some space. Construct an $n \times n$ "neighborhood" or similarity matrix $M$. Let us denote $K_D(\cdot, \cdot)$ (or $K$ for shorthand) the two-argument function (sometimes dependent on $D$) which produces $M$ by $M_{ij} = K_D(x_i, x_j)$.

2. Optionally transform $M$, yielding a "normalized" matrix $\tilde{M}$. Equivalently, this corresponds to applying a symmetric two-argument function $\tilde{K}_D$ to each pair of examples $(x_i, x_j)$ to obtain $\tilde{M}_{ij}$.

3. Compute the $m$ largest eigenvalues $\lambda_j$ and eigenvectors $v_j$ of $\tilde{M}$. Only positive eigenvalues should be considered.

4. The embedding of each example $x_i$ is the vector $y_i$ with $y_{ij}$ the $i$-th element of the $j$-th principal eigenvector $v_j$ of $\tilde{M}$. Alternatively (MDS and Isomap), the embedding is $e_i$, with $e_{ij} = \sqrt{\lambda_j}\, y_{ij}$. If the first $m$ eigenvalues are positive, then $e_i \cdot e_j$ is the best approximation of $\tilde{M}_{ij}$ using only $m$ coordinates, in the squared error sense.

In the following, we consider the specializations of Algorithm 1 for different unsupervised learning algorithms. Let $S_i$ be the $i$-th row sum of the affinity matrix $M$: $S_i = \sum_j M_{ij}$.
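To make the generic procedure concrete, here is a minimal NumPy sketch of Algorithm 1. The Gaussian similarity and the omitted normalization step are placeholder choices (each algorithm discussed below supplies its own $K_D$ and normalization), and the function and variable names are ours, not the paper's.

```python
import numpy as np

def algorithm1_embedding(X, kernel, m, scale_by_sqrt_eigenvalues=False):
    """Generic Algorithm 1 sketch: eigendecompose the similarity matrix and
    read the embedding off its principal eigenvectors."""
    n = X.shape[0]
    # Steps 1-2: build the n x n similarity matrix (normalization omitted here;
    # it depends on the specific algorithm).
    M = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    # Step 3: eigendecomposition of the symmetric matrix; keep the m largest
    # eigenvalues, discarding non-positive ones.
    eigvals, eigvecs = np.linalg.eigh(M)           # ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals[:m] > 0
    eigvals, eigvecs = eigvals[:m][keep], eigvecs[:, :m][:, keep]
    # Step 4: y_i is row i of the eigenvector matrix; for the MDS/Isomap
    # variant, each coordinate is additionally scaled by sqrt(lambda_j).
    Y = eigvecs
    if scale_by_sqrt_eigenvalues:
        Y = Y * np.sqrt(eigvals)
    return Y

# Placeholder kernel: a Gaussian similarity, as used in spectral clustering.
gaussian = lambda a, b, sigma=1.0: np.exp(-np.sum((a - b) ** 2) / sigma ** 2)

X = np.random.RandomState(0).randn(50, 3)     # toy data set D
Y = algorithm1_embedding(X, gaussian, m=2)    # 2-dimensional embedding of D
```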


We say that two points (a, b) are k-nearest-neighbors of each other if a is among the k nearest neighbors of b in D ∪ {a} or vice-versa.
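As a brief illustration, here is a literal NumPy sketch of this symmetric neighbor relation; the function name and the handling of a point's zero distance to itself are our choices, not prescribed by the paper.

```python
import numpy as np

def are_k_nearest_neighbors(a, b, D, k):
    """True if a is among the k nearest neighbors of b in D ∪ {a},
    or b is among the k nearest neighbors of a in D ∪ {b}."""
    def among_knn(query, point, data):
        # Candidate neighbors are the data set plus the query point itself.
        candidates = np.vstack([data, query[None, :]])
        dists = np.linalg.norm(candidates - point, axis=1)
        # Indices of the k candidates closest to `point`. If `point` is itself
        # in `data`, it appears at distance 0 and is kept, staying literal to
        # the definition above.
        knn = np.argsort(dists)[:k]
        return len(candidates) - 1 in knn          # is the query one of them?
    return among_knn(a, b, D) or among_knn(b, a, D)

D = np.random.RandomState(1).randn(20, 2)
print(are_k_nearest_neighbors(D[0], D[1], D, k=5))
```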


Multi-Dimensional Scaling

Multi-Dimensional Scaling (MDS) starts from a notion of distance or affinity K that is computed between each pair of training examples.
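The source page cuts the section off at this point. For context, the classical (metric) MDS construction takes the entries of M to be squared pairwise distances and normalizes them by double centering before the eigendecomposition; the sketch below follows that standard recipe under those assumptions (function and variable names are ours).

```python
import numpy as np

def mds_embedding(X, m=2):
    """Classical (metric) MDS sketch: double-center the matrix of squared
    pairwise distances, then embed with the top eigenvectors scaled by
    sqrt(eigenvalue), as in the MDS/Isomap variant of Algorithm 1."""
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of training examples.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Double centering: Mtilde_ij = -1/2 (sq_ij - row mean - column mean + grand mean).
    row_mean = sq.mean(axis=1, keepdims=True)
    col_mean = sq.mean(axis=0, keepdims=True)
    Mtilde = -0.5 * (sq - row_mean - col_mean + sq.mean())
    # Eigendecomposition; keep the m largest (positive) eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(Mtilde)
    order = np.argsort(eigvals)[::-1][:m]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0, None)        # guard against round-off noise
    return eigvecs * np.sqrt(eigvals)          # e_ij = sqrt(lambda_j) * y_ij

Y = mds_embedding(np.random.RandomState(0).randn(40, 5), m=2)
```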