Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
Yoshua Bengio, Jean-François Paiement and Pascal Vincent
Département d'Informatique et Recherche Opérationnelle
Université de Montréal
Montréal, Québec, Canada, H3C 3J7
{bengioy,paiemeje,vincentp}@iro.umontreal.ca
Technical Report 1238, Département d'Informatique et Recherche Opérationnelle
July 25, 2003

Abstract

Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides algorithms for such an extension for Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, and Multi-Dimensional Scaling (MDS), all of which provide a lower-dimensional embedding for dimensionality reduction, as well as for Spectral Clustering (which performs non-Gaussian clustering). These extensions stem from a unified framework in which these algorithms are seen as learning eigenfunctions of a kernel. LLE and Isomap pose special challenges because their kernels depend on the training data. Numerical experiments on real data show that the error of the resulting generalizations is comparable to the variability of the embedding algorithms with respect to the choice of training data.

1 Introduction

In the last few years, many unsupervised learning algorithms have been proposed which share the use of an eigendecomposition for obtaining a lower-dimensional embedding of the data that characterizes a non-linear manifold near which the data would lie: Local Linear Embedding (LLE) (Roweis and Saul, 2000), Isomap (Tenenbaum, de Silva and Langford, 2000) and Laplacian Eigenmaps (Belkin and Niyogi, 2003). There are also many variants of Spectral Clustering (Weiss, 1999; Ng, Jordan and Weiss, 2002), in which such an embedding is an intermediate step before obtaining a clustering of the data that can
