Unsupervised classification methods
Books:
● Approche pragmatique de la classification : arbres hiérarchiques, partitionnements. J.-P. Nakache, J. Confais. Technip, 2004 (website).
● Finding Groups in Data: An Introduction to Cluster Analysis. L. Kaufman and P.J. Rousseeuw. Wiley, New York, 1990.
Websites:
● http://zoonek2.free.fr/UNIX/48_R_2004/all.html
● http://www.aliquote.org/articles/tech/multvar/multvar.html
● http://www.jstatsoft.org/v01/i04 (online article)
Computing the dissimilarity table
All of the methods below, with the exception of MONA, work from a dissimilarity table.
These tables can be built in R with:
● dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
Available distance measures are (written for two vectors x and y):
euclidean: Usual distance between the two vectors (2 norm), sqrt(sum((x_i - y_i)^2)).
maximum: Maximum distance between two components of x and y (supremum norm)
manhattan: Absolute distance between the two vectors (1 norm).
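canberra: sum(|x_i - y_i| / (|x_i| + |y_i|)) in recent versions of R; for non-negative values this equals the older form sum(|x_i - y_i| / |x_i + y_i|). Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.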
binary: (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are ‘on’ and zero elements are ‘off’. The distance is the proportion of bits in which only one is on amongst those in which at least one is on.
minkowski: The p norm, the pth root of the sum of the pth powers of the differences of the components.
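As a quick check of these definitions, here is a minimal sketch using base R only. The points x and the binary vectors b are made-up values chosen so that each distance can be verified by hand; they are not part of the course data.

```r
# Two made-up points in the plane (illustrative values only)
x <- rbind(a = c(0, 0), b = c(3, 4))

dist(x, method = "euclidean")         # 5: sqrt(3^2 + 4^2)
dist(x, method = "manhattan")         # 7: |3| + |4|
dist(x, method = "maximum")           # 4: max(|3|, |4|)
dist(x, method = "minkowski", p = 3)  # 91^(1/3): (3^3 + 4^3)^(1/3)

# Binary: non-zero entries are "on", zero entries are "off"
b <- rbind(u = c(1, 1, 0, 0), v = c(1, 0, 1, 0))
dist(b, method = "binary")            # 2/3: two mismatches among the three
                                      # positions where at least one bit is on

# dist() stores only the lower triangle; as.matrix() gives the full table
d <- dist(x)
as.matrix(d)
```

The object returned by dist() is the form of dissimilarity table that functions such as hclust() take as input.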
● dist.dudi(dudi, amongrow = TRUE)
dudi is a duality diagram (PCA, CA, ...); amongrow selects the dimension on which the distances are computed (rows if TRUE, columns if FALSE).
dist.dudi(dudi.pca(cidre), amongrow = TRUE) computes the matrix of distances between individuals.
dist.dudi(dudi.coa(csp), amongrow = FALSE) computes the matrix of distances between column profiles.
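The two calls can be tried as follows. This is only a sketch: it assumes the ade4 package is installed, and it uses the built-in data sets USArrests and HairEyeColor as stand-ins, since the cidre and csp tables are not reproduced here. The argument scannf = FALSE is added so that dudi.pca and dudi.coa do not stop to ask interactively for the number of axes.

```r
# Sketch only: needs the ade4 package. USArrests and HairEyeColor are
# stand-ins (an assumption) for the course data sets cidre and csp.
if (requireNamespace("ade4", quietly = TRUE)) {
  # Normalised PCA; scannf = FALSE skips the interactive scree-plot prompt
  pca <- ade4::dudi.pca(USArrests, scannf = FALSE, nf = 2)
  # Distances between individuals (here, the 50 states)
  d_rows <- ade4::dist.dudi(pca, amongrow = TRUE)

  # Correspondence analysis of a 4 x 4 hair colour by eye colour table
  tab <- as.data.frame(apply(HairEyeColor, c(1, 2), sum))
  coa <- ade4::dudi.coa(tab, scannf = FALSE, nf = 2)
  # Distances between column profiles (here, the 4 eye colours)
  d_cols <- ade4::dist.dudi(coa, amongrow = FALSE)
}
```

Both results are objects of class dist, so they can be used wherever a dissimilarity table built with dist() is expected.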
● daisy(x, metric =