# Formula Sheet

Pages: 11 (2,646 words). Published: April 19, 2011
February 1, 2006, L. S, Institut de Statistique, UCL; updated May 6, 2010, by J. S


## Formula list for multivariate statistics and data analysis

### Eigenvalues and eigenvectors

Let $A : (p \times p)$, $c : (p \times 1)$ and $\lambda \in \mathbb{R}$ with $Ac = \lambda c$. Then:

- eigenvalues: $\sum_{j=1}^{p} \lambda_j = \sum_{j=1}^{p} a_{jj} = \mathrm{trace}(A)$ and $\prod_{j=1}^{p} \lambda_j = |A|$
- eigenvectors: $c_i' c_j = 0$ for any $i \neq j$, and $c_i' c_i = 1$ for $i = 1, \dots, p$
- Let $C : (p \times p)$, $C = (c_1, c_2, \dots, c_p)$; then $C'C = CC' = I_p$ and $C'AC = \Lambda$, or equivalently $A = C \Lambda C'$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$
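The eigenvalue identities above are easy to check numerically. A minimal NumPy sketch (the matrix is invented for illustration; a symmetric $A$ is used so the eigenvalues are real):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T  # symmetric, so eigenvalues are real and eigenvectors orthonormal

lam, C = np.linalg.eigh(A)  # A = C diag(lam) C'

assert np.isclose(lam.sum(), np.trace(A))       # sum of eigenvalues = trace(A)
assert np.isclose(lam.prod(), np.linalg.det(A)) # product of eigenvalues = |A|
assert np.allclose(C.T @ C, np.eye(4))          # C'C = I_p
assert np.allclose(C @ np.diag(lam) @ C.T, A)   # A = C Lambda C'
print("eigenvalue identities verified")
```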
## Basic Geometry

**Euclidean distance:**

$$d^2(x, y) = \sum_{j=1}^{p} (x_j - y_j)^2 = (x - y)'(x - y) = (x - y)' I_p (x - y)$$

**Weighted Euclidean distance.** Let $W = \mathrm{diag}(w_1, \dots, w_p)$, $w_j > 0$, $j = 1, \dots, p$:

$$d^2(x, y) = (x - y)' W (x - y) = \sum_{j=1}^{p} w_j (x_j - y_j)^2$$

**Generalized Euclidean distance** with "metric" $Q > 0$:

$$d^2(x, y) = (x - y)' Q (x - y)$$

**Norm of $x$:** $\|x\|^2 = d^2(x, 0) = x'x$, i.e. $\|x\| = \sqrt{x'x} = \sqrt{\sum_{i=1}^{p} x_i^2}$
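The three distances differ only in the matrix placed between $(x-y)'$ and $(x-y)$. A sketch with invented vectors, weights $w$, and a positive-definite $Q$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])

d2_euclid = (x - y) @ (x - y)                 # (x-y)'(x-y), W = I_p
w = np.array([0.5, 1.0, 2.0])
d2_weighted = (x - y) @ np.diag(w) @ (x - y)  # (x-y)'W(x-y)
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 3.0]])               # any positive-definite Q
d2_general = (x - y) @ Q @ (x - y)            # (x-y)'Q(x-y)
norm_x = np.sqrt(x @ x)                       # ||x|| = sqrt(x'x)

print(d2_euclid, d2_weighted, d2_general)     # prints 5.0 4.5 4.0
```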

## Linear Transformations

**Random vectors.**
- $y = Ax + b \Rightarrow E(y) = A\,E(x) + b$ and $\Sigma_y = A \Sigma_x A'$
- Let $x \sim N_p(\mu, \Sigma)$ and $y = Ax + b$, where $A : (q \times p)$ and $b : (q \times 1)$; then $y \sim N_q(A\mu + b, A \Sigma A')$

**Data matrices.** Let $X : (n \times p)$, $A : (p \times q)$ and $Y = XA$; then $Y = (y_1, \dots, y_q)$ and:
- $\bar{y} = A'\bar{x}$ : $(q \times 1)$
- $S_{yy} = A' S_{xx} A$ : $(q \times q)$
- $S_{xy} = S_{xx} A$ : $(p \times q)$
- $S_{yx} = A' S_{xx}$ : $(q \times p)$

## Principal Components

**Definition of PCs**
- For $j = 1, \dots, p$, the $j$th principal component is $y_j = a_j' x$, where $S a_j = \lambda_j a_j$, with $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0$. We have $\mathrm{Var}(y_j) = \lambda_j$ and $\mathrm{Cov}(y_j, y_k) = 0$ for $j \neq k$.
- We have $S = A \Lambda A'$, where $A = (a_1, \dots, a_p)$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$.
- The PCs are given by $Y = X_c A$, where $(X_c)_{ij} = x_{ij} - \bar{x}_j$.
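The definition can be verified on simulated data: extract the PCs from the eigendecomposition of the sample covariance $S$ and check $\mathrm{Var}(y_j) = \lambda_j$ and $\mathrm{Cov}(y_j, y_k) = 0$. A sketch (data and mixing matrix invented):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) @ np.array([[2.0, 0.3, 0.0],
                                              [0.0, 1.0, 0.5],
                                              [0.0, 0.0, 0.2]])
Xc = X - X.mean(axis=0)               # (X_c)_ij = x_ij - xbar_j
S = Xc.T @ Xc / (X.shape[0] - 1)      # sample covariance matrix

lam, A = np.linalg.eigh(S)            # S a_j = lambda_j a_j
order = np.argsort(lam)[::-1]         # sort: lambda_1 >= ... >= lambda_p
lam, A = lam[order], A[:, order]

Y = Xc @ A                            # PCs: Y = X_c A
S_yy = Y.T @ Y / (X.shape[0] - 1)

assert np.allclose(np.diag(S_yy), lam)      # Var(y_j) = lambda_j
assert np.allclose(S_yy, np.diag(lam))      # Cov(y_j, y_k) = 0 for j != k
assert np.isclose(lam.sum(), np.trace(S))   # total variance is preserved
```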


**Properties of PCs**
- Variances: $S_{yy} = A'SA = \Lambda$, and covariance with $x$: $S_{xy} = SA$.
- Total variance: $\mathrm{Var}(y_1) + \dots + \mathrm{Var}(y_p) = \sum_{k=1}^{p} \lambda_k = \mathrm{trace}(S)$.
- % variance: $t_k = \dfrac{\lambda_k}{\sum_{k=1}^{p} \lambda_k}$ and cumulated %: $T_q = \dfrac{\sum_{k=1}^{q} \lambda_k}{\sum_{k=1}^{p} \lambda_k}$.

**PCs on the Correlation Matrix**
- Standardization: $X_{cs} = X_c D^{-1/2}$, where $D = \mathrm{diag}(s_{11}, s_{22}, \dots, s_{pp})$. Then $(n-1)^{-1} X_{cs}' X_{cs} = R$.
- Spectral decomposition: $R a_k = \lambda_k a_k$, $k = 1, \dots, p$.
- Principal components: $y_k = a_k' x_s$, $k = 1, \dots, p$, or $Y = X_s A$ and $S_{yy} = A'RA = \Lambda$.
- Total variance: $\sum_{k=1}^{p} \lambda_k = \mathrm{trace}(R) = p$.
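A sketch of PCA on the correlation matrix: standardize the centered data, form $R$, and check that the total variance equals $p$ and that the percentages $t_k$ cumulate to 1 (the data and column scales are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 4)) * np.array([1.0, 5.0, 0.2, 3.0])
Xc = X - X.mean(axis=0)
s = Xc.var(axis=0, ddof=1)              # s_11, ..., s_pp
Xcs = Xc / np.sqrt(s)                   # X_cs = X_c D^{-1/2}
R = Xcs.T @ Xcs / (X.shape[0] - 1)      # correlation matrix

lam = np.sort(np.linalg.eigvalsh(R))[::-1]  # lambda_1 >= ... >= lambda_p

assert np.isclose(np.trace(R), X.shape[1])  # trace(R) = p
assert np.isclose(lam.sum(), np.trace(R))   # total variance = p
t = lam / lam.sum()                         # % variance per PC
T = np.cumsum(t)                            # cumulated %
assert np.isclose(T[-1], 1.0)
```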

**Correlations between $y_k$ and $x_j$**
- Analysis with $S$: $r_{x_j, y_k} = \dfrac{\mathrm{Cov}(x_j, y_k)}{\sqrt{s^2_{x_j} s^2_{y_k}}} = \dfrac{\sqrt{\lambda_k}\, a_{jk}}{\sqrt{s_{jj}}}$
- Analysis with $R$: $r_{x^s_j, y_k} = \sqrt{\lambda_k}\, a_{jk}$
- For any variable $x_j$: $\sum_{k=1}^{p} r^2_{x_j, y_k} = \dfrac{\sum_{k=1}^{p} \lambda_k a^2_{jk}}{s_{jj}} = \dfrac{s_{jj}}{s_{jj}} = 1$.

## Correspondence Analysis
- Two-way contingency table: $N : (I \times J)$, $N = (n_{ij})$
- Frequency table: $F = (f_{ij}) = (n_{ij}/n)$
- The $\chi^2$-test statistic: $Q = \sum_{i=1}^{I} \sum_{j=1}^{J} \dfrac{(n_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij} = \dfrac{n_{i\cdot}\, n_{\cdot j}}{n}$.

**Row and Column Profiles**
- Marginal profiles: $f_{i\cdot} = \dfrac{n_{i\cdot}}{n}$, $i = 1, \dots, I$, and $f_{\cdot j} = \dfrac{n_{\cdot j}}{n}$, $j = 1, \dots, J$.
- Row profiles: $r_i = (r_{i1}, \dots, r_{ij}, \dots, r_{iJ})'$, where $r_{ij} = \dfrac{n_{ij}}{n_{i\cdot}} = \dfrac{f_{ij}}{f_{i\cdot}}$
- Column profiles: $c_j = (c_{1j}, \dots, c_{ij}, \dots, c_{Ij})'$, where $c_{ij} = \dfrac{n_{ij}}{n_{\cdot j}} = \dfrac{f_{ij}}{f_{\cdot j}}$
- Matrix notation: $D_I = \mathrm{diag}(f_{1\cdot}, \dots, f_{i\cdot}, \dots, f_{I\cdot})$ and $D_J = \mathrm{diag}(f_{\cdot 1}, \dots, f_{\cdot j}, \dots, f_{\cdot J})$
- We have: $R = D_I^{-1} F$ and $C = F D_J^{-1}$.

**The chi-square distances**

$$d^2(i_1, i_2) = (r_{i_1} - r_{i_2})' D_J^{-1} (r_{i_1} - r_{i_2}) \quad \text{and} \quad d^2(j_1, j_2) = (c_{j_1} - c_{j_2})' D_I^{-1} (c_{j_1} - c_{j_2}).$$

**Decomposition of the chi-square statistic**
- We have: $Q = n \sum_{i=1}^{I} \sum_{j=1}^{J} g_{ij}^2 = n\, \mathrm{trace}(GG') = n\, \mathrm{trace}(G'G)$, where $g_{ij} = \dfrac{f_{ij} - f_{i\cdot} f_{\cdot j}}{\sqrt{f_{i\cdot} f_{\cdot j}}}$.
- PCA on the rows of $R$: $\psi_k = R^* v_k = R D_J^{-1/2} v_k = D_I^{-1} F D_J^{-1/2} v_k$, where the $v_k$ are the eigenvectors of $G'G$ corresponding to the non-null eigenvalues $\lambda_k$.
- PCA on the columns of $C$: $\phi_k = C^* u_k = C' D_I^{-1/2} u_k = D_J^{-1} F' D_I^{-1/2} u_k$, where the $u_k$ are the eigenvectors of $GG'$ corresponding to the non-null eigenvalues $\lambda_k$.
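The decomposition $Q = n\,\mathrm{trace}(G'G)$ can be checked directly on a small contingency table. A sketch (the counts are invented for illustration):

```python
import numpy as np

N = np.array([[20.0, 10.0, 5.0],
              [10.0, 30.0, 15.0]])      # contingency table N : (I x J)
n = N.sum()
F = N / n                               # frequency table F = (f_ij)
fi = F.sum(axis=1)                      # marginal profiles f_i.
fj = F.sum(axis=0)                      # marginal profiles f_.j

E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n
Q_chi2 = ((N - E) ** 2 / E).sum()       # classical chi-square statistic

# g_ij = (f_ij - f_i. f_.j) / sqrt(f_i. f_.j)
G = (F - np.outer(fi, fj)) / np.sqrt(np.outer(fi, fj))
assert np.isclose(Q_chi2, n * np.trace(G.T @ G))
assert np.isclose(Q_chi2, n * np.trace(G @ G.T))

# Row coordinates: psi_k = D_I^{-1} F D_J^{-1/2} v_k, v_k eigenvectors of G'G
lam, V = np.linalg.eigh(G.T @ G)
psi = np.diag(1 / fi) @ F @ np.diag(fj ** -0.5) @ V
```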

