Formula Sheet
1
Formula list for multivariate statistics and data analysis
Eigenvalues and eigenvectors
$A : (p \times p)$, $c : (p \times 1)$ and $\lambda \in \mathbb{R}$: $Ac = \lambda c$
• eigenvalues: $\sum_{j=1}^{p} \lambda_j = \sum_{j=1}^{p} a_{jj} = \mathrm{trace}(A)$ and $\prod_{j=1}^{p} \lambda_j = |A|$
• eigenvectors: $c_i' c_j = 0$ for any $i \neq j$ and $c_i' c_i = 1$, for $i = 1, \ldots, p$
• Let $C : (p \times p)$, $C = (c_1, c_2, \ldots, c_p)$: we have $C'C = CC' = I_p$ and $C'AC = \Lambda$ or $A = C \Lambda C'$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$
Basic Geometry
Euclidean distance
$d^2(x, y) = \sum_{j=1}^{p} (x_j - y_j)^2 = (x - y)'(x - y) = (x - y)' I_p (x - y)$
Weighted Euclidean distance.
Let $W = \mathrm{diag}(w_1, \ldots, w_p)$, $w_j > 0$, $j = 1, \ldots, p$:
$d^2(x, y) = (x - y)' W (x - y) = \sum_{j=1}^{p} w_j (x_j - y_j)^2$
Generalized Euclidean distance with "metric" $Q > 0$
$d^2(x, y) = (x - y)' Q (x - y)$
Norm of $x$:
$\|x\| = \sqrt{d^2(x, 0)} = \sqrt{x'x} = \sqrt{\sum_{i=1}^{p} x_i^2}$
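The three distances above differ only in the matrix placed between $(x - y)'$ and $(x - y)$, so one function can cover them all. The following is a minimal NumPy sketch; the function name `generalized_sq_distance` and the example vectors and weights are illustrative assumptions, not part of the formula sheet.

```python
import numpy as np

def generalized_sq_distance(x, y, Q=None):
    """Squared distance (x - y)' Q (x - y); Q=None is treated as Q = I_p (Euclidean)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    if Q is None:
        return float(d @ d)
    return float(d @ np.asarray(Q, dtype=float) @ d)

# Illustrative vectors (made-up values)
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0])

# Euclidean: (1)^2 + (2)^2 + (2)^2 = 9
d2_euclid = generalized_sq_distance(x, y)

# Weighted Euclidean with W = diag(w_1, ..., w_p), w_j > 0: 1*1 + 0.5*4 + 2*4 = 11
W = np.diag([1.0, 0.5, 2.0])
d2_weighted = generalized_sq_distance(x, y, W)

# Norm of x as the square root of the squared Euclidean distance to the origin
norm_x = np.sqrt(generalized_sq_distance(x, np.zeros_like(x)))
```

Passing any symmetric positive definite `Q` gives the generalized distance; the function does not check positive definiteness, which is left to the caller in this sketch.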
Linear Transformations
Random vectors
• $y = Ax + b \Rightarrow E(y) = A E(x) + b$ and $\Sigma_y = A \Sigma_x A'$
• Let $x \sim N_p(\mu, \Sigma)$ and $y = Ax + b$ where $A : (q \times p)$ and $b : (q \times 1)$: then $y \sim N_q(A\mu + b, A \Sigma_x A')$, with $\Sigma_x = \Sigma$
Data matrices. Let $X : (n \times p)$, $A : (p \times q)$ and $Y = XA$, then $Y = (y_1, \ldots, y_q)$.
$\bar{y} = A' \bar{x}$ : $(q \times 1)$
$S_{yy} = A' S_{xx} A$ : $(q \times q)$
$S_{xy} = S_{xx} A$ : $(p \times q)$
$S_{yx} = A' S_{xx}$ : $(q \times p)$
Principal Components
Definition of PCs
• For $j = 1, \ldots, p$, the $j$th principal component is $y_j = a_j' x$ where $S a_j = \lambda_j a_j$, with $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_p \geq 0$. We have $\mathrm{Var}(y_j) = \lambda_j$ and $\mathrm{Cov}(y_j, y_k) = 0$ for $j \neq k$.
• We have $S = A \Lambda A'$ where $A = (a_1, \ldots, a_p)$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$.
• The PCs are given by: $Y = X_c A$ where $(X_c)_{ij} = x_{ij} - \bar{x}_j$.
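The definition of the PCs and the data-matrix rule $S_{yy} = A' S_{xx} A$ can be checked numerically. The sketch below is one possible NumPy illustration, not a reference implementation: the simulated data, the seed, and the use of the $n - 1$ divisor in the covariance are assumptions (the formulas hold equally with divisor $n$).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; the data are simulated for illustration
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(50, 3)) @ mixing      # data matrix X : (n x p)

x_bar = X.mean(axis=0)
Xc = X - x_bar                             # (X_c)_ij = x_ij - x_bar_j
S = np.cov(X, rowvar=False)                # covariance matrix S (divisor n - 1, an assumption)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order,
# so the columns are reordered to get lambda_1 >= ... >= lambda_p
lam, A = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, A = lam[order], A[:, order]

Y = Xc @ A                                 # PC scores, Y = X_c A
S_yy = np.cov(Y, rowvar=False)             # should equal A' S A = Lambda

# Numerical checks of the identities
orthonormal = np.allclose(A.T @ A, np.eye(3))             # A'A = I_p
spectral = np.allclose(A @ np.diag(lam) @ A.T, S)         # S = A Lambda A'
diagonal = np.allclose(S_yy, np.diag(lam))                # Var(y_j) = lambda_j, Cov(y_j, y_k) = 0
trace_ok = np.isclose(lam.sum(), np.trace(S))             # sum of eigenvalues = trace(S)
```

Since `S` is symmetric, `np.linalg.eigh` is used here in preference to `np.linalg.eig`; the sign of each column of `A` is arbitrary, which does not affect any of the checks.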
February 1, 2006, L. S, Institut de Statistique, UCL; updated May 6, 2010, by J. S
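Properties of PCs
• Variances: $S_{yy} = A' S A = \Lambda$ and covariance with $x$: $S_{xy} = S A$
• Total variance: $\mathrm{Var}(y_1) + \ldots + \mathrm{Var}(y_p) = \sum_{k=1}^{p} \lambda_k = \mathrm{trace}(S)$.
• % variance: $t_k = \lambda_k / \sum_{j=1}^{p} \lambda_j$ and cumulative %: $T_q = \sum_{k=1}^{q} \lambda_k / \sum_{k=1}^{p} \lambda_k$.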
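As a small worked example of the variance proportions $t_k$ and $T_q$, the sketch below uses a hypothetical set of eigenvalues (the values are made up for illustration and already sorted in decreasing order).

```python
import numpy as np

# Hypothetical eigenvalues lambda_1 >= ... >= lambda_p (illustrative values only)
lam = np.array([4.0, 3.0, 2.0, 1.0])

total_variance = lam.sum()            # equals trace(S)
t = lam / total_variance              # t_k: share of variance of the k-th PC
T = np.cumsum(lam) / total_variance   # T_q: cumulative share of the first q PCs

# t is [0.4, 0.3, 0.2, 0.1] and T is [0.4, 0.7, 0.9, 1.0]
```

Here the first two PCs account for 70% of the total variance and the first three for 90%.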
PCs