It's clear that you don't need 2 dimensions (height and age) to represent a particular tree. If you take another representative, $ha=\frac1{\sqrt{2}}(height+age)$, you then need only 1D to represent the trees.
Put differently, you want to find the most important characteristics of the trees in this dataset; it may well be $ha=\frac1{\sqrt{2}}(height+age)$.
Or, mathematically, just answer the simple question: can we reduce the number of dimensions of the data while keeping its most important properties?
In practice, we have many more dimensions than in the above example, and in that case dimensionality reduction is essential.
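As a minimal sketch (with made-up numbers), projecting each (height, age) pair onto the direction $\frac1{\sqrt{2}}(1,1)$ collapses the two coordinates into one:
trees = [10 12 15 18; 9 11 16 17]; % hypothetical data: each column is one tree (height; age)
w = 1/sqrt(2)*[1;1];               % direction of the combined feature ha
ha = w'*trees;                     % 1-D representation: (height+age)/sqrt(2)
disp(ha)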
In essence, PCA is a CHANGE OF BASIS
The old (observed) basis is:
$$B=\begin{bmatrix}b_1\\b_2\\...\\b_m\end{bmatrix}=\begin{bmatrix}1&0&...&0\\0&1&...&0\\...\\0&0&...&1\end{bmatrix}=I$$
We would like to change to another basis whose vectors are linear combinations of the old ones:
$$B_P=PB=P$$
The rows of P are a set of new basis vectors.
THE ROWS OF MATRIX P ARE CALLED THE PRINCIPAL COMPONENTS OF X
The data is re-expressed as
$$Y=PX$$
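For instance (a toy sketch, with P chosen as a 2-D rotation so that its rows form an orthonormal basis):
X = randn(2,5);                                      % 2-dimensional data, 5 samples
theta = pi/4;                                        % an arbitrary rotation angle
P = [cos(theta) sin(theta); -sin(theta) cos(theta)]; % rows = new basis vectors
Y = P*X;                                             % the same data expressed in the new basis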
This leads to two questions:
- What is the best way to re-express X?
- What is a good choice of basis P?
Two zero-mean variables A and B are uncorrelated if and only if E[AB]=0. In practice, A and B are each represented by a vector of observed data; therefore (after the means are removed) A and B are uncorrelated if and only if $A^TB=0$.
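A quick numerical check (a sketch with random vectors, which are independent and hence uncorrelated by construction):
n = 1000;
A = randn(n,1); B = randn(n,1);      % independent, hence uncorrelated
A = A-mean(A); B = B-mean(B);        % remove the means
disp(A'*B/n)                         % close to 0
C = A+0.5*randn(n,1); C = C-mean(C); % a variable correlated with A
disp(A'*C/n)                         % clearly non-zero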
ANSWERING THE QUESTIONS:
The BEST WAY to re-express X is the one that makes the covariance matrix of Y diagonal:
$$C_Y=\begin{bmatrix}\sigma_1^2&0&...&0\\0&\sigma_2^2&...&0\\...\\0&0&...&\sigma_m^2\end{bmatrix}$$
This means that, along the new basis vectors, the re-expressed data are mutually uncorrelated.
$$C_Y=\frac1nYY^T=\frac1nPXX^TP^T=PC_XP^T$$
$C_X=\frac1nXX^T$ is a symmetric matrix, so it can be diagonalized as $C_X=EDE^T$, where D is a diagonal matrix of eigenvalues and $E=\begin{bmatrix}e_1&e_2&...&e_m\end{bmatrix}$ is the matrix whose column $e_i$ is the $i^{th}$ eigenvector of $C_X$.
$C_Y$ then becomes $C_Y=PC_XP^T=(PE)D(PE)^T$. We would like to find the matrix P that makes $C_Y$ diagonal. If we choose $P=E^T$, then $C_Y=D$, because E is orthogonal ($EE^T=E^TE=I$).
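A short numerical confirmation of this step (a sketch on random correlated data):
n = 1000;
X = randn(2,n); X(2,:) = X(1,:)+0.5*X(2,:); % correlated 2-D data
X = X-repmat(mean(X,2),1,n);                % center the data
Cx = X*X'/n;                                % covariance matrix
[E,D] = eig(Cx);                            % Cx = E*D*E'
Y = E'*X;                                   % choose P = E'
Cy = Y*Y'/n;                                % diagonal, equal to D
disp(Cy)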
HOW TO SOLVE PCA
1. Eigenvector decomposition
- The principal components of X are the eigenvectors of $C_X=\frac1nXX^T$
- The $i^{th}$ diagonal value of $C_Y$ is the variance of X along $p_i$
2. Using the Singular Value Decomposition (SVD)
If X is an arbitrary n x m matrix (here the rows are samples and the columns are dimensions, i.e. the transpose of the earlier convention) and the rank of $X^TX$ is r, we will prove that the principal components of X are the set of orthogonal eigenvectors $\{v_1,v_2,...,v_r\}$ of $X^TX$.
Proof:
Because $v_i$ is an eigenvector of $X^TX$ with associated eigenvalue $\lambda_i$:
$$(X^TX)v_i=\lambda_iv_i$$
If we define $u_i=\frac{1}{\sigma_i}Xv_i$, where $\sigma_i=\sqrt{\lambda_i}$,
then for the $i^{th}$ column we have $Xv_i=\sigma_iu_i$, and in matrix form:
$$XV=U\Sigma$$
Where:
$$\Sigma=\begin{bmatrix}\sigma_1&0&...&0&...&0\\0&\sigma_2&...&0&...&0\\...\\0&0&...&\sigma_r&...&0\\0&0&...&0&...&0\end{bmatrix}$$
$$V=\begin{bmatrix}v_1&v_2&...&v_m\end{bmatrix}$$
$$U=\begin{bmatrix}u_1&u_2&...&u_n\end{bmatrix}$$
The matrices V and U are completed with (m-r) and (n-r) additional orthonormal vectors, respectively. The properties of U and V are:
- The columns of V are mutually orthogonal (by definition), and the columns of U are as well ($u_i^Tu_j=0$ for $i \neq j$)
- $||v_i||=1$ (by definition) and $||u_i||=1$
In conclusion, an arbitrary n x m matrix X can be factorized as:
$$X=U{\Sigma}V^T$$
It follows that
$$X^TX=V{\Sigma}^TU^TU{\Sigma}V^T=V({\Sigma}^T{\Sigma})V^T=VDV^T$$
Therefore, we can compute the SVD of X to find its principal components: they are the columns of V, i.e. the eigenvectors of $X^TX$.
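As a sketch, we can verify numerically that the squared singular values of the centered data equal the eigenvalues of $X^TX$ (and the right singular vectors are its eigenvectors, up to sign and ordering):
X = randn(100,3);                     % n x m: 100 samples, 3 dimensions
X = X-repmat(mean(X,1),100,1);        % center each dimension
[U,Sig,V] = svd(X);                   % X = U*Sig*V'
[E,D] = eig(X'*X);                    % eigen-decomposition of X'*X
disp((diag(Sig).^2)')                 % squared singular values...
disp(sort(diag(D),'descend')')        % ...equal the eigenvalues of X'*X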
MATLAB source code to test PCA (adapted from [1])
[Figure: the input data X]
[Figure: output data after projection, eigen-decomposition method]
[Figure: output data after projection, SVD method]
%Create the samples
X=[1:1000;1:1000]; % X is 2 dimensional with 1000 samples
X=X+100*randn(2,1000); % Create some 'noise' around the true value
plot(X(1,:),X(2,:),'or'); %Visualize the samples
grid on;
axis([-2000 2000 -2000 2000])
%OBJECTIVE:
%FINDING THE PCs OF X
%PROJECT X ONTO A NEW BASIS (FINDING Y) WHERE IT IS 'BETTER'
N=size(X,2); %Determine the number of samples
mn=mean(X,2);
X=X-repmat(mn,1,N); %Remove the mean of each dimension (PCA needs centered data)
%1st method (EIGENVECTOR DECOMPOSITION)
covX=1/(N-1)*X*X'; %Covariance matrix of the centered data
[PC,V]=eig(covX);
%Extract the diagonal of V as a vector of variances
%Sort the variances in decreasing order
diagV=diag(V);
[tmp,srt]=sort(-1*diagV);
V=diagV(srt);
PC=PC(:,srt);
Y=PC'*X; %Project the data onto the principal components
figure;
plot(Y(1,:),Y(2,:),'o'); grid on;
axis([-2000 2000 -2000 2000])
%2nd method (SINGULAR VALUE DECOMPOSITION)
%X was already centered above
[u,S,PC]=svd(X'/sqrt(N-1)); %Columns of PC are the principal components
S=diag(S);
V2=S.*S; %Squared singular values = variances along the PCs (matches V above)
Y2=PC'*X; %Project the data onto the principal components
figure;
plot(Y2(1,:),Y2(2,:),'o'); grid on;
axis([-2000 2000 -2000 2000])
Reference
[1] Shlens, Jonathon. "A tutorial on principal component analysis." arXiv preprint arXiv:1404.1100 (2014).