[R] Note on PCA (not directly with R)

Nikos Alexandris nikos.alexandris at felis.uni-freiburg.de
Thu Jul 1 04:51:03 CEST 2010


Christofer Bogaso wrote:
> Dear all, I am looking for some interactive study materials on Principal
> component analysis. Basically I would like to know what we are actually
> doing with PCA?

With the eigenvalue decomposition and a bivariate data set in mind, I think
the summary in a few sentences is:

- PCA rotates (and scales) the data set so that the transformed axes (the
principal components) align with the direction(s) of maximum variance

- eigenvalues are the variances along the axes of variation (so their square
roots are proportional to the lengths of those axes)

- eigen (or characteristic) vectors define the rotation
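
A minimal sketch of these three points in R, assuming a simulated bivariate
data set (the variables x and y below are made up for illustration and are not
from the original question). It only uses cov() and eigen() from base R:

```r
# Simulated bivariate data (illustrative assumption, not real data)
set.seed(1)
x <- rnorm(200)
y <- 0.8 * x + rnorm(200, sd = 0.4)
X <- cbind(x, y)

# Eigenvalue decomposition of the covariance matrix
e <- eigen(cov(X))

e$vectors        # columns: directions of the principal axes (the rotation)
e$values         # variances along those axes, largest first
sqrt(e$values)   # standard deviations, proportional to the axis lengths

# The eigenvectors form an orthogonal matrix, i.e. a rotation
# (possibly combined with a reflection): t(V) %*% V is the identity
round(crossprod(e$vectors), 10)
```

The sum of the eigenvalues equals the total variance of x and y, so the
rotation redistributes the variance between the axes without changing its
total.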


> What is happening within the dataset at the time of doing
> PCA.

The algorithm (classically):

- mean-centers the data matrix

- calculates the covariance matrix (non-standardised PCA) or the correlation
matrix (standardised PCA; the standardisation step is also known as scaling)

- calculates the eigenvalue decomposition (EVD), i.e. the eigenvectors and
eigenvalues, of the variance-covariance matrix (non-standardised) or the
correlation matrix (standardised)

- sorts the variances (i.e. the eigenvalues) in decreasing order and finally
projects the (centred) original data set into what are named Principal
Components or scores, by multiplying it with the eigenvectors, which act as
weighting coefficients.
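
The steps above can be sketched by hand in R. This is only an illustration of
the non-standardised case, assuming the built-in USArrests data set as example
input (it is not part of the original question), with prcomp() used as a
cross-check:

```r
# Example data only: a built-in data set with four numeric columns
X <- as.matrix(USArrests)

# 1. mean-centre the data matrix
Xc <- scale(X, center = TRUE, scale = FALSE)

# 2. covariance matrix (non-standardised PCA);
#    for standardised PCA one would use cor(X) and also scale the data
S <- cov(Xc)

# 3. eigenvalue decomposition; for a symmetric matrix eigen() already
#    returns the eigenvalues sorted in decreasing order
e <- eigen(S)

# 4. project the centred data onto the eigenvectors to get the scores
scores <- Xc %*% e$vectors

# Cross-check against prcomp(); the sign of each component is arbitrary,
# so absolute values are compared
p <- prcomp(X, center = TRUE, scale. = FALSE)
all.equal(e$values, p$sdev^2)
all.equal(abs(scores), abs(p$x), check.attributes = FALSE)
```

Note that prcomp() itself works through a singular value decomposition of the
centred data rather than an explicit EVD of the covariance matrix, but the
results agree up to sign.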


The algorithm actually does three (or more?) things:

- minimises the mean square error of approximating the original data set,
- keeps the maximum possible variance(s) of the original data set,
- gives decorrelated variables
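
These three properties can be checked numerically. A small sketch, again
assuming USArrests as example data and an arbitrary choice of k = 2 retained
components (both are assumptions made for illustration):

```r
X  <- as.matrix(USArrests)                      # example data only
Xc <- scale(X, center = TRUE, scale = FALSE)    # mean-centred data
e  <- eigen(cov(Xc))
scores <- Xc %*% e$vectors

# Decorrelated variables: the covariance matrix of the scores is diagonal,
# with the eigenvalues on the diagonal
round(cov(scores), 8)

# Variance is kept: the total is unchanged, and the cumulative share shows
# how much the leading components retain
sum(e$values)
sum(diag(cov(Xc)))
cumsum(e$values) / sum(e$values)

# Approximation error: rebuild the centred data from the first k components
k    <- 2                                       # arbitrary choice
Vk   <- e$vectors[, 1:k]
Xhat <- Xc %*% Vk %*% t(Vk)
sse  <- sum((Xc - Xhat)^2)

# The squared error, divided by n - 1, equals the sum of the discarded
# eigenvalues
sse / (nrow(Xc) - 1)
sum(e$values[-(1:k)])
```

The last comparison is why keeping the components with the largest
eigenvalues gives the smallest squared error among rank-k linear
approximations of the centred data.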


> Probably a 3-dimensional interactive explanation would be best for me.
> I have gone through some online materials specially Wikipedia etc, however
> what I need a "movable explanation" to understand that.
> 
> Any suggestion please?

For what it's worth, I think a 2-dimensional example is better to start with.
You can have a look at the plotpc package. It really is educational.

Good luck, Nikos


