[R] Several PCA questions...

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jun 29 12:36:01 CEST 2004


On Tue, 29 Jun 2004, Dan Bolser wrote:

> Hi, I am doing PCA on several columns of data in a data.frame.
> 
> I am interested in particular rows of data which may have a particular
> combination of 'types' of column values (without any pre-conception of
> what they may be).
> 
> I do the following...
> 
> # My data table.
> allDat <- read.table("big_select_thresh_5", header=TRUE)
> 
> # Where some rows look like this...
> # PDB     SUNID1  SUNID2  AA      CH      IPCA    PCA     IBB     BB
> # 3sdh    14984   14985   6       10      24      24      93      116
> # 3hbi    14986   14987   6       10      20      22      94      117
> # 4sdh    14988   14989   6       10      20      20      104     122
> 
> # NB First three columns = row ID, last 6 = variables
> 
> attach(allDat)
> 
> # My columns of interest (variables).
> part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
> 
> pc <- princomp(part)

Do you really want an unscaled PCA on that data set?  Looks unlikely (but 
then two of the columns are constant in the sample, which is also 
worrying).
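Below is a minimal sketch of the scaled alternative. It assumes the real `part` behaves like the simulated stand-in (the values are hypothetical; only the column names come from the post). Constant columns are dropped first, because a zero-variance column carries no information for PCA and cannot be scaled to unit variance.

```r
## Hypothetical stand-in for 'part': simulated values, not the poster's data.
## AA and CH are held constant, as they are in the three rows quoted above.
set.seed(42)
n <- 40
part <- data.frame(
  AA   = rep(6, n),
  CH   = rep(10, n),
  IPCA = round(rnorm(n, 21, 2)),
  PCA  = round(rnorm(n, 22, 2)),
  IBB  = round(rnorm(n, 97, 6)),
  BB   = round(rnorm(n, 118, 4))
)

## Flag columns with a single distinct value and drop them.
is_const <- vapply(part, function(x) length(unique(x)) == 1L, logical(1))
part_nc <- part[, !is_const, drop = FALSE]

## cor = TRUE analyses the correlation matrix, i.e. every column is scaled
## to unit variance, so no variable dominates merely through its units.
pc_scaled <- princomp(part_nc, cor = TRUE)
summary(pc_scaled)
```

With `cor = TRUE` the component variances sum to the number of retained columns, which is a quick sanity check that the scaling took effect.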

> plot(pc)
> 
> The above plot shows that 95% of the variance is due to the first
> 'Component' (which I assume is AA).

No, it is the first (principal) component.  You did ask for P>C<A!
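To see that the first principal component is not one of the original columns, one can rebuild its scores from the loadings. This is a sketch on simulated data (hypothetical values, column names borrowed from the post).

```r
## Hypothetical simulated data on roughly the scales seen in the post.
set.seed(1)
n <- 40
part <- data.frame(IPCA = rnorm(n, 21, 2), PCA = rnorm(n, 22, 2),
                   IBB = rnorm(n, 97, 6), BB = rnorm(n, 118, 4))
pc <- princomp(part)

## Loadings of the first component: one weight per original variable.
w1 <- unclass(pc$loadings)[, 1]
print(round(w1, 3))

## Each row's score on Comp.1 is the centred data times these weights,
## so Comp.1 is a weighted blend of all the columns, not any single one.
centred <- scale(part, center = pc$center, scale = FALSE)
comp1 <- drop(centred %*% w1)
all.equal(unname(comp1), unname(pc$scores[, 1]))
```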

> i.e. All the variables behave in much the same way.

Or you failed to scale the data so one dominates.
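Here is a sketch of how a single large-variance column can dominate an unscaled PCA. It uses simulated columns (hypothetical values) in which BB has by far the largest spread. The proportions computed here are the quantities that `summary()` reports and that `plot()` on a `princomp` object shows as a screeplot.

```r
## Hypothetical simulated data: BB has a much larger spread than the rest.
set.seed(2)
n <- 40
part <- data.frame(IPCA = rnorm(n, 21, 2), PCA = rnorm(n, 22, 2),
                   IBB = rnorm(n, 97, 15), BB = rnorm(n, 118, 40))

round(sapply(part, var), 1)   # per-column variances

## Proportion of total variance carried by each component.
prop_var <- function(pc) pc$sdev^2 / sum(pc$sdev^2)
prop_unscaled <- prop_var(princomp(part))
prop_scaled   <- prop_var(princomp(part, cor = TRUE))
round(rbind(unscaled = prop_unscaled, scaled = prop_scaled), 3)
```

Unscaled, the largest eigenvalue can be no smaller than the largest single column variance, so Comp.1 mostly tracks BB. After scaling, the columns (independent by construction here) share the variance much more evenly.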

> I then did ...
> 
> 
> biplot(pc)
> 
> Which showed some outliers labelled with numeric IDs. How do I get back
> my old three-part ID used in allDat?

Set row names on your data frame.  Like almost all of R, it is the row 
names of a data frame that are used for labelling, and you did not give 
any so you got numbers.
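A minimal sketch of setting row names from the three ID columns, using the three rows quoted in the question as a stand-in for `allDat`. The `"_"` separator is an arbitrary choice.

```r
## Stand-in for allDat, built from the three rows quoted in the question.
allDat <- data.frame(
  PDB = c("3sdh", "3hbi", "4sdh"),
  SUNID1 = c(14984, 14986, 14988),
  SUNID2 = c(14985, 14987, 14989),
  AA = c(6, 6, 6), CH = c(10, 10, 10),
  IPCA = c(24, 20, 20), PCA = c(24, 22, 20),
  IBB = c(93, 94, 104), BB = c(116, 117, 122)
)

## Combine the three ID columns into one label per row.
rownames(allDat) <- paste(allDat$PDB, allDat$SUNID1, allDat$SUNID2, sep = "_")

## Subsetting the data frame keeps its row names; building a new data.frame
## from attached column vectors, as in the question, does not.
part <- allDat[, c("AA", "CH", "IPCA", "PCA", "IBB", "BB")]
rownames(part)
```

`princomp()` needs more rows than columns, so three rows are not enough for the PCA itself. With the full data set, `princomp(part)` followed by `biplot()` should then label points with these IDs, since the row names carry through to the scores.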

> In the above plot I saw all the variables (correctly named) pointing in
> more or less the same direction (as shown by the variance). I then did the
> following...
> 
> postscript(file="test.ps",paper="a4")
> 
> biplot(pc)
> 
> dev.off()
> 
> However, looking at test.ps shows that the arrows are missing (using
> ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

So ggv is unreliable, perhaps cannot cope with colours?
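By default `biplot()` draws the points and the variable arrows in two different palette colours (the arrows in red with the default palette). One way to test the colour hypothesis is to write the same plot with both sets drawn in black via the `col` argument and see whether the viewer now shows the arrows. This is a sketch on simulated data with hypothetical row labels; the output goes to a temporary file instead of `test.ps`.

```r
## Simulated data with row names standing in for the poster's IDs (hypothetical).
set.seed(3)
part <- data.frame(IPCA = rnorm(30, 21, 2), PCA = rnorm(30, 22, 2),
                   IBB = rnorm(30, 97, 6), BB = rnorm(30, 118, 4),
                   row.names = sprintf("id%02d", 1:30))
pc <- princomp(part, cor = TRUE)

ps_file <- tempfile(fileext = ".ps")
postscript(file = ps_file, paper = "a4")
## col = c(points, arrows): draw both in black, so the arrows no longer
## depend on how the viewer renders colour.
biplot(pc, col = c("black", "black"))
dev.off()
```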

> Finally, I would like to make a contour plot of the above biplot, is this
> possible (or even a good way to present the data)?

What do you propose to represent by the contours?  Biplots have a 
well-defined interpretation in terms of distances and angles.
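To make the "angles" part concrete: with the default `scale = 1`, the arrows that `biplot()` draws for a `princomp` fit are, up to a common constant factor, the rows of the loadings multiplied by the component standard deviations. This is a sketch on simulated correlated columns (hypothetical values). It checks that, using all components, inner products of these rows reproduce the covariances and their cosines reproduce the correlations. A two-dimensional biplot keeps only the first two columns, so there the same quantities are approximations.

```r
## Hypothetical simulated data: four columns, three sharing a common factor.
set.seed(4)
n <- 60
z <- rnorm(n)
X <- data.frame(IPCA = z + rnorm(n, sd = 0.5), PCA = z + rnorm(n, sd = 0.5),
                IBB = -z + rnorm(n), BB = rnorm(n))
pc <- princomp(X)

## Variable markers before biplot's constant rescaling: loadings * sdev.
G <- unclass(pc$loadings) %*% diag(pc$sdev)

## With all components, G %*% t(G) equals the covariance matrix with
## divisor n (the divisor princomp uses).
cov_n <- cov(X) * (n - 1) / n
all.equal(unname(G %*% t(G)), unname(cov_n))

## Cosine of the angle between two variable markers.
cos_angle <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

## In the biplot only Comp.1 and Comp.2 are drawn, so the angle there only
## approximates the correlation, well when those two capture most variance.
G2 <- G[, 1:2]
c(biplot_approx = cos_angle(G2[1, ], G2[2, ]),
  full_rank     = cos_angle(G[1, ], G[2, ]),
  correlation   = cor(X$IPCA, X$PCA))
```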

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



