[R] factor analysis (pca): how to get the 'communalities'?
ripley@stats.ox.ac.uk
Fri Jan 3 09:14:03 CET 2003
On Fri, 3 Jan 2003, Wolfgang Lindner wrote:
> I try some test data for a factorAnalysis (resp. pca) in the sense of Prof.
Well, factor analysis and pca are different things, and only one
is appropriate in a given problem.
> Ripley's MASS § 11.1, p. 330 ff.,
Eh? Would that be *Venables & Ripley's* MASS, and if so, which edition? (It
is not the current one.) Those editions which cover factor analysis do
explain the difference.
> just to prepare myself for an analysis of my
> own empirical data using R (instead of SPSS).
>
> 1. the data.
>
> ## The test data are from the book of Backhaus et al.: Multivariate
> ## Analysemethoden. Springer 2000 [9th ed.], p. 300 ff.
>
> a<-c(4.5,5.167,5.059,3.8,3.444,3.5,5.25,5.857,5.083,5.273,4.5)
> b<-c(4.0,4.25,3.824,5.4,5.056,3.5,3.417,4.429,4.083,3.6,4.0)
> c<-c(4.375,3.833,4.765,3.8,3.778,3.875,4.583,4.929,4.667,3.909,4.2)
> d<-c(3.875,3.833,3.438,2.4,3.765,4.0,3.917,3.857,4.0,4.091,3.9)
> e<-c(3.25,2.167,4.235,5.0,3.944,4.625,4.333,4.071,4.0,4.091,3.7)
> f<-c(3.75,3.75,4.471,5.0,5.389,5.250,4.417,5.071,4.25,4.091,3.9)
> g<-c(4.0,3.273,3.765,5.0,5.056,5.5,4.667,2.929,3.818,4.545,3.6)
> h<-c(2.0,1.857,1.923,4.0,5.615,6.0,3.25,2.091,1.545,1.6,1.5)
> i<-c(4.625,3.75,3.529,4.0,4.222,4.75,4.5,4.571,3.75,3.909,3.5)
> j<-c(4.125,3.417,3.529,4.6,5.278,5.375,3.583,3.786,4.167,3.818,3.7)
>
> m<-data.frame(a,b,c,d,e,f,g,h,i,j)
>
> 2. My try of a pca with R.
>
> ## My R input was:
>
> m
> cor(m)
> library(mva)   # only needed in old versions of R; since merged into 'stats'
> m.pca<-princomp(m,cor=TRUE)
> m.pca
> summary(m.pca)
> loadings(m.pca)
> m.pca$scores
> m.FA <- factanal(factors = 3, covmat=cov(m))
> m.FA
>
> 3. Here are my questions.
>
> Q1.
> The cor(m)-Matrix is the same as reported by using SPSS (or OpenStats2).
> But in R I get other eigenvalues compared with the following SPSS output:
R's printed output has no line labelled eigenvalues. You do get `Standard
deviation', the square roots of these numbers, and `Proportion of
Variance', which are these numbers divided by their total.
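The relation between the two outputs can be checked directly. The following is a minimal sketch using the data posted in this thread; the names ev.pca and ev.eigen are illustrative only. It assumes a current R in which princomp() is in the stats package, so no library() call is needed.

```r
# Data as posted in this thread (11 observations, 10 variables).
a <- c(4.5,5.167,5.059,3.8,3.444,3.5,5.25,5.857,5.083,5.273,4.5)
b <- c(4.0,4.25,3.824,5.4,5.056,3.5,3.417,4.429,4.083,3.6,4.0)
c <- c(4.375,3.833,4.765,3.8,3.778,3.875,4.583,4.929,4.667,3.909,4.2)
d <- c(3.875,3.833,3.438,2.4,3.765,4.0,3.917,3.857,4.0,4.091,3.9)
e <- c(3.25,2.167,4.235,5.0,3.944,4.625,4.333,4.071,4.0,4.091,3.7)
f <- c(3.75,3.75,4.471,5.0,5.389,5.250,4.417,5.071,4.25,4.091,3.9)
g <- c(4.0,3.273,3.765,5.0,5.056,5.5,4.667,2.929,3.818,4.545,3.6)
h <- c(2.0,1.857,1.923,4.0,5.615,6.0,3.25,2.091,1.545,1.6,1.5)
i <- c(4.625,3.75,3.529,4.0,4.222,4.75,4.5,4.571,3.75,3.909,3.5)
j <- c(4.125,3.417,3.529,4.6,5.278,5.375,3.583,3.786,4.167,3.818,3.7)
m <- data.frame(a, b, c, d, e, f, g, h, i, j)

m.pca <- princomp(m, cor = TRUE)

# Squared component standard deviations are the eigenvalues of cor(m).
ev.pca   <- unname(m.pca$sdev^2)
ev.eigen <- eigen(cor(m), symmetric = TRUE, only.values = TRUE)$values

print(round(ev.pca, 3))                 # eigenvalues, largest first
print(sum(ev.pca))                      # trace of cor(m), i.e. 10
print(round(ev.pca / sum(ev.pca), 3))   # the 'Proportion of Variance' row
```

These values are the ones to set against the SPSS `Roots (Eigenvalues) Extracted' column.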
> Original matrix trace = 10,00
> Roots (Eigenvalues) Extracted:
> 1 5,052
> 2 1,771
> 3 1,427
> 4 0,819
> 5 0,430
> 6 0,247
> 7 0,159
> 8 0,062
> 9 0,029
> 10 0,003
>
> - What is going on behind the scenes?
Why don't you ask the SPSS people that? R at least gives you sensible
labels on the output.
> - Or what I am doing wrong in my use of R?
> - If I am doing the pca correctly, can I use the R results as equally acceptable
> without further discussion?
No, as more acceptable: at least they have meaningful labels.
> Maybe a different 'hidden' algorithm is the reason for different results?
Ask SPSS that. R's code is open, and nothing is hidden. You have not
demonstrated that the results are different, anyway!
> Q2. How to get the so called 'Communality Estimates' with R?
First, use the data as in
> (m.FA <- factanal(m, factors=3))
and where did the number of factors come from?
100*(1 - m.FA$uniquenesses) gives the communalities as percentages. They are
different from SPSS's, because (1) R uses maximum likelihood FA and (2) it
tries a lot harder to find a maximum, and there are many local maxima in
most FA problems.
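As a minimal sketch of that calculation: the example below uses the ability.cov covariance matrix that ships with R as a stand-in (an assumption made so that the example is known to fit cleanly), and the names fit, comm.pct and comm.check are illustrative. The same two lines should apply unchanged to m.FA.

```r
# Stand-in data: 'ability.cov' comes with R's 'datasets' package.
fit <- factanal(factors = 2, covmat = ability.cov)

# Communalities as percentages: 100 * (1 - uniqueness).
comm.pct <- 100 * (1 - fit$uniquenesses)
print(round(comm.pct, 1))

# Cross-check: row sums of squared loadings. For a converged
# maximum likelihood fit these should agree closely with comm.pct.
comm.check <- 100 * rowSums(unclass(fit$loadings)^2)
print(round(comm.check, 1))
```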
In this case you have fitted too many factors, and just one suffices.
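One way to look at the number of factors is the likelihood ratio test that factanal() reports when the number of observations is known. The sketch below again uses ability.cov as a stand-in data set (an assumption); the name pvals is illustrative. With raw data, factanal(m, factors = k) supplies the number of observations itself.

```r
# p-value of the test that k factors are sufficient, for k = 1, 2.
# 'ability.cov' carries n.obs, so factanal() can compute the test.
pvals <- sapply(1:2, function(k) {
  fit <- factanal(factors = k, covmat = ability.cov)
  unname(fit$PVAL)
})
names(pvals) <- paste(1:2, "factor(s)")
print(signif(pvals, 3))
```

A small p-value says that k factors are not enough; the smallest k that is not rejected is a reasonable starting point.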
> Here the values reported by SPSS for the above test data.frame m:
> Communality Estimates as percentages:
> 1 88,619
> 2 76,855
> 3 89,167
> 4 85,324
> 5 76,043
> 6 84,012
> 7 80,223
> 8 92,668
> 9 63,297
> 10 88,786
>
> Any help, suggestions or hints are very welcome.
1) Be a lot more accurate.
2) Read the help pages to find out what the output means. In the case of
R the information is there, but you may well have to post on an SPSS help
list to find out why SPSS gives different output from R.
3) Don't believe SPSS knows what it is doing.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595