[R] prcomp() and varimax()

peter dalgaard pdalgd at gmail.com
Mon Apr 8 01:18:41 CEST 2013


On Apr 7, 2013, at 16:06 , Mike Amato wrote:

> Thanks for the reply. Maybe my problem is that prcomp() and varimax() 
> are calculating "cumulative proportion of variance" differently?
> When I use the tol parameter with prcomp(), it restricts the number of 
> components to 3 and reports that the cumulative variance explained by 
> the third component is 100%.
> But when I try to pass that 3-component analysis to varimax(), the 
> cumulative variance of the third component drops to 20%. The cumulative 
> proportion of variance explained by a component should not change 
> following rotation, so it seems like it should be either 50% (as in the 
> original 15 component model pca1) or else 75% (as in the smaller 
> unrotated model pca2). But component 3 in the rotated model (pca3) has a 
> value that is neither of those.
> 
> I suspect maybe I am not using varimax() correctly? Especially because 
> it doesn't make sense that all components in the rotated model (pca3) 
> would explain an identical amount of variance- this is real data, so the 
> first component should explain more variance than the second, and so on.
> 
> Thanks for the help,
> Mike
> 

There are some pitfalls in this area... The important issue is that the factor rotations are designed for factor analysis, not PCA, and the rotation matrix of the latter is not the loadings matrix of the former. I think the loadings matrix of PCA can be obtaind by scaling the rotations columnwise by the std.dev., but don't take my word for it.



> 
> 
> On 4/7/2013 6:38 AM, S Ellison wrote:
>>> My concern is with the reported proportions of variance for the 3
>>> components after varimax rotation. It looks like each of my 3 components
>>> explains 1/15 of the total variance, summing to a cumulative proportion
>>> of 20% of variance explained. But those 3 components I retained should
>>> now be the only components in the analysis, so they should be able to
>>> account for 100% of the explained variance.
>> Am I misreading what you just wrote? One percentage (20%) is a proportion of the total variance in the data; the other is the proportion of the variance explained by the model. These are different things; they should not usually be the same.
>> 
>> *******************************************************************
>> This email and any attachments are confidential. Any use, copying or
>> disclosure other than by the intended recipient is unauthorised. If
>> you have received this message in error, please notify the sender
>> immediately via +44(0)20 8943 7000 or notify postmaster at lgcgroup.com
>> and delete this message and any copies from your computer and network.
>> LGC Limited. Registered in England 2991879.
>> Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK
> Hello,
> I am attempting to do a principal components analysis on 15 survey 
> items. I want to use a varimax rotation on the retained components, but 
> I am dubious of the output I am getting, and so I suspect I am doing 
> something wrong. I proceed in the following steps:
> 
>  1) use prcomp() to inspect all 15 components, and decide which to retain
>  2) run prcomp() again, using the "tol" parameter to omit unwanted 
> components
>  3) pass the output of step 2 to varimax()
> 
> My concern is with the reported proportions of variance for the 3 
> components after varimax rotation. It looks like each of my 3 components 
> explains 1/15 of the total variance, summing to a cumulative proportion 
> of 20% of variance explained. But those 3 components I retained should 
> now be the only components in the analysis, so they should be able to 
> account for 100% of the explained variance.
> 
> I am able to get reliable seeming results using principal() from the 
> "psych" package, in which the total amount of variance explained by my 
> retained components does not differ before or after rotation. But 
> principal() uses varimax(), so I suspect I am either doing something 
> wrong or misinterpreting the output when using the base package functions.
> 
> Am I doing something wrong when attempting to retain only 3 components?
> Am I using varimax() incorrectly?
> Am I misinterpreting the returned values from varimax()?
> 
> Thanks for any help,
> Mike
> 
> 
> 
> Here is a link to the data file I am using: 
> https://www.dropbox.com/s/scypebzy0nnhlwk/pca_sampledata.txt
> 
> ### step 1 ###
>> d1 = read.table("pca_sampledata.txt", T)
>> m1 = with(d1, ~ g.enjoy + g.look + g.cost + g.fit + g.health + 
> g.resale + b.withstand + b.satisfy + b.vegetated + b.everyone + b.harmed 
> + b.eco + b.ingenuity + b.security + b.proud)
>> pca1 = prcomp(m1)
>> summary(pca1) #output truncated for this posting
> Importance of components:
>                           PC1    PC2    PC3     PC4     PC5 ... PC15
> Standard deviation     1.5531 1.3064 1.1695 0.93512 0.92167 ... 0.35500
> Proportion of Variance 0.2199 0.1556 0.1247 0.07972 0.07744 ... 0.01149
> Cumulative Proportion  0.2199 0.3755 0.5002 0.57988 0.65732 ... 1.00000
> 
> 
> ### step 2 ###
>> pca2 = prcomp(m1, tol=.75)
>> summary(pca2) #full output shown
> Importance of components:
>                           PC1    PC2    PC3
> Standard deviation     1.5531 1.3064 1.1695
> Proportion of Variance 0.4397 0.3111 0.2493
> Cumulative Proportion  0.4397 0.7507 1.0000
> 
> 
> ### step 3 ###
>> pca3 = varimax(pca2$rotation)
>> pca3
>> ...
>>                 PC1   PC2   PC3
>> SS loadings    1.000 1.000 1.000
>> Proportion Var 0.067 0.067 0.067
>> Cumulative Var 0.067 0.133 0.200
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list