[R] factor analysis

Spencer Graves spencer.graves at pdf.com
Thu Jun 7 04:21:13 CEST 2007


      I haven't seen an answer to this post, so I thought I would try to 
generate a response. 

      Regarding your first question (Can I use this factor analysis 
somehow despite the poor cumulative variance of the first three 
factors?), I would ask, "For what purpose?"  And, "What are the 
alternatives?" 

      The second question, on the null hypothesis, can be answered by 
looking at the degrees of freedom:  that will often identify the null 
hypothesis for a test with a chi-square statistic.  Let p = number of 
original variables, which I assume is 10 in your case as you list 10 
eigenvalues.  Let f = number of factors = 3 in your case.  The degrees 
of freedom for a model is the number of free parameters estimated in 
that model.  With two nested models, the degrees of freedom for the 
test is the difference between the numbers of parameters estimated in 
the two models. 
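
      In R notation, the degrees of freedom that factanal reports for a 
test of f factors on p variables follows the standard likelihood-ratio 
formula for maximum-likelihood factor analysis, ((p - f)^2 - (p + f))/2; 
here is a quick sketch to check it, assuming p = 10 as in your case:

p <- 10
dof <- function(f) ((p - f)^2 - (p + f)) / 2  # test df for f factors
dof(3)  # 18, matching your 3-factor output below
dof(4)  # 11, matching your 4-factor output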

      I can think of several obvious hypotheses in this case.  The two 
extremes are that there are no significant correlations and that a 
saturated model is required.  The former requires no parameters to 
estimate the correlation matrix, while the latter requires choose(p, 2) 
= 45 with p = 10.  To estimate a model with only one factor requires p 
parameters:  one for the eigenvalue and (p-1) for the eigenvector / 
direction / factor loadings.  (The sum of squares of the elements of 
each eigenvector = 1.  The factor loadings = the eigenvector times the 
square root of the corresponding eigenvalue.)  Thus, the number of free 
parameters for a one-factor model = 10.  If the test compared one 
factor to none, the degrees of freedom would be 10 - 0 = 10.  
Similarly, if the comparison were with the saturated model, the degrees 
of freedom would be 45 - 10 = 35. 

      Next consider a 2-factor model.  In addition to the p coefficients 
estimated for one factor, we must estimate an additional p-1:  one 
eigenvalue and p-2 for a unit vector orthogonal to the one we already 
have.  This gives 10 + 9 = 19 parameters.  Similarly, for a 3-factor 
model we must estimate an additional p-2 parameters, one eigenvalue plus 
p-3 for a unit vector orthogonal to the two we already have.  This gives 
us 19 + 8 = 27.  Finally, a 4-factor model would require estimating p-3 
additional parameters, for a total of 27 + 7 = 34. 
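
      Collecting this counting in a few lines of R (a sketch, again 
assuming p = 10 as above):

p <- 10
f <- 1:4
## free parameters: p + (p-1) + ... + (p-f+1) = p*f - f*(f-1)/2
free <- p * f - f * (f - 1) / 2
## degrees of freedom for the test against the saturated model,
## which has choose(p, 2) = 45 free parameters
cbind(factors = f, free = free, df = choose(p, 2) - free)
## gives free = 10, 19, 27, 34 and df = 35, 26, 18, 11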

      Now compare the parameter count for the 3-factor model with those 
for all the others just listed to find a pair whose difference is the 
number you got, 18.  If we do this we find that 45 - 27 = 18.  From 
this, we conclude that the test compares the 3-factor model against the 
saturated alternative:  the null hypothesis is that 3 factors are 
sufficient, and rejecting it says that no 3-factor structure adequately 
reproduces the correlations in these data. 

      As a check, let's look at your 4-factor model:  45 - 34 = 11, 
matching the 11 degrees of freedom in your output.  The p-value of 
0.458 there says that your 4-factor model is NOT significantly 
different from a saturated model, i.e., it is adequate.  Returning to 
the 3-factor model, the low p-value (0.000244) says that 3 factors are 
not enough:  4 factors provide a more accurate representation. 

      Does this make sense? 

      Note, however, that the above assumes your observations are all 
statistically independent.  If that's not the case (if your 
observations were collected in batches, for example, I would not expect 
them to be independent), then the assumptions behind this test are not 
satisfied.  Similarly, if the observations are not normally 
distributed, you can't trust this test.  I often check normality using 
'qqnorm'. 
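
      For example, something like the following (a sketch, where 'dat' 
stands for a hypothetical data.frame holding your 10 variables):

op <- par(mfrow = c(2, 5))  # one normal Q-Q plot per variable
for (v in names(dat)) {
    qqnorm(dat[[v]], main = v)
    qqline(dat[[v]])
}
par(op)  # restore the previous plotting layout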

      Finally, even though this analysis suggests that a 4-factor model 
is better, I might still use the 3-factor model if it gave me something 
I could interpret and the 4-factor model didn't. 

      Hope this helps. 
      Spencer Graves
p.s.  I might have answered this a day or two earlier, but the lack of a 
simple, self-contained example meant that I would have to work harder to 
understand your question and craft an answer.      
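
For the record, such an example might look something like the following 
(simulated data, purely illustrative, not your dataset; with a 
reasonably strong 4th factor the 3-factor test would typically be 
rejected, though any single simulation can of course come out 
differently):

set.seed(42)
n <- 200; p <- 10; k <- 4
Load <- matrix(rnorm(p * k, sd = 0.7), p, k)   # hypothetical true loadings
X <- matrix(rnorm(n * k), n, k) %*% t(Load) +  # common-factor part
     matrix(rnorm(n * p), n, p)                # unique variation
colnames(X) <- paste("x", 1:p, sep = "")
factanal(X, factors = 3)  # 18 df; a small p-value says 3 factors are too few
factanal(X, factors = 4)  # 11 df; a large p-value says 4 are adequate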

bunny, lautloscrew.com wrote:
> Hi there,
>
> I've trouble understanding the factanal output of R.
> I am running a FA on a dataset with 10 variables.
>
> I plotted eigenvalues to find out how many factors to try.
> I think the "elbow" is at 3 factors.
> Here are my eigenvalues (of the correlation matrix):
> 2.6372766 1.5137754 1.0188919 0.8986154 0.8327583
> 0.7187473 0.6932792 0.5807489 0.5709594 0.5349477
>
> I guess this is basically what screeplot does as well.
>
> And here's my problem:
> Unfortunately the cumulative variance at 3 factors is only 0.357.
> There are no crossloadings, and the interpretation of the factors and
> their loadings definitely makes sense so far.
>
> Can I use this factor analysis somehow despite the poor cumulative
> variance of the first three factors?
> Changing the rotation didn't help much.
>
> The test of the hypothesis says the following:
>
> Test of the hypothesis that 3 factors are sufficient.
> The chi square statistic is 46.58 on 18 degrees of freedom.
> The p-value is 0.000244
>
> Does this mean the Hnull is that 3 factors are sufficient and I can't
> reject?
>
>
> For 4 factors it says:
> Test of the hypothesis that 4 factors are sufficient.
> The chi square statistic is 10.82 on 11 degrees of freedom.
> The p-value is 0.458
>
> Unfortunately, ?factanal does not tell me what the Hnull is in this
> case.
>
> Thanks a lot in advance for some advice.
>
> matthias
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


