[R] Difficulty understanding sem errors / failed confirmatory factor analysis

Thu Sep 18 20:11:33 CEST 2008

Dear John,

On Thu, 18 Sep 2008, John Fox wrote:

>>  	I'm trying to fit a pretty simple confirmatory factor analysis using
>> the sem package. There's a CFA example in the examples, which is helpful,
>> but the output for my (failing) model is hard to understand. I'd be
>> interested in any other ways to do a CFA in R, if this proves
>> troublesome.
>>
>>  	The CFA is replicating a 5 uncorrelated-factor structure (for those
>> interested, it is a structure of word usage patterns in weblogs) in a
>> special population. The model looks like model.txt (attached as many
>> people hate long emails); the correlation matrix cors.txt as well.
>
> As far as I can see, the attachments aren't there. If you like, you can
> send them to me privately. Without the input covariance matrix and your
> model, it's very hard to tell what the source of the problem is, but one
> guess (assuming that you've specified the model correctly) is that the
> assumption of uncorrelated factors is too far off. Also see below.

I have pasted the matrix into another email; apologies for not attaching
the files properly before.

I also augmented the model to allow the factors to correlate, by adding
these lines to the model:

Melancholy <-> Social, Soc.Mel, NA
Melancholy <-> Rant, Rant.Mel, NA
Melancholy <-> Work, Work.Mel, NA
Melancholy <-> Metaphysical, Meta.Mel, NA
Social <-> Rant, Soc.Rant, NA
Social <-> Work, Soc.Work, NA
Social <-> Metaphysical, Soc.Meta, NA
Rant <-> Work, Rant.Work, NA
Rant <-> Metaphysical, Rant.Meta, NA
Work <-> Metaphysical, Work.Meta, NA
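For what it's worth, the ten covariance lines above, plus the unit factor variances described further down, can be generated instead of typed by hand, which removes one source of typos. This is a minimal sketch. The `cov.A.B` parameter names are a hypothetical naming scheme. The "path, parameter name, value" layout, with NA as the name of a fixed parameter, is a reading of the specify.model() format used in this thread.

```r
# Factor names as used in the model above.
factors <- c("Melancholy", "Social", "Rant", "Work", "Metaphysical")

# One column per unordered pair of factors: a 2 x 10 character matrix.
pairs <- combn(factors, 2)

# Free covariance for every pair (start value NA), with hypothetical
# parameter names of the form cov.<first>.<second>.
cov.lines <- paste0(pairs[1, ], " <-> ", pairs[2, ], ", ",
                    "cov.", pairs[1, ], ".", pairs[2, ], ", NA")

# Factor variances fixed at 1: parameter name NA, fixed value 1.
var.lines <- paste0(factors, " <-> ", factors, ", NA, 1")

model.lines <- c(var.lines, cov.lines)
writeLines(model.lines)
```

The printed lines could be appended to model.txt before calling specify.model(file="model.txt").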

...and obtain the same errors.

>>
>>  	I'm setting no overlap between factors, no correlation between
>> factors, and estimating a separate variance for each observed variable
>> (which should be everything on the right-hand side of the -> arrows), but
>> setting the factor variances equal to 1...pretty standard. I've ensured that
>> everything is typed correctly to the best I am able.
>>
>>  	The problem:
>>
>> library(sem)
>> model.kr <- specify.model(file="model.txt") # printing it checks out ok
>> correl <- read.csv("cors.csv", header=TRUE) # printing it checks out ok
>> kr.sem <- sem(ram=model.kr,S=correl,N=3034)
>> ...about 10 seconds pass...
>> Warning message:
>> In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars,  :
>>    Could not compute QR decomposition of Hessian.
>> Optimization probably did not converge.
>>
>> (running qr on correl works fine; randomly generated correl matrices fail in
>> the same way; I do not know how to further troubleshoot this)
>
> Doing a QR decomposition on the correlation matrix of the data is
> essentially irrelevant. The issue is the Hessian. (The scaled inverse
> Hessian is the covariance matrix of the parameter estimates, not of the
> data.) That you observe similar problems for randomly generated covariance
> matrices may or may not be troublesome, depending upon how you generated
> them.

df <- as.data.frame(matrix(rnorm(3034*24),nrow=3034,ncol=24))
df.cor <- cor(df)
rownames(df.cor) <- colnames(df.cor) <- colnames(correl)
sem.df <- sem(model.kr, df.cor, 3034)

...which now does not throw errors with the new model, even though that syntax
was copied from my .Rhistory. I think I may have gotten unlucky with random
data the first time.
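The first random-data run can't be reproduced, so fixing the seed makes a check like this repeatable. Looking at the eigenvalues also confirms that the random input is itself a valid (positive-definite) correlation matrix. This is a sketch. The seed value is arbitrary, and the last two lines need the real `correl` and `model.kr`, so they are left as comments.

```r
# Fix the RNG state so the random-data check can be repeated exactly.
set.seed(20080918)  # arbitrary seed
n <- 3034           # sample size used in the thread
p <- 24             # number of observed variables

df <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
df.cor <- cor(df)

# A valid correlation matrix is symmetric, has a unit diagonal and has
# strictly positive eigenvalues.
ev <- eigen(df.cor, symmetric = TRUE, only.values = TRUE)$values
print(range(ev))

# With the real objects available:
# rownames(df.cor) <- colnames(df.cor) <- colnames(correl)
# sem.df <- sem(model.kr, df.cor, 3034)
```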

Thanks for the info on what the error message means, though--I was largely
in the dark on that.
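To make the Hessian point concrete, here is a sketch of how standard errors follow from the Hessian and why they vanish when it is not positive-definite. `se.from.hessian()` is a hypothetical helper, not part of sem. Its `scale` argument stands in for the constant that turns the inverse Hessian into the covariance matrix of the estimates. That constant depends on how the fit function is defined, so 1 is only a placeholder.

```r
# Hypothetical helper: standard errors from the Hessian H of the fit
# function at the solution. Refuses to proceed unless H is (numerically)
# positive-definite.
se.from.hessian <- function(H, scale = 1, tol = sqrt(.Machine$double.eps)) {
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  if (min(ev) <= tol * max(abs(ev))) {
    stop("Hessian is not positive-definite (smallest eigenvalue ",
         signif(min(ev), 3), "); standard errors are unavailable")
  }
  # Scaled inverse Hessian = covariance matrix of the estimates;
  # its diagonal holds the sampling variances.
  sqrt(diag(scale * solve(H)))
}

H.ok <- matrix(c(4, 1, 1, 3), nrow = 2)   # positive-definite
se.from.hessian(H.ok)                      # sqrt(c(3, 4) / 11)

H.bad <- matrix(c(1, 2, 2, 1), nrow = 2)  # eigenvalues 3 and -1
try(se.from.hessian(H.bad))                # stops with an explicit message
```

An indefinite Hessian like `H.bad` is what one would expect if the optimizer stopped at a saddle point or before converging.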

>> ...and then the model itself (which is produced, as the above was just a
>> warning):
>>
>> summary(kr.sem)
>> Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) :
>>    arguments imply differing number of rows: 47, 0
>
> If the Hessian isn't positive-definite, it won't be possible to get
> estimated coefficient standard errors. I suspect that this is the source
> of this error message. If so, it would be better for summary.sem() to
> provide a more informative error message.

This makes sense. Perhaps it would also be useful for sem() to signal an
error rather than a warning when the Hessian cannot be decomposed? How often
is an SEM fit without coefficient standard errors actually desirable?
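In the meantime, a user can get that behaviour without changing sem. Base R's options(warn = 2) turns every warning into an error globally. The sketch below instead uses withCallingHandlers() to do the same for a single call. `fit.or.fail()` is a hypothetical wrapper name.

```r
# Evaluate an expression, converting any warning it raises into an error,
# so a fit that "probably did not converge" stops instead of returning.
fit.or.fail <- function(expr) {
  withCallingHandlers(expr, warning = function(w) {
    stop("warning promoted to error: ", conditionMessage(w), call. = FALSE)
  })
}

# Intended use with the objects from this thread:
# kr.sem <- fit.or.fail(sem(ram = model.kr, S = correl, N = 3034))

fit.or.fail(1 + 1)                 # no warning: returns 2
try(fit.or.fail(as.integer("x")))  # coercion warning becomes an error
```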

Thanks again for the assistance. I think the trouble may now be in my
correlation matrix; I will play around with my model and see whether
something else is more reasonable.
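One way to rule the correlation matrix in or out is to check it directly before calling sem(). It should be square, symmetric and positive-definite, with a unit diagonal and matching row and column names. Converting it to a matrix is worth doing too: read.csv() returns a data frame, while sem() documents S as a covariance matrix. This is a sketch that assumes cors.csv has a header row of variable names and possibly a first column of row labels (the real file layout isn't shown here). `read.cor.matrix()` and `check.cor.matrix()` are hypothetical helpers.

```r
# Read a correlation matrix from CSV into a numeric matrix with dimnames.
read.cor.matrix <- function(path) {
  x <- read.csv(path, header = TRUE, check.names = FALSE)
  if (!is.numeric(x[[1]])) {   # first column holds row labels
    rn <- as.character(x[[1]])
    x <- x[-1]
  } else {                     # no label column: reuse the header
    rn <- names(x)
  }
  S <- as.matrix(x)
  rownames(S) <- rn
  S
}

# Report which basic requirements of a correlation matrix hold.
check.cor.matrix <- function(S, tol = 1e-8) {
  stopifnot(is.matrix(S), nrow(S) == ncol(S))
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  c(symmetric     = isSymmetric(unname(S), tol = tol),
    unit.diagonal = all(abs(diag(S) - 1) < tol),
    names.match   = identical(rownames(S), colnames(S)),
    pos.definite  = min(ev) > tol)
}

# correl <- read.cor.matrix("cors.csv")
# check.cor.matrix(correl)
# kr.sem <- sem(ram = model.kr, S = correl, N = 3034)
```

A negative eigenvalue means the matrix cannot be the correlation matrix of a single complete data set. Building correlations with pairwise deletion is one common way this happens.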

--Adam