[R] Difficulty understanding sem errors / failed confirmatory factor analysis

Thu Sep 18 20:21:19 CEST 2008

Dear Adam,

I'm afraid that our emails have crossed. Please see my previous message.

John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: Adam D. I. Kramer [mailto:adik at ilovebacon.org]
> Sent: September-18-08 2:12 PM
> To: John Fox
> Cc: r-help at r-project.org
> Subject: Re: [R] Difficulty understanding sem errors / failed confirmatory
> factor analysis
> 
> Dear John,
> 
> On Thu, 18 Sep 2008, John Fox wrote:
> 
> >>  	I'm trying to fit a pretty simple confirmatory factor analysis using
> >> the sem package. There's a CFA example in the examples, which is helpful,
> >> but the output for my (failing) model is hard to understand. I'd be
> >> interested in any other ways to do a CFA in R, if this proves
> >> troublesome.
> >>
> >>  	The CFA is replicating a 5 uncorrelated-factor structure (for those
> >> interested, it is a structure of word usage patterns in weblogs) in a
> >> special population. The model looks like model.txt (attached as many
> >> people hate long emails); the correlation matrix cors.txt as well.
> >
> > As far as I can see, the attachments aren't there. If you like, you can
> > send them to me privately. Without the input covariance matrix and your
> > model, it's very hard to tell what the source of the problem is, but one
> > guess (assuming that you've specified the model correctly) is that the
> > assumption of uncorrelated factors is too far off. Also see below.
> 
> I have pasted the matrix into another email; apologies for failing to attach
> them acceptably before.
> 
> I also augmented the model to allow the factors to correlate, by adding
> these lines to the model:
> 
> Melancholy <-> Social, Soc.Mel, NA
> Melancholy <-> Rant, Rant.Mel, NA
> Melancholy <-> Work, Work.Mel, NA
> Melancholy <-> Metaphysical, Meta.Mel, NA
> Social <-> Rant, Soc.Rant, NA
> Social <-> Work, Soc.Work, NA
> Social <-> Metaphysical, Soc.Meta, NA
> Rant <-> Work, Rant.Work, NA
> Rant <-> Metaphysical, Rant.Meta, NA
> Work <-> Metaphysical, Work.Meta, NA
> 
> ...and obtain the same errors.
> 
> >>
> >>  	I'm setting no overlap between factors, no correlation between
> >> factors, and estimating a separate variance for each observed variable
> >> (which should be everything on the right-hand side of the -> arrows), but
> >> setting the factor variances equal to 1...pretty standard. I've ensured that
> >> everything is typed correctly to the best I am able.
> >>
> >>  	The problem:
> >>
> >> library(sem)
> >> model.kr <- specify.model(file="model.txt") # printing it checks out ok
> >> correl <- read.csv("cors.csv", header=TRUE) # printing it checks out ok
> >> kr.sem <- sem(ram=model.kr,S=correl,N=3034)
> >> ...about 10 seconds pass...
> >> Warning message:
> >> In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars, :
> >>    Could not compute QR decomposition of Hessian.
> >> Optimization probably did not converge.
> >>
> >> (running qr on correl works fine; randomly-generated correl matrices fail in
> >> the same way; I do not know how to further troubleshoot this)
> >
> > Doing a QR decomposition on the correlation matrix of the data is
> > essentially irrelevant. The issue is the Hessian. (The scaled inverse
> > Hessian is the covariance matrix of the parameter estimates, not of the
> > data.) That you observe similar problems for randomly generated covariance
> > matrices may or may not be troublesome, depending upon how you generated
> > them.
> 
> df <- as.data.frame(matrix(rnorm(3034*24),nrow=3034,ncol=24))
> df.cor <- cor(df)
> rownames(df.cor) <- colnames(df.cor) <- colnames(correl)
> sem.df <- sem(model.kr, df.cor, 3034)
> 
> ...which now does not throw errors with the new model, even though that syntax
> was copied from my .Rhistory. I think I may have gotten unlucky with random
> data the first time.
> 
> Thanks for the info on what the error message means, though--I was largely
> in the dark on that.
> 
> >> ...and then the model itself (which is produced, as the above was just a
> >> warning):
> >>
> >> summary(kr.sem)
> >> Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) :
> >>    arguments imply differing number of rows: 47, 0
> >
> > If the Hessian isn't positive-definite, it won't be possible to get
> > estimated coefficient standard errors. I suspect that this is the source
> > of this error message. If so, it would be better for summary.sem() to
> > provide a more informative error message.
> 
> This makes sense. It may also be useful for the sem() function to throw an
> error rather than a warning if the Hessian matrix cannot be decomposed,
> perhaps? How often is an SEM model without estimated coefficient standard
> errors desirable?
> 
> Thanks again for the assistance. I think the trouble may now be in my
> correlation matrix; I will play around with my model and see whether
> something else is more reasonable.
> 
> --Adam
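
The point made in the thread, that coefficient standard errors come from the inverse Hessian and so are unavailable when the Hessian is not positive definite, can be sketched in base R. This is only a toy illustration under stated assumptions: it does not use the sem package or reproduce its internals, and the names negloglik, is_pos_def and flat are hypothetical helpers invented for the example.

```r
# Toy illustration (NOT the sem package's internals): standard errors are
# taken from the inverse Hessian of the fitted objective, so they exist
# only when that Hessian is positive definite.

set.seed(1)
x <- rnorm(200, mean = 2, sd = 1.5)

# Negative log-likelihood of a normal model. The sd is parameterized on
# the log scale so the optimizer cannot step to a negative value.
negloglik <- function(par) {
  -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
}

fit <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)

# Hypothetical helper: TRUE when every eigenvalue of the (symmetrized)
# matrix is clearly positive relative to the largest one.
is_pos_def <- function(H, tol = 1e-6) {
  ev <- eigen((H + t(H)) / 2, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol * max(abs(ev)))
}

if (is_pos_def(fit$hessian)) {
  # Inverse Hessian of the negative log-likelihood estimates the
  # covariance matrix of the parameter estimates.
  se <- sqrt(diag(solve(fit$hessian)))
  print(cbind(estimate = fit$par, se = se))
} else {
  message("Hessian is not positive definite; standard errors unavailable.")
}

# An under-identified objective: only the sum par[1] + par[2] is
# determined, so the Hessian is singular and no standard errors exist.
flat <- function(par) (par[1] + par[2] - 1)^2
fit_flat <- optim(c(0, 0), flat, method = "BFGS", hessian = TRUE)
is_pos_def(fit_flat$hessian)
```

The second fit mimics, in miniature, what an under-identified or badly specified model can do: the optimizer still returns parameter values, but the curvature information needed for standard errors is missing.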