[R] CFA with lavaan or with SEM

yrosseel yrosseel at gmail.com
Fri Jan 25 17:54:15 CET 2013

> I am trying to use the cfa command in the lavaan package to run a CFA;
> however, I am unsure about a couple of issues.
> I have about 25 dichotomous variables and 300 observations, and an EFA on a
> training dataset suggests a 3-factor model.

That is a lot of variables, and a rather small sample size (for binary data).

> After defining the model I use the command
> fit.dat <- cfa(model.1, data=my.dat, std.lv = T, estimator="WLSMV",
> ordered=c("var1","var2" and so on for the other 23 variables))

To avoid having to type "var?" 25 times, you can say (if the variables
are named var1 to var25)

ordered=paste("var", 1:25, sep="")

> Is it right that I define the variables as ordered (the output
> returns thresholds suggesting I should)? Does the cfa command
> calculate tetrachoric correlations in the background?

Yes, indeed. You can 'see' it by typing

inspect(fit.dat, "sampstat")

lavaan also computes an asymptotic variance matrix of these 
correlations, so you should get correct standard errors and a correct 
test statistic. By default, lavaan will provide robust standard errors 
and a mean and variance adjusted test statistic (estimator="WLSMV").
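A minimal sketch of the whole workflow, assuming a recent lavaan release (where lavInspect() is the newer name for inspect()). The data are a hypothetical stand-in: the nine HolzingerSwineford1939 items that ship with lavaan, dichotomized at their medians. The names my.bin and model.1 are illustrative only.

```r
# Sketch only: WLSMV CFA with binary items declared as ordered.
# Assumption: lavaan is installed; the median split below is a hypothetical
# stand-in for real dichotomous data.
library(lavaan)

items <- paste0("x", 1:9)
hs <- HolzingerSwineford1939[, items]
# Hypothetical binary data: 1 if above the item median, 0 otherwise
my.bin <- as.data.frame(lapply(hs, function(x) as.integer(x > median(x))))

model.1 <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Every column is binary, so all of them are declared as ordered
fit <- cfa(model.1, data = my.bin, std.lv = TRUE, estimator = "WLSMV",
           ordered = names(my.bin))

# Sample statistics lavaan computed in the background: the tetrachoric
# correlation matrix ($cov) and the thresholds ($th)
ss <- lavInspect(fit, "sampstat")
print(round(ss$cov, 2))
print(ss$th)
```

Because all columns are binary here, ordered = names(my.bin) is the shortest way to declare them; with mixed data you would pass only the binary column names.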

> However, the output for the command returns two variables with small
> negative variances (-0.002), which I think is due to the correlation
> matrix not being positive definite. Is it reasonable to force these
> to be zero when defining the model, or is this more a sign of problems
> with the model?

You can NOT force these to be zero (at least not in the current version
of lavaan, 0.5-11, where the residual variance is a function of other
model parameters). I don't think this is caused by a non-pd correlation
matrix (you would get a big warning if that were the case). Perhaps the
sample size is too small. Could you remove some items, or regroup them?
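To make "a function of other model parameters" concrete, assume lavaan's default delta parameterization, in which each latent response variable y*_j underlying a binary item is scaled to unit variance, and assume item j loads on a single factor. With std.lv = T the factor variance psi is fixed at 1, so

$$
\operatorname{Var}(y_j^*) = \lambda_j^2\,\psi + \theta_j = 1
\quad\Longrightarrow\quad
\theta_j = 1 - \lambda_j^2 \qquad (\psi = 1).
$$

A residual variance of -0.002 therefore corresponds to a standardized loading of about sqrt(1.002), roughly 1.001: a Heywood case produced by the estimated loading, not a free parameter that could simply be fixed at zero.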

> As an alternative is it possible to calculate the tetrachoric
> correlations using hetcor (which applies smoothing) and then use the
> smoothed sample correlation as the input to the model, such as
> fit.cor <- cfa(model.1, sample.cov=my.hetcor, sample.nobs=300, std.lv
> = T,estimator="ML", ordered=c("var1","var2" and so on for the other
> 23 variables)).

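This will only work if you omit the 'ordered' argument, perhaps in
combination with estimator="ULS". But do not trust or report the standard
errors in this case: lavaan then only sees the correlation matrix, so it
cannot compute the asymptotic variance matrix of the tetrachoric
correlations.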

> Final question: I have a lot of missing data, and listwise deletion
> leaves 90 subjects. Is there a way to calculate estimates using
> pairwise deletion? (This is another reason why I tried using the
> correlation matrix as the input.)

You could do this (compute the tetrachoric correlations with pairwise
deletion and use them as input), and use estimator="ULS". But again, you
cannot use the standard errors.
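A minimal sketch of the pairwise route, assuming the lavaan and polycor packages are installed. The data are again a hypothetical stand-in (median-split HolzingerSwineford1939 items with about 15% of the values deleted at random), and using nrow() for sample.nobs is an arbitrary assumption, since no single N applies under pairwise deletion. Note that hetcor() only computes tetrachoric correlations for factors; numeric 0/1 columns would get Pearson correlations.

```r
# Sketch only: pairwise tetrachoric correlations fed to cfa() with ULS.
# Assumptions: lavaan and polycor are installed; data and missingness are
# hypothetical. Standard errors and the test statistic are NOT trustworthy.
library(lavaan)
library(polycor)

set.seed(1)
items <- paste0("x", 1:9)
hs <- HolzingerSwineford1939[, items]
my.bin <- as.data.frame(lapply(hs, function(x) as.integer(x > median(x))))
# Delete roughly 15% of the values at random to mimic missing data
my.bin[] <- lapply(my.bin, function(x) { x[runif(length(x)) < 0.15] <- NA; x })

# Convert to factors so hetcor() computes tetrachoric correlations
my.fac <- as.data.frame(lapply(my.bin, factor))
het <- hetcor(my.fac, use = "pairwise.complete.obs")
R <- het$correlations

model.1 <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# No 'ordered' argument here: lavaan only sees a correlation matrix
fit.cor <- cfa(model.1, sample.cov = R, sample.nobs = nrow(my.bin),
               std.lv = TRUE, estimator = "ULS")
print(parameterEstimates(fit.cor)[, c("lhs", "op", "rhs", "est")])
```

Only the point estimates are of interest here; the standard errors printed by summary() would treat R as an ordinary sample covariance matrix.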

