[R] specifying a structural equation model with sem

Tue Dec 2 18:15:55 CET 2014

Dear David,

There's nothing wrong with what you did: You've fit the independence model and a one-factor CFA model as I believe you intended. It's unnecessary to fit the first model because sem() computes the chisquare for the independence model in any event, and (as the message printed by specifyModel() indicates) it's much easier to use cfa() for the second model.

In the input and output below, I've cleaned up your code a bit:

------------------ snip --------------

> R <- scan()
1:  1.0000000 0.8360787 0.9107897
4:  0.8360787 1.0000000 0.8745305
7:  0.9107897 0.8745305 1.0000000
10: 
Read 9 items

> R <- matrix(R, 3, 3)
> names <- scan(what="")
1: mirror novel shelter
4: 
Read 3 items

> rownames(R) <- colnames(R) <- names
> R
           mirror     novel   shelter
mirror  1.0000000 0.8360787 0.9107897
novel   0.8360787 1.0000000 0.8745305
shelter 0.9107897 0.8745305 1.0000000

> model.0 <- specifyModel()
1: mirror <-> mirror, e1 
2: novel <-> novel, e2 
3: shelter <-> shelter, e3
4: 
Read 3 records
NOTE: it is generally simpler to use specifyEquations() or cfa()
      see ?specifyEquations

> (s0 <- summary(sem(model.0, R, N=235)))

 Model Chisquare =  761.9987   Df =  3 Pr(>Chisq) = 7.543436e-165
 AIC =  767.9987
 BIC =  745.62

 Normalized Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000  12.790   8.911  13.380  13.930 

 Parameter Estimates
   Estimate Std Error  z value  Pr(>|z|)                         
e1 1        0.09245003 10.81665 2.870676e-27 mirror <--> mirror  
e2 1        0.09245003 10.81665 2.870676e-27 novel <--> novel    
e3 1        0.09245003 10.81665 2.870676e-27 shelter <--> shelter

 Iterations =  0 

> s0$chisqNull
[1] 761.9987

> s0$chisq
[1] 761.9987

> model.1 <- specifyModel()
1: L -> mirror, a1
2: L -> novel, a2
3: L -> shelter, a3
4: L <->L , NA, 1
5: mirror <-> mirror, e1 
6: novel <-> novel, e2 
7: shelter <-> shelter, e3
8: 
Read 7 records
NOTE: it is generally simpler to use specifyEquations() or cfa()
      see ?specifyEquations

> summary(sem(model.1, R, N=235))

 Model Chisquare =  4.918292e-13   Df =  0 Pr(>Chisq) = NA
 AIC =  12
 BIC =  4.918292e-13

 Normalized Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
1.107e-07 1.627e-07 1.701e-07 1.696e-07 1.753e-07 2.457e-07 

 R-square for Endogenous Variables
 mirror   novel shelter 
 0.8707  0.8028  0.9527 

 Parameter Estimates
   Estimate  Std Error  z value   Pr(>|z|)                         
a1 0.9331364 0.04964952 18.794471 8.381222e-79 mirror <--- L       
a2 0.8959876 0.05105094 17.550854 5.859212e-69 novel <--- L        
a3 0.9760520 0.04790520 20.374657 2.806766e-92 shelter <--- L      
e1 0.1292564 0.01801049  7.176725 7.140093e-13 mirror <--> mirror  
e2 0.1972062 0.02206224  8.938628 3.940313e-19 novel <--> novel    
e3 0.0473225 0.01537861  3.077164 2.089802e-03 shelter <--> shelter

 Iterations =  23 

> model.cfa <- cfa(reference.indicators=FALSE) # equivalent
1: L: mirror, novel, shelter
2: 
Read 1 item
NOTE: adding 3 variances to the model

> summary(sem(model.1, R, N=235))

 Model Chisquare =  4.918292e-13   Df =  0 Pr(>Chisq) = NA
 AIC =  12
 BIC =  4.918292e-13

 Normalized Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
1.107e-07 1.627e-07 1.701e-07 1.696e-07 1.753e-07 2.457e-07 

 R-square for Endogenous Variables
 mirror   novel shelter 
 0.8707  0.8028  0.9527 

 Parameter Estimates
   Estimate  Std Error  z value   Pr(>|z|)                         
a1 0.9331364 0.04964952 18.794471 8.381222e-79 mirror <--- L       
a2 0.8959876 0.05105094 17.550854 5.859212e-69 novel <--- L        
a3 0.9760520 0.04790520 20.374657 2.806766e-92 shelter <--- L      
e1 0.1292564 0.01801049  7.176725 7.140093e-13 mirror <--> mirror  
e2 0.1972062 0.02206224  8.938628 3.940313e-19 novel <--> novel    
e3 0.0473225 0.01537861  3.077164 2.089802e-03 shelter <--> shelter

 Iterations =  23 

----------- snip -----------

Your questions really seem to pertain to the the models themselves and not to how to use sem() to fit them. For example, df = 0 in the second model because the model is just-identified (not under-identified). That's why the model chisquare is also 0 (within rounding error). I think that you should probably try to talk to someone about what you're doing.

I hope this helps,
 John

On Tue, 2 Dec 2014 11:33:36 +0100
 David Villegas Ríos <chirleu at gmail.com> wrote:
> Hi all,
> I'm new to sem package and sem analyses, so this is probably very basic,
> although I was not able to solve it myself reading some other similar
> posts. I was trying to specify a structural equation model using a
> correlation matrix of three variables. The correlation matrix comes from a
> mixed model in which repeated measures of each variable were analysed as a
> function of some fixed and random effects. All the correlations are really
> high:
> 
>                    mirror          novel       shelter
> mirror  1.0000000 0.8360787  0.9107897
> novel   0.8360787 1.0000000  0.8745305
> shelter 0.9107897 0.8745305  1.0000000
> 
> I want to test two models using sem:
> 
> 1. Independence model: the three variables are independent
> 2. Syndrome model: all the variables linked through a common latent variable
> 
> So my code is:
> 
> *# independence model*
> 
> model.0=specifyModel()
> mirror<->mirror, e1, NA
> novel<->novel, e2, NA
> shelter<->shelter, e3, NA
> 
> *# syndrome model; L represents the latent variable*
> 
> model.1=specifyModel()
> L->mirror,a1,NA
> L->novel,a2,NA
> L->shelter,a3,NA
> L<->L,NA,1
> mirror<->mirror,e1,NA
> novel<->novel,e2,NA
> shelter<->shelter,e3,NA
> 
> *Sem function:*
> 
> output0.b=sem(model.0,b,N=235)
> output1.b=sem(model.1,b,N=235)
> 
> *And my outputs are:*
> 
> > summary(output0.b)
> 
>  Model Chisquare =  761.9988   Df =  3 Pr(>Chisq) = 7.543238e-165
>  AIC =  767.9988
>  BIC =  745.62
> 
>  Normalized Residuals
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   0.000   0.000  12.790   8.911  13.380  13.930
> 
>  Parameter Estimates
>    Estimate Std Error  z value  Pr(>|z|)
> e1 1        0.09245003 10.81665 2.870676e-27 mirror <--> mirror
> e2 1        0.09245003 10.81665 2.870676e-27 novel <--> novel
> e3 1        0.09245003 10.81665 2.870676e-27 shelter <--> shelter
> 
>  Iterations =  0
> 
> 
> > summary(output1.b)
> 
>  Model Chisquare =  3.117506e-13   Df =  0 Pr(>Chisq) = NA
>  AIC =  12
>  BIC =  3.117506e-13
> 
>  Normalized Residuals
>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
> 1.107e-07 1.627e-07 1.701e-07 1.696e-07 1.753e-07 2.457e-07
> 
>  R-square for Endogenous Variables
>  mirror   novel shelter
>  0.8707  0.8028  0.9527
> 
>  Parameter Estimates
>    Estimate   Std Error  z value   Pr(>|z|)
> a1 0.93313643 0.04964952 18.794470 8.381262e-79 mirror <--- L
> a2 0.89598763 0.05105094 17.550855 5.859107e-69 novel <--- L
> a3 0.97605200 0.04790520 20.374657 2.806748e-92 shelter <--- L
> e1 0.12925637 0.01801049  7.176726 7.140033e-13 mirror <--> mirror
> e2 0.19720615 0.02206224  8.938628 3.940334e-19 novel <--> novel
> e3 0.04732249 0.01537861  3.077164 2.089805e-03 shelter <--> shelter
> 
>  Iterations =  23
> 
> *My questions are:*
> 
> 1) Are the models properly specyfied? I followed some examples in the
> literature, specifically Broomer et al., 2014 Behav
> Ecol, doi:10.1093/beheco/aru057
> 
> 2) The outputs look pretty strange to me...first, all the paths seems
> (highly!) significant. But looking at the model fit, and which model fits
> the data better, I guess that the AIC value of the syndrome model is not
> correct, and that the Df=0...so I guess that particular model is not
> properly specified (unidentified model?). Also, in the independence model,
> all the Z values are the same (?)
> 
> 3) I specifyied N=235 since the original data consist of 235 rows for one
> of the variables (5 repeated measures of 47 individuals). But for the two
> other variables, I only have 94 rows (2 repeated measures of the same 47
> individuals). So I'm not sure which N I should specify in the sem model.
> Maybe something like N=c(94,94,235)? Or N=47 because in the end everything
> is based on 47 individuals?
> 
> Any advise at this point would be greatly appreciated, I'm a bit at loss.
> 
> David
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

------------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/