[R] Path Analysis

Tue Nov 5 21:16:58 CET 2013

Dear Sarah,

As you know, our discussion continued off-list, and I'm glad that you were
able to get the software to work.

I'll address your question briefly, but what I have to say probably isn't
what you want to hear:

Most fundamentally, the information you've provided is entirely without
content. That is, variable names like x1 and y1 convey no information about
the substance of the data. It's therefore impossible to know whether the
model that you specified is sensible. I think that you'd do much better to
seek competent statistical help locally than to ask questions on an email
list devoted to statistical software.

That said, you've specified a very restrictive model for the data. You could
add 8 paths to the model and still have a fully recursive model. For
example, your model specifies that x2 can only influence y4 indirectly
through y1. If you've carefully specified the model and believe, for
example, that the missing paths are implausible, that x1 and x2 are really
exogenous, and that all of the disturbances are uncorrelated, then the
correct conclusion is that your model is wrong. You could try adding the
missing paths to the model, but if you're willing to do this, that would
suggest that you didn't think carefully enough about the specification in
the first place. In my opinion, structural-equation modeling shouldn't be
regarded as an exploratory method.
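The path count can be checked with a little arithmetic in R. This sketch
assumes the causal ordering x1, x2 (exogenous), then y1, y2, y3, y4, which is
the ordering implied by the specified paths. Under that ordering, each
endogenous variable may receive a path from both x's and from every earlier y.
The same arithmetic gives the model's degrees of freedom. These agree with the
Df = 8 reported for the fitted model, so the chi-square statistic is in effect
a joint test of the 8 omitted paths. The variable names are illustrative only.

```r
# Assumed causal ordering: x1, x2 exogenous, then y1 -> y2 -> y3 -> y4
n_exog  <- 2
n_endog <- 4

# Each y_j may receive paths from both x's and from every earlier y
possible  <- sum(n_exog + 0:(n_endog - 1))  # 2 + 3 + 4 + 5 = 14
specified <- 6                              # xy21, xy12, yy12, yy23, yy24, yy34
omitted   <- possible - specified           # 8

# Degrees of freedom for the observed-variable model
p       <- n_exog + n_endog
moments <- p * (p + 1) / 2                  # 21 distinct variances and covariances
# Free parameters: 6 paths + 4 disturbance variances
# + V(x1), V(x2), C(x1, x2), which fixed.x reproduces exactly
free    <- specified + n_endog + 3          # 13
dof     <- moments - free                   # 8

c(possible = possible, omitted = omitted, df = dof)
```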

Of course, in a very large sample, an overidentified model that's trivially
wrong can be rejected when tested as a hypothesis. I don't know how large
your sample is, but the various "fit indices" are not encouraging. Your
model isn't just trivially wrong. Moreover, the R^2s for the endogenous
variables are very small; two are effectively 0.
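These points can be checked directly against the numbers in the output quoted
below. The following is a minimal sketch that uses only base R. The upper-tail
chi-square probability should reproduce the reported Pr(>Chisq), and the
R-square values show which equations explain essentially nothing. The 0.01
cutoff for "effectively 0" is an arbitrary choice made for illustration.

```r
# Values copied from the posted summary() output
chisq <- 41.03029
dof   <- 8

# Upper-tail probability of the chi-square test statistic
p_value <- pchisq(chisq, dof, lower.tail = FALSE)
signif(p_value, 3)

# R-square for each endogenous variable
r2 <- c(y1 = 0.0009, y2 = 0.1890, y3 = 0.0019, y4 = 0.1558)
names(r2)[r2 < 0.01]   # equations with R-square effectively 0: "y1" "y3"
```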

I can't judge whether your model makes any sense, but it's my impression
that most structural equation models don't. People often think that SEMs are
magic wands that can be waved over observational data to draw causal
inferences, even when the assumptions underlying the model, such as
exogeneity, are implausible, and without attending to aspects of the model,
such as potential nonlinearity, that should be part of careful regression
modeling.
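The following sketch shows what that regression modeling might look like
here. The model is recursive and its disturbances are assumed uncorrelated, so
each equation can be fit separately with lm(). The coefficient estimates should
then agree with those from sem(), although the residual variances can differ
slightly because of the divisor used. The data are simulated stand-ins, not the
original xdata, and the names sim, fits, y4_quad, and nl_test are made up for
illustration. Comparing a linear fit with one that adds a quadratic term via
anova() is only one simple check for nonlinearity. Component-plus-residual
plots are another.

```r
# Hypothetical stand-in for xdata: simulated, NOT the original data
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y1 <- 0.5 * x2 + rnorm(n)
y2 <- 0.3 * x1 + 0.4 * y1 + rnorm(n)
y3 <- 0.6 * y2 + rnorm(n)
y4 <- 0.2 * y2 + 0.5 * y3 + 0.4 * y3^2 + rnorm(n)  # deliberately curved in y3
sim <- data.frame(x1, x2, y1, y2, y3, y4)

# The recursive model as four separate OLS regressions
fits <- list(
  y1 = lm(y1 ~ x2, data = sim),
  y2 = lm(y2 ~ x1 + y1, data = sim),
  y3 = lm(y3 ~ y2, data = sim),
  y4 = lm(y4 ~ y2 + y3, data = sim)
)
lapply(fits, coef)
r2 <- sapply(fits, function(f) summary(f)$r.squared)
round(r2, 4)

# One simple nonlinearity check: does a quadratic in y3 improve the y4 equation?
y4_quad <- lm(y4 ~ y2 + poly(y3, 2), data = sim)
nl_test <- anova(fits$y4, y4_quad)  # F-test for the added quadratic term
nl_test
```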

My two cents,
 John

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sarah Rogers
> Sent: Tuesday, November 05, 2013 8:45 AM
> To: r-help at r-project.org
> Cc: John Fox
> Subject: Re: [R] Path Analysis
> 
> Dear John,
> Thanks for your help. I ran the path analysis, but the model does not fit
> the data. I am unsure whether this reflects the model construction (too
> many variables or more needed, more paths or a change in the direction of
> paths, sample size, etc.) or a problem with an error variance. All of my
> data are observed (fully recursive model): two exogenous variables (with no
> variance or covariance parameters) and four endogenous variables. For the
> final sem() model I used the data argument instead of a moment matrix (the
> symmetric covariance matrix). What would you suggest is the best way to
> investigate this in R?
> Attached are the script and results:
> 
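> library(sem)
> # specifyEquations() reads equations from the console until a blank line,
> # so the equations must follow the call directly
> model.xdata <- specifyEquations()
> y1 = xy21*x2
> y2 = xy12*x1 + yy12*y1
> y3 = yy23*y2
> y4 = yy24*y2 + yy34*y3
> 
> # fixed.x supplies the variances and covariance of the exogenous x1 and x2
> model.xdata.sem <- sem(model.xdata, data=xdata, fixed.x=c("x1", "x2"))
> summary(model.xdata.sem, fit.indices=c("CFI", "NFI", "GFI", "RMSEA",
>     "AGFI", "NNFI", "SRMR"))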
> 
> Model Chisquare =  41.03029   Df =  8 Pr(>Chisq) = 2.057595e-06
> Goodness-of-fit index =  0.8604332
>  Adjusted goodness-of-fit index =  0.6336373
>  RMSEA index =  0.2330797   90% CI: (0.1654494, 0.3060134)
> Bentler-Bonett NFI =  0.4290901
>  Tucker-Lewis NNFI =  -0.08903999
>  Bentler CFI =  0.4191787
>  SRMR =  0.1472905
> 
>  Normalized Residuals
>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> -3.91600 -0.60120  0.00000 -0.09444  0.13940  2.71400
> 
>  R-square for Endogenous Variables
>     y1     y2     y3     y4
> 0.0009 0.1890 0.0019 0.1558
> 
>  Parameter Estimates
>       Estimate      Std Error    z value     Pr(>|z|)
> xy21    0.017817121  0.066762981  0.26687127 7.895683e-01 y1 <--- x2
> xy12   -0.030928721  0.007447431 -4.15293810 3.282335e-05 y2 <--- x1
> yy12    0.311216816  0.475353649  0.65470585 5.126572e-01 y2 <--- y1
> yy23   -0.077701789  0.203130269 -0.38252196 7.020742e-01 y3 <--- y2
> yy24    0.002539283  0.031323241  0.08106706 9.353886e-01 y4 <--- y2
> yy34    0.066168523  0.017671263  3.74441396 1.808153e-04 y4 <--- y3
> V[y1]   1.945406949  0.315586680  6.16441400 7.074463e-10 y1 <--> y1
> V[y2]  33.438573159  5.424452858  6.16441400 7.074463e-10 y2 <--> y2
> V[y3] 129.295382082 20.974480627  6.16441400 7.074463e-10 y3 <--> y3
> V[y4]   3.068539923  0.497782907  6.16441400 7.074463e-10 y4 <--> y4
> 
> On 2 November 2013 19:48, John Fox <jfox at mcmaster.ca> wrote:
> 
> > Dear Sarah,
> >
> > It's generally a good idea to include a reproducible example if you want
> > to get help with a problem, but in this case it's a safe bet that the
> > problem is that the model you specified has no variance or covariance
> > parameters for the variables x1 and x2, which, I assume, you mean to be
> > exogenous. The easiest way to include these variances and covariance in
> > the model is to specify the argument fixed.x=c("x1", "x2") in the call to
> > sem().
> >
> > In addition:
> >
> > (1) Your model is recursive (guessing that all the x's and y's are
> > observed variables), and its disturbances are uncorrelated, so it amounts
> > to four OLS regressions. You could just use lm() to fit the model.
> >
> > (2) It's generally easier in the sem package to use specifyEquations()
> > than specifyModel() for model specification.
> >
> > (3) If you have the original data set, as you do, it's generally
> > preferable to use the data argument to sem() than to pass it the
> > covariance matrix for the observed variables.
> >
> > I hope that this helps,
> >  John
> >
> >
> > ------------------------------------------------
> > John Fox
> > McMaster University
> > Hamilton, Ontario, Canada
> > http://socserv.mcmaster.ca/jfox/
> >
> > On Sat, 2 Nov 2013 11:02:31 +0100, Sarah Rogers <rogerssarah65 at gmail.com> wrote:
> > > Hello,
> > >
> > > I have just started to work on a path analysis (see attached image for
> > > the diagram), but I have encountered an error message.
> > >
> > > This is the code I have used:
> > >
> > > cov_matrix<-var(xdata)
> > >
> > > library(sem)
> > > model.xdata<-specifyModel()
> > > x1 -> y2, xy12, NA
> > > x2 -> y1, xy21, NA
> > > y1 -> y2, yy12, NA
> > > y2 -> y3, yy23, NA
> > > y2 -> y4, yy24, NA
> > > y3 -> y4, yy34, NA
> > > y2 <-> y2, y2error, NA
> > > y1 <-> y1, y1error, NA
> > > y3 <-> y3, y3error, NA
> > > y4 <-> y4, y4error, NA
> > >
> > > model.xdata.sem <- sem(model.xdata, cov_matrix, nrow(xdata))
> > >
> > > and the error message is:
> > > Error in csem(model = model.description, start, opt.flag = 1, typsize = typsize,  :
> > >   The matrix is non-invertable.
> > >
> > > I fear that there is a problem in the data.
> > > I would be very grateful if you could help me to solve this problem and
> > > proceed with my analyses.
> > >
> > > Thank you in advance for your help!
> > > Sarah
> > >
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >