[R] Tetrachoric correlation in R vs. stata

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sun Jun 25 11:41:52 CEST 2006


"Gary Collins" <collins.gs at gmail.com> writes:

> looking at the help page/code in STATA for tetrachoric, it says it
> estimates the tetrachoric correlation via the approximation suggested
> by Edwards & Edwards (1984), "Approximating the tetrachoric
> correlation", Biometrics, 40(2): 563.
> 
> that is,
> 
> (alpha^(pi/4) - 1) / (alpha^(pi/4) + 1), where alpha is ad/bc
> 
> i.e.
> > alpha=(522 * 22)/(34 * 54)
> > (alpha^(pi/4)-1) / (alpha^(pi/4)+1)
> [1] 0.6168851


...and the approximation is obviously quite far off the mark in this
case. Presumably (I'm lazy) the approximation holds for the odds ratio
alpha close to 1 (rho close to 0) and/or marginal distributions close
to 50:50.
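
A rough way to see the gap in base R, offered as a sketch rather than as
what either program actually does: take the two thresholds from the
margins, then find the rho for which the bivariate normal probability of
the first cell equals its observed proportion. The function names
(tetra_approx, pbinorm2, tetra_exact) are made up for illustration, the
bivariate normal probability is obtained by plain numerical integration,
and the search is restricted to |rho| < 0.99 as a simplifying assumption.

```r
# Edwards & Edwards (1984) approximation, as quoted above.
tetra_approx <- function(n11, n12, n21, n22) {
  alpha <- (n11 * n22) / (n12 * n21)      # odds ratio ad/bc
  (alpha^(pi / 4) - 1) / (alpha^(pi / 4) + 1)
}

# P(X <= h, Y <= k) for a standard bivariate normal with correlation rho,
# computed by one-dimensional numerical integration over x.
pbinorm2 <- function(h, k, rho) {
  integrate(function(x) dnorm(x) * pnorm((k - rho * x) / sqrt(1 - rho^2)),
            lower = -Inf, upper = h, rel.tol = 1e-8)$value
}

# Tetrachoric correlation for a 2x2 table: thresholds from the margins,
# then solve for the rho that reproduces the (1,1) cell proportion.
# Assumption: the root lies in (-0.99, 0.99).
tetra_exact <- function(n11, n12, n21, n22) {
  n <- n11 + n12 + n21 + n22
  h <- qnorm((n11 + n12) / n)             # row threshold
  k <- qnorm((n11 + n21) / n)             # column threshold
  uniroot(function(rho) pbinorm2(h, k, rho) - n11 / n,
          interval = c(-0.99, 0.99), tol = 1e-10)$root
}

# Janet's table: skewed margins, large odds ratio
tetra_approx(522, 54, 34, 22)   # the figure Stata reports
tetra_exact(522, 54, 34, 22)    # compare with polychor()'s 0.5172

# A hypothetical table with 50:50 margins, for contrast
tetra_approx(40, 10, 10, 40)
tetra_exact(40, 10, 10, 40)
```

For the 50:50 table the exact value is also available in closed form as
sin(0.3 * pi), which gives an independent check on the integration.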

There's a Stata package "polychoric" which claims to do things more
accurately, referred to at

http://www.ats.ucla.edu/STAT/stata/faq/tetrac.htm 

(I believe I mentioned this before, but possibly in a private mail to
Janet which never reached r-help).
 
> HTH
> 
> Gary
> 
> On 25/06/06, John Fox <jfox at mcmaster.ca> wrote:
> > Dear Janet,
> >
> > A good thing to do when different software gives different answers is
> > to check each against known results. I'm away from home, and don't have
> > all of the examples that I used to check polychor(), but I dug up the
> > following. The polychor() function produces output that agrees with
> > both of these sources. How does Stata do?
> >
> > > # example from Drasgow (1988), pp. 69-74 in Kotz and Johnson,
> > > #  Encyclopedia of statistical sciences. Vol. 7.
> > > tab
> >      [,1] [,2] [,3]
> > [1,]   58   52    1
> > [2,]   26   58    3
> > [3,]    8   12    9
> >
> > > polychor(tab, std.err=TRUE)
> >
> > Polychoric Correlation, 2-step est. = 0.42 (0.07474)
> > Test of bivariate normality: Chisquare = 11.55, df = 3, p = 0.009078
> >
> > > polychor(tab, ML=TRUE, std.err=TRUE)
> >
> > Polychoric Correlation, ML est. = 0.4191 (0.07616)
> > Test of bivariate normality: Chisquare = 11.54, df = 3, p = 0.009157
> >
> >   Row Thresholds
> >   Threshold Std.Err.
> > 1  -0.02988  0.08299
> > 2   1.13300  0.10630
> >
> >
> >   Column Thresholds
> >   Threshold Std.Err.
> > 1   -0.2422  0.08361
> > 2    1.5940  0.13720
> >
> > > tab # example from Brown (1977) Applied Statistics, 26:343-351.
> >      [,1] [,2]
> > [1,] 1562   42
> > [2,]  383   94
> >
> > > polychor(tab)
> > [1] 0.595824
> > >
> >
> > Regards,
> >  John
> >
> > On Fri, 23 Jun 2006 14:33:31 -0700
> >  Janet Rosenbaum <jrosenba at rand.org> wrote:
> > > > Peter --- Thanks for pointing out the omitted information.  The
> > > > hazards of attempting to be brief.
> > > >
> > > > In R, I am using polychor(vec1, vec2, std.err=T) and have used both
> > > > the ML and 2 step estimates, which give virtually identical answers.
> > > > I am explicitly using only the 632 complete cases in R to make sure
> > > > missing data is handled the same way as in stata.
> > >
> > > Here's my data:
> > >
> > > 522   54
> > > 34    22
> > >
> > > > polychor(v1, v2, std.err=T, ML=T)
> > >
> > > Polychoric Correlation, ML est. = 0.5172 (0.08048)
> > > Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN
> > >
> > >     Row Thresholds
> > >     Threshold Std.Err.
> > >   1     1.349  0.07042
> > >
> > >
> > >     Column Thresholds
> > >     Threshold Std.Err.
> > >   1     1.174  0.06458
> > >   Warning message:
> > >   NaNs produced in: pchisq(q, df, lower.tail, log.p)
> > >
> > > In stata, I get:
> > >
> > > . tetrachoric t1_v19a ct1_ix17
> > >
> > > Tetrachoric correlations (N=632)
> > >
> > > ----------------------------------
> > >      Variable |  t1_v19a  ct1_ix17
> > > -------------+--------------------
> > >       t1_v19a |        1
> > >      ct1_ix17 |    .6169         1
> > > ----------------------------------
> > >
> > > Thanks for your help.
> > >
> > > Janet
> > >
> > >
> > >
> > > Peter Dalgaard wrote:
> > > > Janet Rosenbaum <jrosenba at rand.org> writes:
> > > >
> > > > >> I hope someone here knows the answer to this since it will save me
> > > > >> from delving deep into documentation.
> > > > >>
> > > > >> Based on 22 pairs of vectors, I have noticed that tetrachoric
> > > > >> correlation coefficients in stata are almost uniformly higher than
> > > > >> those in R, sometimes dramatically so (TCC=.61 in stata, .51 in R;
> > > > >> .51 in stata, .39 in R).  Stata's estimate is higher than R's in 20
> > > > >> out of 22 computations, although the estimates always fall within
> > > > >> the 95% CI for the TCC calculated by R.
> > > > >>
> > > > >> Do stata and R calculate TCC in dramatically different ways?  Is
> > > > >> the handling of missing data perhaps different?  Any thoughts?
> > > >>
> > > >> Btw, I am sending this question only to the R-help list.
> > > >
> > > >
> > > > A bit more information seems necessary:
> > > >
> > > > > - tetrachoric correlations depend on 4 numbers, so you should be
> > > > >   able to give a direct example
> > > >
> > > > - you're not telling us how you calculate the TCC in R. This is not
> > > >   obvious (package polycor?).
> > > >
> > >
> > >
> > > --------------------
> > >
> > > This email message is for the sole use of the intended\ > ...{{dropped}}
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


