[R] Tetrachoric correlation in R vs. stata

John Fox jfox at mcmaster.ca
Sun Jun 25 08:26:03 CEST 2006

Dear Janet,

A good thing to do when different software gives different answers is
to check each against known results. I'm away from home, and don't have
all of the examples that I used to check polychor(), but I dug up the
following. The polychor() function produces output that agrees with
both of these sources. How does Stata do?
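
The transcript that follows prints tab without showing how it was
entered. A minimal sketch of one way to rebuild the two tables, assuming
the polycor package is installed; the object names drasgow and brown are
made up here for illustration, and the counts are copied from the
printed tables:

```r
# Counts copied from the printed tables; matrix() fills column by column.
drasgow <- matrix(c(58, 26,  8,
                    52, 58, 12,
                     1,  3,  9), nrow = 3)   # Drasgow (1988) example
brown   <- matrix(c(1562, 383, 42, 94), nrow = 2)  # Brown (1977) example

# polychor() accepts a contingency table of counts as its first argument.
# The guard only skips the calls when polycor is not available.
if (requireNamespace("polycor", quietly = TRUE)) {
  print(polycor::polychor(drasgow, std.err = TRUE))             # 2-step
  print(polycor::polychor(drasgow, ML = TRUE, std.err = TRUE))  # ML
  print(polycor::polychor(brown))
}
```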

> # example from Drasgow (1988), pp. 69-74 in Kotz and Johnson,
> #  Encyclopedia of statistical sciences. Vol. 7.
> tab
     [,1] [,2] [,3]
[1,]   58   52    1
[2,]   26   58    3
[3,]    8   12    9
> polychor(tab, std.err=TRUE)

Polychoric Correlation, 2-step est. = 0.42 (0.07474)
Test of bivariate normality: Chisquare = 11.55, df = 3, p = 0.009078

> polychor(tab, ML=TRUE, std.err=TRUE)

Polychoric Correlation, ML est. = 0.4191 (0.07616)
Test of bivariate normality: Chisquare = 11.54, df = 3, p = 0.009157

  Row Thresholds
  Threshold Std.Err.
1  -0.02988  0.08299
2   1.13300  0.10630

  Column Thresholds
  Threshold Std.Err.
1   -0.2422  0.08361
2    1.5940  0.13720
> tab # example from Brown (1977) Applied Statistics, 26:343-351. 
     [,1] [,2]
[1,] 1562   42
[2,]  383   94

> polychor(tab)    
[1] 0.595824
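
For a 2 x 2 table the tetrachoric correlation can also be cross-checked
without any package. The sketch below is not the polychor() code: it is
an independent base-R calculation, assuming two dichotomised standard
bivariate normal variables, that takes the thresholds from the margins
and solves for the correlation that reproduces the upper-left cell. The
function name tetra2x2 is hypothetical, and the second table uses the
four counts from the quoted message below.

```r
# Tetrachoric correlation for a 2 x 2 table of counts, base R only.
# Assumption: latent variables are standard bivariate normal.
tetra2x2 <- function(tab) {
  p <- tab / sum(tab)
  a <- qnorm(sum(p[1, ]))   # row threshold from the row margin
  b <- qnorm(sum(p[, 1]))   # column threshold from the column margin
  # P(X < a, Y < b) at correlation rho, by one-dimensional integration
  p11 <- function(rho) {
    integrate(function(x) dnorm(x) * pnorm((b - rho * x) / sqrt(1 - rho^2)),
              lower = -Inf, upper = a)$value
  }
  # solve p11(rho) = observed proportion in cell [1, 1]
  uniroot(function(rho) p11(rho) - p[1, 1],
          interval = c(-0.99, 0.99), tol = 1e-8)$root
}

brown <- matrix(c(1562, 383, 42, 94), nrow = 2)  # Brown (1977) table
janet <- matrix(c(522, 34, 54, 22), nrow = 2)    # counts from the quoted message

tetra2x2(brown)  # compare with the polychor(tab) value printed above
tetra2x2(janet)  # compare with the ML estimate in the quoted message
```

With only four cells the model is saturated (three parameters for three
free proportions), which is also why the bivariate normality test in the
quoted output has df = 0.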


On Fri, 23 Jun 2006 14:33:31 -0700
 Janet Rosenbaum <jrosenba at rand.org> wrote:
> Peter --- Thanks for pointing out the omitted information.  The
> hazards of attempting to be brief.
> In R, I am using polychor(vec1, vec2, std.err=T) and have used both
> the ML and 2 step estimates, which give virtually identical answers.
> I am explicitly using only the 632 complete cases in R to make sure
> missing data is handled the same way as in stata.
> Here's my data:
> 522	54
> 34	22
> > polychor(v1, v2, std.err=T, ML=T)
> Polychoric Correlation, ML est. = 0.5172 (0.08048)
> Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN
>     Row Thresholds
>     Threshold Std.Err.
>   1     1.349  0.07042
>     Column Thresholds
>     Threshold Std.Err.
>   1     1.174  0.06458
>   Warning message:
>   NaNs produced in: pchisq(q, df, lower.tail, log.p)
> In stata, I get:
> . tetrachoric t1_v19a ct1_ix17
> Tetrachoric correlations (N=632)
> ----------------------------------
>      Variable |  t1_v19a  ct1_ix17
> -------------+--------------------
>       t1_v19a |        1
>      ct1_ix17 |    .6169         1
> ----------------------------------
> Thanks for your help.
> Janet
> Peter Dalgaard wrote:
> > Janet Rosenbaum <jrosenba at rand.org> writes:
> > 
> >> I hope someone here knows the answer to this since it will save me
> >> from delving deep into documentation.
> >>
> >> Based on 22 pairs of vectors, I have noticed that tetrachoric
> >> correlation coefficients in stata are almost uniformly higher than
> >> those in R, sometimes dramatically so (TCC=.61 in stata, .51 in R;
> >> .51 in stata, .39 in R).  Stata's estimate is higher than R's in 20
> >> out of 22 computations, although the estimates always fall within
> >> the 95% CI for the TCC calculated by R.
> >>
> >> Do stata and R calculate TCC in dramatically different ways?  Is
> >> the handling of missing data perhaps different?  Any thoughts?
> >>
> >> Btw, I am sending this question only to the R-help list.
> > 
> > 
> > A bit more information seems necessary:
> > 
> > - tetrachoric correlations depend on 4 numbers, so you should be
> >   able to give a direct example
> > 
> > - you're not telling us how you calculate the TCC in R. This is not
> >   obvious (package polycor?).
> > 