[R] Tetrachoric correlation in R vs. stata

John Fox jfox at mcmaster.ca
Sat Jun 24 09:09:47 CEST 2006


Dear Peter,

Thanks for fielding the followup to this question -- I didn't see it
until Saturday morning.

Janet says that polychor() gives virtually the same ML and 2-step
estimates, but (thinking about it) I believe that these should be
identical within rounding error in the 2 x 2 case, and that appears to
be true.

Janet's example shows that the print method for polycor objects tries
to compute a test of bivariate normality even for tetrachoric
correlations where the df for the test are 0, producing a warning. I've
fixed that and uploaded a new version of the package to CRAN.

Regards,
 John

On 24 Jun 2006 00:30:03 +0200
 Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> Janet Rosenbaum <jrosenba at rand.org> writes:
> 
> > Peter --- Thanks for pointing out the omitted information.  The
> > hazards of attempting to be brief.
> > 
> > In R, I am using polychor(vec1, vec2, std.err=T) and have used both
> > the ML and 2 step estimates, which give virtually identical
> answers.
> > I am explicitly using only the 632 complete cases in R to make sure
> > missing data is handled the same way as in stata.
> > 
> > Here's my data:
> > 
> > 522	54
> > 34	22
> > 
> > > polychor(v1, v2, std.err=T, ML=T)
> > 
> > Polychoric Correlation, ML est. = 0.5172 (0.08048)
> > Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN
> > 
> >     Row Thresholds
> >     Threshold Std.Err.
> >   1     1.349  0.07042
> > 
> > 
> >     Column Thresholds
> >     Threshold Std.Err.
> >   1     1.174  0.06458
> >   Warning message:
> >   NaNs produced in: pchisq(q, df, lower.tail, log.p)
> > 
> > In stata, I get:
> > 
> > . tetrachoric t1_v19a ct1_ix17
> > 
> > Tetrachoric correlations (N=632)
> > 
> > ----------------------------------
> >      Variable |  t1_v19a  ct1_ix17
> > -------------+--------------------
> >       t1_v19a |        1
> >      ct1_ix17 |    .6169         1
> > ----------------------------------
> 
> Well, 
> 
> > pmvnorm(c(1.349,1.174),c(Inf,Inf),
> +    sigma=matrix(c(1,.5172,.5172,1),2))*632
> [1] 22.00511
> attr(,"error")
> [1] 1e-15
> attr(,"msg")
> [1] "Normal Completion"
> > pnorm(1.349)*632
> [1] 575.9615
> > pnorm(1.174)*632
> [1] 556.0352
> 
> so the estimates from R appear to be consistent with the table. In
> contrast, plugging in the .6169 from Stata gives
> 
> 
> > pmvnorm(c(1.349,1.174),c(Inf,Inf),
> +     sigma=matrix(c(1,.6169,.6169,1),2))*632
> [1] 26.34487
> ...
> 
> You might want to follow up on
> 
> http://www.ats.ucla.edu/stat/stata/faq/tetrac.htm
> 
> 
> > Thanks for your help.
> > 
> > Janet
> > 
> > 
> > 
> > Peter Dalgaard wrote:
> > > Janet Rosenbaum <jrosenba at rand.org> writes:
> > >
> > >> I hope someone here knows the answer to this since it will save
> me
> > >> from delving deep into documentation.
> > >>
> > >> Based on 22 pairs of vectors, I have noticed that tetrachoric
> > >> correlation coefficients in stata are almost uniformly higher
> than
> > >> those in R, sometimes dramatically so (TCC=.61 in stata, .51 in
> R;
> > >> .51 in stata, .39 in R).  Stata's estimate is higher than R's in
> 20
> > >> out of 22 computations, although the estimates always fall
> within
> > >> the 95% CI for the TCC calculated by R.
> > >>
> > >> Do stata and R calculate TCC in dramatically different ways?  Is
> > >> the handling of missing data perhaps different?  Any thoughts?
> > >>
> > >> Btw, I am sending this question only to the R-help list.
> > > A bit more information seems necessary:
> > > - tetrachoric correlations depend on 4 numbers, so you should be
> able
> > >   to give a direct example
> > > - you're not telling us how you calculate the TCC in R. This is
> not
> > >   obvious (package polycor?).
> > >
> > 
> > 
> > --------------------
> > 
> > This email message is for the sole use of the intended
> rec...{{dropped}}
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/



More information about the R-help mailing list