[R] Tetrachoric correlation in R vs. stata

Fri Jun 23 21:17:00 CEST 2006

Dear Janet,

Are you using the polychor() function in the polycor package to compute
tetrachoric correlations? If so, two methods are provided: a relatively
quick two-step method (the default) and maximum likelihood (ML = TRUE).
The methods implemented are described in the references given in
?polychor.
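
As a minimal sketch of the two methods, assuming the polycor package is
installed: the 2 x 2 table below is made up purely for illustration and
is not Janet's data.

```r
library(polycor)  # assumes the polycor package is installed

# Hypothetical 2 x 2 table of counts for two binary items (made-up numbers)
tab <- matrix(c(40, 10,
                15, 35), nrow = 2, byrow = TRUE)

r_quick <- polychor(tab)             # default: quick two-step estimate
r_ml    <- polychor(tab, ML = TRUE)  # maximum-likelihood estimate

# Show both estimates side by side
print(c(two.step = r_quick, ML = r_ml))
```

Adding std.err = TRUE to either call should also return a standard error
for the estimate, which is useful when comparing against intervals
reported by another program.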

Missing data simply are eliminated from the contingency table from
which a tetrachoric correlation is computed. If, however, you're using
hetcor() to compute a matrix of tetrachoric correlations, then missing
data are handled according to the use argument, which defaults to
"complete.obs" and is described in ?hetcor.
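
A small sketch of how the use argument changes which cases enter the
computation. The data frame d and its columns x1, x2, x3 are simulated
and hypothetical, and the missing values are planted by hand.

```r
library(polycor)  # assumes the polycor package is installed

set.seed(1)
n <- 200
z <- rnorm(n)  # shared latent variable, so the items are correlated

# Three hypothetical binary items, stored as factors so that hetcor()
# computes polychoric (here tetrachoric) correlations for each pair
d <- data.frame(
  x1 = factor(z + rnorm(n) > 0),
  x2 = factor(z + rnorm(n) > 0.3),
  x3 = factor(z + rnorm(n) > -0.3)
)
d$x1[1:20]  <- NA  # planted missing values
d$x2[11:40] <- NA

# Default: drop every row that has a missing value in any column
h_complete <- hetcor(d, use = "complete.obs", std.err = FALSE)

# Alternative: each pair of variables uses all rows complete for that pair
h_pairwise <- hetcor(d, use = "pairwise.complete.obs", std.err = FALSE)

print(h_complete$n)             # number of complete cases used
print(h_complete$correlations)
print(h_pairwise$correlations)
```

With many variables and scattered missing values, the two settings can
use quite different subsets of the data, so it is worth checking which
one matches what the other program does.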

If you want to know whether polychor() or Stata is right, then one
thing that you might do is try them on data for which you know the
answer. If you do this, you should of course make sure that both are
trying to compute the same thing (e.g., the ML estimate).
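
One way to build data with a known answer, sketched under assumptions:
draw two standard normal variables with a chosen correlation rho and
dichotomize each at a threshold. The sample size, rho, the thresholds,
and the output file name are all arbitrary choices for illustration.

```r
library(polycor)  # assumes the polycor package is installed

set.seed(123)
n   <- 5000
rho <- 0.5  # known latent correlation (chosen for illustration)

# Bivariate standard normal with correlation rho
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize at arbitrary thresholds to obtain two binary variables
x <- factor(as.integer(z1 > 0))
y <- factor(as.integer(z2 > 0.5))

# ML estimate of the tetrachoric correlation; should be near rho
est_ml <- polychor(x, y, ML = TRUE)
print(est_ml)

# Write the same data out so another program can be run on it
# (hypothetical file name, written to a temporary directory)
out_file <- file.path(tempdir(), "tetrachoric_check.csv")
write.csv(data.frame(x = x, y = y), out_file, row.names = FALSE)
```

Reading the exported file into Stata and running its tetrachoric routine
on x and y then gives a like-for-like comparison against the known rho.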

I hope this helps,
 John

On Fri, 23 Jun 2006 10:42:12 -0700
 Janet Rosenbaum <jrosenba at rand.org> wrote:
> 
> I hope someone here knows the answer to this since it will save me
> from delving deep into documentation.
> 
> Based on 22 pairs of vectors, I have noticed that tetrachoric
> correlation coefficients in Stata are almost uniformly higher than
> those in R, sometimes dramatically so (TCC=.61 in Stata, .51 in R;
> .51 in Stata, .39 in R).  Stata's estimate is higher than R's in 20
> out of 22 computations, although the estimates always fall within the
> 95% CI for the TCC calculated by R.
> 
> Do Stata and R calculate TCC in dramatically different ways?  Is the
> handling of missing data perhaps different?  Any thoughts?
> 
> Btw, I am sending this question only to the R-help list.
> 
> Thanks,
> 
> Janet