[R] polychoric correlation: issue with coefficient sign

Dorothee ddurpoix at gmail.com
Wed Jan 14 21:06:22 CET 2009


Dear John and Stas,

Thanks so much for your help.
John, I did the correlation on the complete dataset (no missing values). I
tried what you suggested and you were right: hetcor with pd=FALSE gives me
the same result as polychor.
Anyway, thanks to the answers I got in the forum, I understand I should not
use these two variables in the analysis I wanted to do (at least in their
present state; maybe within an index...). And I should also read (heaps) more
on polychoric correlation/correlation with categorical data... :-)
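
For anyone finding this thread in the archive, the comparison described
above can be sketched roughly as follows. This is only an illustration: the
data are simulated stand-ins (not the original dataset), and the variable
names v1..v12 and the cut points are made up for the example. polychor()
and hetcor() with its pd argument are from the polycor package.

```r
library(polycor)  # provides polychor() and hetcor()

set.seed(1)
n <- 384
z <- rnorm(n)  # shared latent factor, so all variables are correlated

# Hypothetical stand-in data: 12 binary/ordinal variables cut from
# correlated normal variables (odd columns binary, even columns 3 levels)
dat <- as.data.frame(setNames(lapply(1:12, function(j) {
  latent <- 0.7 * z + sqrt(1 - 0.7^2) * rnorm(n)
  breaks <- if (j %% 2 == 0) c(-Inf, -0.5, 0.5, Inf) else c(-Inf, 0, Inf)
  cut(latent, breaks = breaks, ordered_result = TRUE)
}), paste0("v", 1:12)))

# Two-step polychoric estimate for a single pair of variables
r_pair <- polychor(dat$v1, dat$v2, std.err = TRUE)$rho

# Same pair taken from the full matrix, without any adjustment
r_raw <- hetcor(dat, pd = FALSE)$correlations["v1", "v2"]

# Same pair with the default pd = TRUE; hetcor only changes the matrix
# if the raw one is not positive definite
r_pd <- hetcor(dat, pd = TRUE)$correlations["v1", "v2"]

print(c(pairwise = r_pair, hetcor_raw = r_raw, hetcor_pd = r_pd))
```

With well-behaved data like these, the three numbers should agree. They
can differ when the raw correlation matrix is not positive definite, since
pd=TRUE then replaces it with a nearby positive-definite matrix, which
would be consistent with the discrepancy discussed in this thread.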

Thanks for all your help!
Cheers,
Dorothee.



Stas Kolenikov wrote:
> 
> Olsson's original paper
> (http://www.citeulike.org/user/ctacmo/article/553309) did mention that
> the greatest biases and numeric problems were encountered when the two
> variables had opposite skewness. Your example is even more extreme:
> tetrachoric and polychoric correlations do not like zero counts. It
> actually means that your data sit on a straight line, but that line
> does not pass through the intersection of the thresholds. The nominal
> estimate of the correlation should be 1, and what you see should be
> insignificantly different from 1. No wonder you get LAPACK errors: at
> some point, you had to invert matrix( c(1,1,1,1), 2, 2) or compute its
> determinant in the ML computations. My own Stata implementation of
> polychoric correlation choked on your data and stopped with an
> error... which I should've handled more gracefully :)). The data with
> 0.5 added produced the same correlation estimate but different
> standard errors.
> 
> John Fox offered all other feasible explanations, like handling of
> missing data in the pairwise and full data set computations. But with
> unstable computations you can end up just anywhere in the range of
> estimates; the standard errors should tell you that your estimate is
> quite imprecise.
> 
> On 1/12/09, Dorothee <ddurpoix at gmail.com> wrote:
>>
>>  Hello,
>>
>>  I am running polychoric correlations on a dataset composed of 12 ordinal
>>  and binary variables (N = 384), using the polycor package.
>>  One of the associations (between 2 dichotomous variables) is very high
>>  using the 2-step estimate (0.933 when the polychoric correlation is run
>>  between the two variables only, but 0.801 when it is run on all 12
>>  variables). The same correlation run with the ML estimate returns a
>>  singularity message.
>>
>>  First, I would like to know why the estimates for the two dichotomous
>>  variables alone and for all the variables at once (with the 2-step
>>  estimate) give slightly different results.
>>
>>  Secondly, when I checked the distributions of these two dichotomous
>>  variables, they appeared about symmetrically opposed. Therefore, one
>>  should indeed expect a strong association between them, but a negative
>>  one, shouldn't one? Why does the polychoric correlation return a positive
>>  coefficient? What does it mean for the rest of the coefficients: should I
>>  trust them?
>>
>>  I have to say I'm new to R and not very strong in statistics, I hope I
>>  haven't posted a stupid question...
>>
> 
> -- 
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
> 
