[R] polychor error

Fri Aug 25 22:10:19 CEST 2006

Dear Janet,

Performing a traceback after the error gives a hint:

> tmp.pcc<-polychor(tmp.mat, ML=T, std.err=T)
> traceback()
8: stop("at least one element of ", sQuote("lower"), " is larger than ", 
       sQuote("upper"))
7: checkmvArgs(lower = lower, upper = upper, mean = mean, corr = corr, 
       sigma = sigma)
6: pmvnorm(lower = c(row.cuts[i], col.cuts[j]), upper = c(row.cuts[i + 
       1], col.cuts[j + 1]), corr = R)
5: binBvn(rho, row.cuts, col.cuts)
4: fn(par, ...)
3: function (par) 
   fn(par, ...)(c(0.422816748044139, -2.20343446037221, -2.2163627792244, 
   -1.79075055316993, -1.11077161663679, -0.323037731981826,
0.664036943094355, 
   -2.71305188847272, -1.58338678422633, -0.534011853182102,
0.65365619155084
   ))
2: optim(c(optimise(f, interval = c(-1, 1))$minimum, rc, cc), f, 
       control = control, hessian = std.err)
1: polychor(tmp.mat, ML = T, std.err = T)

The parameters are in the order correlation, row thresholds (of which there
are 6), column thresholds (of which there are 4), and at this point in the
maximization of the likelihood the first row threshold (-2.20343446037221)
is *above* the second threshold (-2.2163627792244), causing pmvnorm() to
complain. This can happen when the problem is ill-conditioned, since optim()
doesn't constrain the order of the thresholds.

So, why is the problem ill-conditioned? Here is your contingency table:

> tmp.mat
  3 4  5  6  7
1 0 2  0  0  0
2 0 0  1  1  0
3 0 0  5  1  1
4 0 5 10 12  2
5 0 5 27 29 11
6 1 3 20 57 31
7 0 1  9 34 32

There are only two observations in the first row, two in the second row, and
one in the first column.  You're expecting a lot out of ML to get estimates
of the first couple of thresholds for rows and the first for columns.

One approach would be to eliminate one or more sparse rows or columns; e.g.,

> polychor(tmp.mat[-1,], ML = TRUE, std.err = TRUE)

Polychoric Correlation, ML est. = 0.3932 (0.05719)
Test of bivariate normality: Chisquare = 14.21, df = 19, p = 0.7712

  Row Thresholds
  Threshold Std.Err.
1   -2.4410  0.24270
2   -1.8730  0.14280
3   -1.1440  0.09236
4   -0.3353  0.07408
5    0.6636  0.07857

  Column Thresholds
  Threshold Std.Err.
1   -2.6500  0.30910
2   -1.6420  0.12100
3   -0.5494  0.07672
4    0.6508  0.07833
> polychor(tmp.mat[,-1], ML = TRUE, std.err = TRUE)

Polychoric Correlation, ML est. = 0.4364 (0.05504)
Test of bivariate normality: Chisquare = 14.85, df = 17, p = 0.6062

  Row Thresholds
  Threshold Std.Err.
1   -2.4940  0.26020
2   -2.2080  0.19160
3   -1.7850  0.13400
4   -1.1090  0.09113
5   -0.3154  0.07371
6    0.6625  0.07821

  Column Thresholds
  Threshold Std.Err.
1   -1.6160  0.11970
2   -0.5341  0.07639
3    0.6507  0.07800
> 

A more defensible alternative would be to collapse sparse rows or columns.

BTW, you *can* get estimated standard-errors from polychor() for your
original table for the two-step estimator:

> polychor(tmp.mat, ML = FALSE, std.err = TRUE)

Polychoric Correlation, 2-step est. = 0.4228 (0.05298)
Test of bivariate normality: Chisquare = 19.22, df = 23, p = 0.6883

That's because the two-step estimator estimates the thresholds from the
marginal distributions of the variables rather than from their joint
distribution.

So, is this a bug or a feature? I suppose that it's a bug to allow the
thresholds to get out of order, though to constrain the optimization to
prevent that from happening is probably not worth the effort and could cause
some strange results. On the other hand, the error tells you something about
the data, so maybe it's a feature.

I noticed that you posted another version of this question two days before
this one. I apologize for the slow response -- I wasn't able to read my
email for a few days and it's taken me most of today to catch up.

Regards,
 John

---- original message -------

Janet Rosenbaum jrosenba at rand.org
Thu Aug 24 00:41:16 CEST 2006

Hi.

Does anyone know whether the following error is a result of a bug or a 
feature?

I can eliminate the error by making ML=F, but I would like to see the 
values of the cut-points and their variance.  Is there anything that I 
can do?

tmp.vec<-c(0,  0,  0 , 0  ,0 , 1,  0,  2,  0 , 0,  5  ,5  ,3  ,1,  0 , 
1,  5, 10, 27, 20,  9,  0,  1,  1, 12, 29, 57, 34,  0,  0,  1,  2, 11, 
31, 32)
tmp.mat<-matrix(tmp.vec, nrow=7)
rownames(tmp.mat)<-1:7
colnames(tmp.mat)<-3:7
tmp.pcc<-polychor(tmp.mat, ML=T, std.err=T)
Error in checkmvArgs(lower = lower, upper = upper, mean = mean, corr = 
corr,  :
     at least one element of 'lower' is larger than 'upper'

Thanks,

Janet

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox