[R] difference between MASS::polr() and Design::lrm()

Mon Jun 30 16:13:00 CEST 2008

Hi Vito
(question to the authors of MASS below)

2008/6/30 vito muggeo <vmuggeo at dssm.unipa.it>:
> Dear Haubo,
> many thanks for your reply.
> Yes you are right, by scaling the "score", I get the same results.
>
> However it sounds strange to me. I understand that the SE and/or t-ratio of
> the intercepts depend on the scale of the explanatory variable, but I do not
> understand  why the t-ratio of the slope also depends on the scale.. In
> classical regression models (LM, GLM, ...) the scale of the covariate does
> not affect the t ratio! As far as I know, classical references (e.g.
> Agresti, 2002) do not discuss a such point. Could yu suggest me additional
> reference?  Namely why should I center the explanatory variables?

The wrong standard errors is a computational issue. Centering is a
statistical and (sometimes) a computational issue.

The t-value is just the ratio of the estimate to its standard error,
and when the program does not give you the correct standard error,
naturally the t-value is also wrong. It is often a good idea to center
covariates for inferential purposes and a very good reference is
Venables' "Exegeses on Linear Models":
http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf.

I find it natural to expect polr to give the correct standard error
without centering the covariate. I do not know why it does not, but
perhaps is has something to do with the scaling of the parameters
parsed to the BFGS-optimizer in optim. Optimization of the intercepts
seems to be performed on something like

summary(fm1 <- polr(factor(grade)~score))
cumsum(c(fm1$zeta[1], exp(fm1$zeta[-1])))

which seems to distort the whole variance-covariance matrix of the
parameters and therefore provides wrong standard errors for the
score-parameter as well. Optimization is much better scaled if you do

summary(fm2 <- polr(factor(grade)~1 + I(score - median(score))))
cumsum(c(fm2$zeta[1], exp(fm2$zeta[-1])))

This is merely a guess, and perhaps the authors of MASS, could provide
some clarification?

One other thing, that puzzles me, is the fact that the parameter
estimate/standard error ratio is reported as a t-value rather than a
z-value. Likelihood theory (as I know it) seems to suggest a z-value
rather than the t-value. A t-value is also provided by the special
case; the ordinary logistic regression model:

summary(glm(grade > 3 ~ score, family = binomial))

as well as by lrm in package Design.

>
> Of course profile-Lik based CI may be very usefuls at this aim..but this is
> another story..

I agree, but the topic is closely related to standard errors ;-)

Best
Rune

>
> many thanks again,
> vito
>
> Rune Haubo ha scritto:
>>
>> Dear Vito
>>
>> No, you are not wrong, but you should center score prior to model
>> estimation:
>>
>> summary(fm1 <- polr(factor(grade)~I(score - mean(score))))
>>
>> which gives the same standard errors as do lrm. Now the intercepts
>> refer the median score rather than some potential unrealistic score of
>> 0.
>>
>> You can always check if standard errors are appropriate by plotting
>> the profile likelihood and the normal approximation (based on the
>> standard errors):
>>
>> x <- seq(-.043 - .04, -.043 + .04, len = 20)
>> dev <- double(20)
>> for(i in 1:20)
>> dev[i] <- polr(factor(grade) ~ 1 + offset(x[i] * (score -
>> median(score))))$deviance
>>
>> yy <- spline(x, dev, n = 100, method = "natural")
>> plot(yy[[1]], exp(-(yy[[2]] - min(yy[[2]]))/2), type = "l",
>>        xlab = "Score", ylab = "Normalized profile likelihood")
>> ## Add Normal approximation:
>> lines(yy[[1]], exp(-(yy[[1]] - coef(fm1))^2 /
>>                 (2 * vcov(fm1)[1, 1])), lty = 2,
>>      col = "red")
>>
>> ## Add confidence limits:
>> level <- c(0.95, 0.99)
>> lim <- sapply(level, function(x)
>>              exp(-qchisq(x, df=1)/2) )
>> abline(h = lim, col = "grey")
>>
>> ## Add confidence interval points:
>> points(confint(fm1), rep(lim[1], 2), pch = 3)
>>
>> So these standard errors are the most appropriate you can get, but
>> because the profile likelihood is not symmetric, you could consider
>> reporting the asymmetric profile likelihood confidence interval
>> instead.
>>
>> Hope this helps
>> Rune
>>
>> 2008/6/30 vito muggeo <vmuggeo at dssm.unipa.it>:
>>>
>>> Dear all,
>>> It appears that MASS::polr() and Design::lrm() return the same point
>>> estimates but different st.errs when fitting proportional odds models,
>>>
>>> grade<-c(4,4,2,4,3,2,3,1,3,3,2,2,3,3,2,4,2,4,5,2,1,4,1,2,5,3,4,2,2,1)
>>>
>>> score<-c(525,533,545,582,581,576,572,609,559,543,576,525,574,582,574,471,595,
>>>  557,557,584,599,517,649,584,463,591,488,563,553,549)
>>>
>>> library(MASS)
>>> library(Design)
>>>
>>> summary(polr(factor(grade)~score))
>>> lrm(factor(grade)~score)
>>>
>>> It seems to me that results from lrm() are right..
>>>
>>> Am I wrong?
>>> Thanks,
>>> vito
>>>
>>> --
>>> ====================================
>>> Vito M.R. Muggeo
>>> Dip.to Sc Statist e Matem `Vianelli'
>>> Università di Palermo
>>> viale delle Scienze, edificio 13
>>> 90128 Palermo - ITALY
>>> tel: 091 6626240
>>> fax: 091 485726/485612
>>> http://dssm.unipa.it/vmuggeo
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
> --
> ====================================
> Vito M.R. Muggeo
> Dip.to Sc Statist e Matem `Vianelli'
> Università di Palermo
> viale delle Scienze, edificio 13
> 90128 Palermo - ITALY
> tel: 091 6626240
> fax: 091 485726/485612
> http://dssm.unipa.it/vmuggeo
> ====================================
>

-- 
Rune Haubo Bojesen Christensen

Master Student, M.Sc.
Phone: (+45) 30 26 45 54

Informatics and Mathematical Modelling
Section for Statistics
Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark