[R] Using VGAM's vglm function for ordinal logistic regression

Sun Jan 7 17:27:34 CET 2007

Thank you for the help.  Indeed, the differences in the results that I
noted were due to the incorrect ordering of the response variable that
resulted from my unattentive use of as.ordered on a character vector.

Brant

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] 
Sent: Sunday, January 07, 2007 2:52 AM
To: Inman, Brant A. M.D.
Cc: r-help at stat.math.ethz.ch; yee at stat.auckland.ac.nz
Subject: Re: [R] Using VGAM's vglm function for ordinal logistic
regression

On Sat, 6 Jan 2007, Inman, Brant A.   M.D. wrote:

>
> R-Experts:
>
> I am using the vglm function of the VGAM library to perform
proportional
> odds ordinal logistic regression.  The issue that I would like help
with
> concerns the format in which the response variable must be provided
for
> this function to work correctly.

> pneumo2$severity
  [1] normal normal normal normal normal normal normal normal mild
mild
[11] mild   mild   mild   mild   mild   mild   severe severe severe
severe
[21] severe severe severe severe
Levels: mild < normal < severe

is different from the ordering in the first example.

The difference between vglm and polr is that the latter uses the 
conventional sign for the coefficients: see MASS4 p.204.

I would never use as.ordered on a character vector, as this leaves it to
R 
to choose the ordering.  (Even if you think you intended alphabetical 
order, that depends on the locale: see the warnings on the help page.)

> Consider the following example:
>
> ------
>
> library(VGAM)
> library(MASS)
>
> attach(pneumo)
> pneumo	# Inspect the format of the original dataset
>
> # Restructure the pneumo dataset into a different format
> pneumo2 <- data.frame(matrix(ncol=3, nrow=24))
> colnames(pneumo2) <- c('exposure.time', 'severity', 'freq')
> pneumo2[,1] <- rep(pneumo[,1],3)
> pneumo2[,2] <-
> as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8)))
> pneumo2[1:8,3]   <- pneumo[,2]
> pneumo2[9:16,3]  <- pneumo[,3]
> pneumo2[17:24,3] <- pneumo[,4]
> pneumo2	# Inspect the format of the new modified dataset
>
> ------
>
> The problem occurs when I try to analyze these two datasets, which are
> identical in content, with the proportional odds model using vglm:
>
> ------
>
> # Analyze the original dataset with vglm, get one set of results
>
>> vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time),
> data=pneumo,
> +  family=cumulative(parallel=T))
>
> Coefficients:
>     (Intercept):1      (Intercept):2 log(exposure.time)
>          9.676093          10.581725          -2.596807
>
> Degrees of Freedom: 16 Total; 13 Residual
> Residual Deviance: 5.026826
> Log-likelihood: -204.2742
>
> # Analyzing the modified dataset with vglm gives another set of
results
>
>> vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2,
> + family=cumulative(parallel=T))
>
> Coefficients:
>     (Intercept):1      (Intercept):2 log(exposure.time)
>        -1.6306499          2.5630170         -0.1937956
>
> Degrees of Freedom: 48 Total; 45 Residual
> Residual Deviance: 503.7251
> Log-likelihood: -251.8626
> Warning messages:
> 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
> trace, wzeps = control$wzepsilon)
> 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
> trace, wzeps = control$wzepsilon)
> 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
> trace, wzeps = control$wzepsilon)
> 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
> trace, wzeps = control$wzepsilon)
> 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
> trace, wzeps = control$wzepsilon)
>
> # Analyzing the modified dataset with polr reproduces these second
> results
>
>> polr(severity ~ log(exposure.time), weights=freq, data=pneumo2)
>
> Coefficients:
> log(exposure.time)
>         0.1938154
>
> Intercepts:
>  mild|normal normal|severe
>    -1.630612      2.563055
>
> Residual Deviance: 503.7251
> AIC: 509.7251
>
> ------
>
> Can someone explain what I am doing wrong when using vglm and polr
with
> the modified dataset?  I do not understand why these two formulations
> should give different results.
>
> Brant Inman
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595