[R] Using VGAM's vglm function for ordinal logistic regression

Inman, Brant A. M.D. Inman.Brant at mayo.edu
Sat Jan 6 23:56:43 CET 2007


R-Experts:

I am using the vglm function of the VGAM library to perform proportional
odds ordinal logistic regression.  The issue that I would like help with
concerns the format in which the response variable must be provided for
this function to work correctly. Consider the following example:

------

library(VGAM)
library(MASS)

attach(pneumo)
pneumo	# Inspect the format of the original dataset

# Restructure the pneumo dataset into a different format
pneumo2 <- data.frame(matrix(ncol=3, nrow=24))
colnames(pneumo2) <- c('exposure.time', 'severity', 'freq')
pneumo2[,1] <- rep(pneumo[,1],3)	
pneumo2[,2] <-
as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8)))
pneumo2[1:8,3]   <- pneumo[,2]
pneumo2[9:16,3]  <- pneumo[,3]
pneumo2[17:24,3] <- pneumo[,4]
pneumo2	# Inspect the format of the new modified dataset

------

The problem occurs when I try to analyze these two datasets, which are
identical in content, with the proportional odds model using vglm:

------

# Analyze the original dataset with vglm, get one set of results 

> vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time),
data=pneumo, 
+  family=cumulative(parallel=T))

Coefficients:
     (Intercept):1      (Intercept):2 log(exposure.time) 
          9.676093          10.581725          -2.596807 

Degrees of Freedom: 16 Total; 13 Residual
Residual Deviance: 5.026826 
Log-likelihood: -204.2742

# Analyzing the modified dataset with vglm gives another set of results

> vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2, 
+ family=cumulative(parallel=T))

Coefficients:
     (Intercept):1      (Intercept):2 log(exposure.time) 
        -1.6306499          2.5630170         -0.1937956 

Degrees of Freedom: 48 Total; 45 Residual
Residual Deviance: 503.7251 
Log-likelihood: -251.8626 
Warning messages:
1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 
2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 
3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 
4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 
5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 

# Analyzing the modified dataset with polr reproduces these second
results

> polr(severity ~ log(exposure.time), weights=freq, data=pneumo2)

Coefficients:
log(exposure.time) 
         0.1938154 

Intercepts:
  mild|normal normal|severe 
    -1.630612      2.563055 

Residual Deviance: 503.7251 
AIC: 509.7251

------

Can someone explain what I am doing wrong when using vglm and polr with
the modified dataset?  I do not understand why these two formulations
should give different results. 

Brant Inman



More information about the R-help mailing list