[R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function

Tue Nov 25 17:21:38 CET 2014

Dear list,

I have used the ‘polr’ function in the MASS package to run an ordinal logistic regression for an ordinal categorical response variable with 15 continuous explanatory variables.
I have used the code (shown below) to check that my model meets the proportional odds assumption following advice provided at (http://www.ats.ucla.edu/stat/r/dae/ologit.htm) – which has been extremely helpful, thank you to the authors! However, I’m a little worried about the output implying that not only are the coefficients across various cutpoints similar, but they are exactly the same (see graphic below).

Here is the code I used (and see attached for the output graphic)

FGV1b<-data.frame(FG1_val_cat=factor(FGV1b[,"FG1_val_cat"]),scale(FGV1[,c("X","Y","Slope","Ele","Aspect","Prox_to_for_FG","Prox_to_for_mL","Prox_to_nat_border","Prox_to_village","Prox_to_roads","Prox_to_rivers","Prox_to_waterFG","Prox_to_watermL","Prox_to_core","Prox_to_NR","PCA1","PCA2","PCA3")]))

b<-polr(FGV1b$FG1_val_cat ~ FGV1b$X + FGV1b$Y + FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect + FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village + FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL + FGV1b$Prox_to_core + FGV1b$Prox_to_NR, data = FGV1b, Hess=TRUE)

#Checking the assumption. So the following code will estimate the values to be graphed. First it shows us #the logit transformations of the probabilities of being greater than or equal to each value of the target #variable

FGV1b$FG1_val_cat<-as.numeric(FGV1b$FG1_val_cat) 

sf <- function(y) {

  c('VC>=1' = qlogis(mean(FGV1b$FG1_val_cat >= 1)),

    'VC>=2' = qlogis(mean(FGV1b$FG1_val_cat >= 2)),

    'VC>=3' = qlogis(mean(FGV1b$FG1_val_cat >= 3)),

    'VC>=4' = qlogis(mean(FGV1b$FG1_val_cat >= 4)),

    'VC>=5' = qlogis(mean(FGV1b$FG1_val_cat >= 5)),

    'VC>=6' = qlogis(mean(FGV1b$FG1_val_cat >= 6)),

    'VC>=7' = qlogis(mean(FGV1b$FG1_val_cat >= 7)),

    'VC>=8' = qlogis(mean(FGV1b$FG1_val_cat >= 8)))

}

  (t <- with(FGV1b, summary(as.numeric(FGV1b$FG1_val_cat) ~ FGV1b$X + FGV1b$Y + FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect + FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village + FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL + FGV1b$Prox_to_core + FGV1b$Prox_to_NR, fun=sf)))

#The table displays the (linear) predicted values we would get if we regressed our

#dependent variable on our predictor variables one at a time, without the parallel slopes

#assumption. So now, we can run a series of binary logistic regressions with varying cutpoints

#on the dependent variable to check the equality of coefficients across cutpoints

par(mfrow=c(1,1))

plot(t, which=1:8, pch=1:8, xlab='logit', main=' ', xlim=range(s[,7:8]))

Apologies that I am no statistics expert and perhaps I am missing something obvious here. However, I have spent a long time trying to figure out if there is a problem in how I tested the model assumption and also trying to figure out other ways to run the same kind of model.

 For example, I read in many help mailing lists that others use the vglm function (in the VGAM package) and the lrm function (in the rms package) (for example see here:  http://stats.stackexchange.com/questions/25988/proportional-odds-assumption-in-ordinal-logistic-regression-in-r-with-the-packag). I have tried to run the same models but am continuously coming up against warnings and errors.

 For example, when I try to fit the vglm model with the ‘parallel=FALSE’ argument (as the previous link mentions is important for testing the proportional odds assumption), I encounter the following error:

Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'y'

In addition: Warning message:

In Deviance.categorical.data.vgam(mu = mu, y = y, w = w, residuals = residuals,  :

  fitted values close to 0 or 1

And after many searches for help, I can’t seem to find a way to fix this problem.

I would like to ask please if there is anyone who might understand and be able to explain to me why the graph I produced above looks as it does. If indeed it means that something isn’t right, could you please help me find a way to test the proportional odds assumption when just using the polr function. Or if that is just not possible, then I will resort to trying to use the vglm function, but would then need some help to explain why I keep getting the error given above. 

I hope this is clear. Please do let me know if I should provide some more information that would help address this query.

NOTE: As a background, there are 1000 datapoints here, which are actually location points across a study area. I am looking to see if there are any relationships between the categorical response variable and these 15 explanatory variables. All of those 15 explanatory variables are spatial characteristics (for example, elevation, x-y coordinates, proximity to forest etc.). The 1000 datapoints were randomly allocated using a GIS, but I took a stratified sampling approach. I made sure that 125 points were randomly chosen within each of the 8 different categorical response levels. I hope this information is also helpful.

I am extremely grateful to anyone who could please give me some guidance with this. 

Thank you very much for your time,

Charlie