[R] Expressing a multinomial GLM as a series of binomial GLMs

Wed Jul 23 22:51:36 CEST 2014

Dear Christoph,

I don't see how what you suggest can work in a mixed-effects model. 

In the case you originally raised, of independent observations, you should be able to recover the coefficients for the multinomial logit model by fitting the logits that I suggested in my earlier email, but notice that these are each for a subset of the observations. As well, even in this case, because the binary logit models are not independent, you can't get the log-likelihood for the multinomial logit model by adding the log-likelihoods for the binary logit models. 

Moreover, in the mixed-effects case, you can't AFAICS subset the observations in this manner.

Best,
 John

On Wed, 23 Jul 2014 21:54:17 +0200
 Christoph Scherber <cscherb1 at gwdg.de> wrote:
> Dear John and R-helpers,
> 
> Thanks for your replies that were both very helpful.
> 
> The reason I was asking is that I´m searching for an easier way to incorporate *random effects* in a multinomial model.
> 
> I was hoping that *combinations of binomial glmmPQL or lmer calls* might be able to do the job - as MCMCglmm would require me to become Bayesian...
> 
> Do you think that combinations of binomial GLMs or glmmPQLs/lmer models would make sense? (example code again below, still without random effects)
> 
> The responses I deal with usually have >50 categories.
> 
> Thanks again and best wishes,
> Christoph
> 
> #Example code again (thanks Charles Berry for pointing me at how to use sapply in this context):
> #set up data: (don´t care what they are, just for playing)
> set.seed(0)
> cats=c("oligolectic","polylectic","specialist","generalist")
> explan1=c("natural","managed")
> 
> multicats=factor(sample(cats,replace=T,100,prob=c(0.5,0.2,0.1,0.5)))
> multiplan1=factor(rep(explan1,50))
> 
> ##
> library(nnet)
> m2=multinom(multicats~multiplan1)
> 
> ggen.preds <-
>      sapply( levels(multicats),
>              function(x) predict(glm(I(multicats==x)~multiplan1,
>                           family=binomial),type="response"))
> 
> max(abs(ggen.preds-predict(m2,type="probs")))
> 
> 
> 
> 
> 
> Am 22.07.2014 22:20, schrieb John Fox:
> > Dear Christoph,
> >
> > If I understand correctly what you've done, the two approaches are not equivalent and should not in general produce the same fitted probabilities.
> >
> > Letting {a, b} represent logit(a vs. b) = log(Pr(a)/Pr(b)) and {ab, cd} represent logit(a or b vs. c or d), and numbering the response levels 1, 2, 3, 4, then the multinomial logit model fits the logits {2, 1}, {3, 1}, {4, 1}, while your binary logit models are for the logits {12, 34}, {23, 14}, {34, 12}. Note that the first and third are complementary, but even if you had used three distinct logits of this kind, the combined models (which BTW wouldn't be independent) would not be equivalent to the multinomial logit model.
> >
> > I hope that this helps (and that I've not misconstrued what you did).
> >
> > Best,
> >   John
> >
> > ------------------------------------------------
> > John Fox, Professor
> > McMaster University
> > Hamilton, Ontario, Canada
> > http://socserv.mcmaster.ca/jfox/
> > 	
> > On Tue, 22 Jul 2014 16:47:17 +0200
> >   "Scherber, Christoph" <cscherb1 at gwdg.de> wrote:
> >> Dear all,
> >>
> >> I am trying to express a multinomial GLM (using nnet) as a series of GLM models.
> >>
> >> However, when I compare the multinom() predictions to those from GLM, I see differences that I can´t
> >> explain. Can anyone help me out here?
> >>
> >> Here comes a reproducible example:
> >>
> >> ##
> >> # set up data: (don´t care what they are, just for playing)
> >> set.seed(0)
> >> cats=c("oligolectic","polylectic","specialist","generalist")
> >> explan1=c("natural","managed")
> >> explan2=c("meadow","meadow","pasture","pasture")
> >> multicats=factor(sample(cats,replace=T,100,prob=c(0.5,0.2,0.1,0.5)))
> >> multiplan1=factor(rep(explan1,50))
> >> multiplan2=factor(rep(explan2,25))
> >>
> >> ########################
> >> library(nnet)
> >> m2=multinom(multicats~multiplan1)
> >>
> >> # predictions from multinomial model
> >> predict(m2,type="probs")
> >>
> >> ########################
> >> # now set up contrasts for response variable "multicats" (which has 4 levels):
> >>
> >> ii=as.numeric(multicats)
> >>
> >> g1=glm(I(ii%in%c(1,2)) ~ multiplan1, family = "binomial")
> >> g2=glm(I(ii%in%c(2,3)) ~ multiplan1, family = "binomial")
> >> g3=glm(I(ii%in%c(3,4)) ~ multiplan1, family = "binomial")
> >>
> >> r1=predict(g1,type="response")
> >> r2=predict(g2,type="response")
> >> r3=predict(g3,type="response")
> >>
> >> # calculate predictions (based on Chapter 8.3 in Dobson 2002, Introduction to GLMs)
> >> ee0=1/(1+r1+r2+r3)
> >> ee1=r1/(1+r1)
> >> ee2=r2/(1+r1+r2)
> >> ee3=r3/(1+r1+r2+r3)
> >>
> >> # compare predictions between GLM and multinom fits:
> >> apply(cbind(ee0,ee1,ee2,ee3),2,mean)
> >> apply(predict(m2,type="probs"),2,mean)
> >>
> >>
> >> #################
> >> [using R 3.1.1 on Windows 7 32-bit]
> >>
> >>
> >>
> >>
> >>
> >> -- 
> >> PD Dr. Christoph Scherber
> >> Senior Lecturer
> >> DNPW, Agroecology
> >> University of Goettingen
> >> Grisebachstrasse 6
> >> 37077 Goettingen
> >> Germany
> >> telephone +49 551 39 8807
> >> facsimile +49 551 39 8806
> >> www.gwdg.de/~cscherb1
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > 	
> > 	
> 

------------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/