[R] how to run a multinomial logistic regression with fixed effects

Fri Jan 25 17:36:41 CET 2019

Dear list users,
I am looking for a R package implementing a multinomial logistic
regression with fixed effects (Chamberlain 1980, Review of Economic
Studies 47: 225–238).

Over the years, a number of questions have been asked in the R help
and in stack-related websites in order to find how to use this model
in a fixed-effects framework.

In some cases, it was suggested to use existing routines, mainly
nnet::multinom and mlogit::mlogit. However, this doesn’t look like a
viable solution, because these packages are not exactly designed for
this task. Perhaps unsurprisingly, I was not able to find any working
example that can serve my purpose.

Others have suggested to use a Poisson transformation. This is
discussed for example in a working paper on the arxiv
https://arxiv.org/pdf/1707.08538.pdf. However, I found no useful
guides to implement this approach in R.

Finally, many have been addressed to packages using Bayesian
estimation strategies.

I was wondering if anyone in this list can provide an example or any
resource that I can use to begin working on this model using R.
Let me stress that I am not interested in working with Bayesian methods.

My data looks like this:

set.seed(123)
# number of observations
n <- 100
# number of possible choice
possible_choice <- letters[1:4]
# number of years
years <- 3
# individual characteristics
x1 <- runif(n * 3, 5.0, 70.5)
x2 <- sample(1:n^2, n * 3, replace = F)
# actual choice at time 1
actual_choice_year_1 <- possible_choice[sample(1:4, n, replace = T,
prob = rep(1/4, 4))]
actual_choice_year_2 <- possible_choice[sample(1:4, n, replace = T,
prob = c(0.4, 0.3, 0.2, 0.1))]
actual_choice_year_3 <- possible_choice[sample(1:4, n, replace = T,
prob = c(0.2, 0.5, 0.2, 0.1))]
# create long dataset
df <- data.frame(choice = c(actual_choice_year_1,
actual_choice_year_2, actual_choice_year_3),
           x1 = x1, x2 = x2,
           individual_fixed_effect = as.character(rep(1:n, years)),
           time_fixed_effect = as.character(rep(1:years, each = n)),
           stringsAsFactors = F)

Ideally, what I would like to estimate is a formula of the kind

formula("choice ~ x1 + x2 + individual_fixed_effect + time_fixed_effect")

To this purpose, I have tried to use the package mlogit. Consequenly,
following the vignette of this package I have rearranged my data as

library(mlogit)
# create wide dataset
data_mlogit <- mlogit.data(df, id.var = "individual_fixed_effect",
            group.var = "time_fixed_effect",
            choice = "choice",
            shape = "wide")

This allow me to run a multinomial logit regression without fixed
effects by typing

# formula
formula_mlogit <- formula("choice ~ 1| x1 + x2")
# run multinomial regression
fit <- mlogit(formula_mlogit, data_mlogit)
summary(fit)

Apparently, in order to include fixed effects and use a panel
estimation strategy, one should set the argument panel equal to TRUE
in the function mlogit.
However, according to the help of this function, this argument is
evaluated only if another argument, rpar, is not NULL.

The argument rpar is used to set the distribution of random variables
in the model specification. Unfortunately, no random variables are
included in my specification, hence I have no parameters to use in
rpar. As a result, mlogit seems not the best choice in this context.
On stackexchange, a possible solution was proposed few years ago
https://stats.stackexchange.com/questions/51148/unable-to-provide-random-parameter-with-mlogit
However, I don't understand how to actually implement it.

Regards,
Valerio Leone Sciabolazza, Ph.D.
Department of Business and Economics
University of Naples, Parthenope.
valerio.leonesciabolazza using uniparthenope.it
www.valerioleonesciabolazza.com

P.s.
Recently, Stata provided a package (femlogit) for the estimation of
this model with fixed effects.