Achim Zeileis
Achim.Zeileis at uibk.ac.at
Thu Jun 3 12:54:54 CEST 2010
On Wed, 2 Jun 2010, Misha Spisok wrote:
> I can't figure out why using and not using weights in mlogit yields
> identical results. My motivation is for the case when an
> "observation" or "individual" represents a number of individuals. For
> example,
> library(mlogit)
> library(AER)
> data("TravelMode", package = "AER")
> TM <- mlogit.data(TravelMode, choice = "choice", shape = "long",
> alt.levels = c("air", "train", "bus", "car"))
> myweight = rep(floor(1000*runif(nrow(TravelMode)/4)), each = 4)
> summary(mlogit(choice ~ wait + vcost + travel + gcost, data=TM))
> summary(mlogit(choice ~ wait + vcost + travel + gcost, weights=income, data=TM))
> summary(mlogit(choice ~ wait + vcost + travel + gcost,
> weights=myweight, data=TM))
> Each gives the same result.
I can't replicate that. For me all three give different results. For
example, the first two (which do not contain random elements) are
alttrain altbus altcar wait vcost travel
-0.84413818 -1.44150828 -5.20474275 -0.10364955 -0.08493182 -0.01333220
gcost
0.06929537
and
alttrain altbus altcar wait vcost travel
-1.56910793 -1.67020936 -5.44725428 -0.11157800 -0.08866886 -0.01435371
gcost
0.08087749
respectively. I'm using the current "mlogit" version from CRAN: 0.1-7.
> Am I specifying "weights" incorrectly?
Yes, I think so.
> Is there a better way to do what I want to do? That is, if "myweight"
> contains the number of observations represented by an "observation,"
> is this the correct approach?
You will get the correct parameter estimates but not the correct
inference. Following most of the basic model fitting functions (such as
lm() or glm()), mlogit() does _not_ interpret the weights as case weights. I.e.,
the function treats
sum(weights > 0)
as the number of observations and not
sum(weights)
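The following is a minimal sketch of that distinction for lm(). The data and weights are made up for illustration and are not from the thread. One weight is set to zero so that the number of positive weights differs from both the length of the weight vector and the sum of the weights.

```r
# Hypothetical data; the second observation gets weight zero
x <- 1:6
y <- c(0, 2, 1, 4, 5, 7)
w <- c(2, 0, 3, 1, 2, 2)
fm <- lm(y ~ x, weights = w)
sum(w > 0)       # 5 observations with positive weight
sum(w)           # 10, the number of cases if w were case weights
df.residual(fm)  # 3 = 5 - 2, i.e. based on sum(w > 0), not on sum(w)
```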
A simple example using lm():
x <- 1:5
y <- c(0, 2, 1, 4, 5)
w <- rep(2, 5)
xx <- c(x, x)
yy <- c(y, y)
Then you can fit both models
fm1 <- lm(y ~ x, weights = w)
fm2 <- lm(yy ~ xx)
and you get the same coefficients
all.equal(coef(fm1), coef(fm2))
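(which only reports that the coefficient names 'x' and 'xx' differ). But fm1
assumes that 2 parameters have been estimated from 5 observations, while fm2
assumes that 2 parameters have been estimated from 10 observations. Hence the
residual degrees of freedom and the estimated covariance matrices differ, both
by a factor of 3/8 = 0.375: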
df.residual(fm1) / df.residual(fm2)
vcov(fm2) / vcov(fm1)
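If the weights are integer case weights, one possible way to recover case-weight inference from the weighted lm() fit is to rescale its covariance matrix by the ratio of residual degrees of freedom. This works because lm() estimates the error variance as the weighted residual sum of squares divided by df.residual(). The sketch below assumes exactly that setting. The names df_case and vc_case are made up for illustration. The same rescaling is not necessarily correct for glm() families with fixed dispersion or for mlogit(), so it would have to be checked separately there.

```r
x <- 1:5
y <- c(0, 2, 1, 4, 5)
w <- rep(2, 5)
fm1 <- lm(y ~ x, weights = w)
fm2 <- lm(c(y, y) ~ c(x, x))   # data expanded by hand, as with xx and yy
# Residual df if w are case weights: total cases minus number of coefficients
df_case <- sum(w) - length(coef(fm1))
# sigma^2 = weighted RSS / df, so replace the df used in the denominator
vc_case <- vcov(fm1) * df.residual(fm1) / df_case
all.equal(unname(vc_case), unname(vcov(fm2)))  # TRUE
```

For arbitrary integer weights, the expansion used for fm2 can be done on a data frame d with d[rep(seq_len(nrow(d)), times = w), ], at the cost of a larger data set.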
Hope that helps,
Z
> If so, what am I doing wrong? If not,
> what suggestions are there?
> Thank you for your time.
> Best,
>
> Misha
