[R] Using mlogit with case weights
Hylton, Ronald
ronald.hylton at citi.com
Thu Aug 28 22:29:36 CEST 2014
I have a set of data with ~ 250,000 observations summarized in ~ 1000 rows that I'm trying to analyze with mlogit. Based on the discussion in
https://stat.ethz.ch/pipermail/r-help/2010-June/241161.html
I understand that using weights= does not (fully) do what I need. I tried expanding my data to one row per observation to sidestep this issue, but after waiting several hours for mlogit to finish I decided this was not a feasible strategy. I therefore need to use weights= and make whatever adjustments are necessary for the inferences.
My solution is the following:
Define W = sum(weights) / length(weights), i.e. the mean weight
Multiply the Log-Likelihood by W
Divide the Std. Errors by sqrt(W) (and therefore multiply the t-values by sqrt(W))
Can anyone confirm that this is correct (at least as a large-N approximation)?
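A short sketch of why these ratios might be expected. It rests on two assumptions that I have not verified in the mlogit source: (a) that mlogit rescales the weights to have mean 1 before maximizing, and (b) that the reported Std. Errors come from the inverse of the negative Hessian of that weighted log-likelihood. The subscripts "dup" (duplicated rows) and "wt" (weights=) and the symbol \ell_i (log-likelihood contribution of summarized row i) are my own notation.

```latex
% Assumption (a): weights= uses w_i / W, i.e. weights rescaled to mean 1.
% Assumption (b): standard errors are taken from (-H)^{-1}.
\begin{align*}
W &= \frac{1}{n}\sum_{i=1}^{n} w_i
  && \text{(mean weight over the $n$ summarized rows)} \\
\ell_{\mathrm{dup}}(\beta) &= \sum_{i=1}^{n} w_i\,\ell_i(\beta)
  && \text{(row $i$ duplicated $w_i$ times)} \\
\ell_{\mathrm{wt}}(\beta) &= \sum_{i=1}^{n} \frac{w_i}{W}\,\ell_i(\beta)
  = \frac{\ell_{\mathrm{dup}}(\beta)}{W}
  && \text{(same maximizer, so } \hat\beta_{\mathrm{dup}} = \hat\beta_{\mathrm{wt}}\text{)} \\
H_{\mathrm{dup}} &= W\,H_{\mathrm{wt}}
  \;\Longrightarrow\;
  V_{\mathrm{dup}} = \left(-H_{\mathrm{dup}}\right)^{-1} = \frac{V_{\mathrm{wt}}}{W} \\
\mathrm{se}_{\mathrm{dup}} &= \frac{\mathrm{se}_{\mathrm{wt}}}{\sqrt{W}},
  \qquad
  t_{\mathrm{dup}} = \sqrt{W}\;t_{\mathrm{wt}}
\end{align*}
```

If both assumptions hold, the three adjustments would be exact up to optimizer tolerance rather than only a large-N approximation, provided the weights really are observation counts.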
The code below provides a test case in which I compare duplicating rows against using weights= and adjusting the inferences (the original code is from Kenneth Train's exercises using the mlogit package for R). The last few lines printed (Ratios: ...) show that the coefficients in the two cases agree to high accuracy, and that the Log-Likelihood, Std. Errors and t-values also have the expected ratios to a decent accuracy. However, it would be good to know that this approach is conceptually sound.
Thanks,
Ron
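library("mlogit")
data("Heating", package = "mlogit")
# Baseline fit on the original data (one row per choice situation)
H <- mlogit.data(Heating, shape="wide", choice="depvar", varying=c(3:12))
m <- mlogit(depvar~ic+oc|0, H)
# print(summary(m))
set.seed(1) # fix the RNG seed so the random weights are reproducible
w <- sample(1:200, nrow(Heating), replace=TRUE) # random weights
i <- rep(1:nrow(Heating), times=w) # index vector for duplicating rows according to the weights
# Fit 2: expanded data, each row duplicated w times
H2 <- mlogit.data(Heating[i,], shape="wide", choice="depvar", varying=c(3:12))
m2 <- mlogit(depvar~ic+oc|0, H2)
# print(summary(m2))
# Fit 3: original data with weights; H is in long format with 5 alternatives per choice situation
m3 <- mlogit(depvar~ic+oc|0, H, weights=rep(w,each=5))
# print(summary(m3))
# The coefficients of the two fits should agree
print(all.equal(coef(m2),coef(m3)))
# Take one fitted value per original row (the last duplicate of each) and compare
f2 <- fitted(m2)[cumsum(w)]
f3 <- fitted(m3)
names(f2) <- names(f3)
print(all.equal(f2,f3))
# Prints: Log-Likelihood ratio, W, sqrt(W), 1/sqrt(W)
cat("\nRatios:", m2$logLik/m3$logLik, sum(w)/length(w), sqrt(sum(w)/length(w)), sqrt(length(w)/sum(w)), "\n\n")
s2 <- summary(m2)
s3 <- summary(m3)
# Expected ratios: Estimate ~ 1, Std. Error ~ 1/sqrt(W), t-value ~ sqrt(W)
print(s2$CoefTable / s3$CoefTable)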