[R] Simulate binary data for a logistic regression Monte Carlo

Wed Apr 8 03:27:12 CEST 2009

Rolf Turner-3 wrote:
> 
> 
> On 8/04/2009, at 1:26 AM, jjh21 wrote:
> 
>>
>> Hello,
>>
>> I am trying to simulate binary outcome data for a logistic  
>> regression Monte
>> Carlo study. I need to eventually be able to manipulate the  
>> structure of the
>> error term to give groups of observations a random effect. Right  
>> now I am
>> just doing a very basic set up to make sure I can recover the  
>> parameters
>> properly. I am running into trouble with the code below. It works  
>> if you
>> take out the object 'epsilon,' but I need that object because it is  
>> where I
>> plan to add in the random effects eventually. Any suggestions? Thanks.
> 
> 	Well, it's the epsilon term that's doing you in.  The model
> 	that you are fitting takes no cognizance of this random effect.
> 	You can *probably* fit a model which does take cognizance of it
> 	using one of the variants of glm() (available in R) which include
> 	random effects, but don't ask me how!
> 
> 	I don't know if it is possible to work out analytically what the
> 	bias should be if the epsilon term is ignored when fitting the
> 	model, but it might be.  Or an approximation might be achieved.
> 
> 	Good luck.
> 
> 		cheers,
> 
> 			Rolf Turner
> 

I agree that that the individual-level random effect is probably the issue.
I played with this some today but didn't manage to resolve it --
tried JAGS/R2jags and glmer from lme4 but didn't manage to
get an estimate of epsilon that matched the input value.  I'm
a little worried about binary data with an underlying random
effect, I think there's an identifiability problem there ...

set.seed(1)

nsim <- 100
N <- 1000

alpha <- alpha.lmer <- numeric(nsim) # Vector to store coefficient estimates
X1 <- X1.lmer <- numeric(nsim)
X2 <- X2.lmer <- numeric(nsim)
eps.lmer <- numeric(nsim)

library(lme4)

a <- 0 # True parameters
B1 <- .5
B2 <- .85

pb <- txtProgressBar()

for (i in 1:nsim){
  setTxtProgressBar(pb,i/nsim)
  x1 <- rnorm(N)
  x2 <- rnorm(N)
  epsilon <- rnorm(N)

  LP <- a + B1*x1 + B2*x2 + epsilon # Linear predictor
  y <- rbinom(length(LP), 1, plogis(LP)) # Binomial link
  indiv <- 1:N

  model <- glm(y ~ x1 + x2, family = binomial)
  model2 <- glmer(y~ x1 +x2 +(1|indiv), family=binomial)
  ##
  alpha[i] <- coef(model)[1]
  X1[i] <- coef(model)[2]
  X2[i] <- coef(model)[3]
  ## 
  alpha.lmer[i] <- fixef(model2)[1]
  X1.lmer[i]    <- fixef(model2)[2]
  X2.lmer[i]    <- fixef(model2)[3]
  eps.lmer[i] <- attr(VarCorr(model2)$indiv,"stddev")
}
close(pb)

mean(X1) # Shows a bias downward (should be about .5)
mean(X2) # Shows a bias downward (should be about .85)
mean(alpha) # Pretty close (should be about 0)

res <- data.frame(X1,X2,alpha,X1.lmer,X2.lmer,alpha.lmer,eps.lmer)
summary(res)
means <- colMeans(res)
ses <- sapply(res,sd)/sqrt(nsim)

library(plotrix)
plotCI(1:7,means[c(1,4,2,5,3,6,7)],ses[c(1,4,2,5,3,6,7)],
       ylim=c(0,1))
points(c(1.5,3.5,5.5,7),c(B1,B2,a,1),pch=16,col=2)

library(R2jags)

j1 <- jags(data=c("N","y","x1","x2"),
           inits=list(list(a=0,B1=0.5,B2=0.85)),
           n.iter=8000,
           n.chain=1,parameters=c("a","B1","B2","tau.eps"),
           model.file="logistmc.txt")

traceplot(j1,ask=FALSE,mfrow=c(2,2))

## logistmc.txt:
model {
  for (i in 1:N) {
    LP[i] <- a + B1*x1[i]+B2*x2[i]+epsilon[i]
    prob[i] <- 1/(1+exp(-LP[i]))
    y[i] ~ dbern(prob[i])
    epsilon[i] ~ dnorm(0,tau.eps)
  }
  zero <- 0
  tau.eps ~ dgamma(0.1,0.1)
  a ~ dnorm(0,0.01)
  B1 ~ dnorm(0,0.01)
  B2 ~ dnorm(0,0.01)
}

-- 
View this message in context: http://www.nabble.com/Simulate-binary-data-for-a-logistic-regression-Monte-Carlo-tp22928872p22941504.html
Sent from the R help mailing list archive at Nabble.com.