[R] How to test frequency independence (in a 2 by 2 table) withmany missing values

Fri Jul 24 21:17:07 CEST 2009

Hi Tal, you can use the lme4 library and use a random effects model. I will
walk you through the example below (though, generally you should ask a
statistician at your school about this). At the very end, I will include a
loop for Monte-Carlo simulations that shows that the estimation of the fixed
effects is unbiased, which assures you that you are estimating the "correct"
coefficient. In addition, as Robert says, you should remove all subjects for
which both observations are missing. 

##First, we simulate some data. 
##Note how the data structure matches the data structure you have:

id=rep(1:100,each=2)
#the subject ID 

test=rep(0:1,100)
# first or second measurement/test (your variable of interest)

randeff=rep(rnorm(100,0,1),each=2) 
# a random effect for each subject

linear.predictor=randeff+2*test
#the linear predictor
#note the coefficient on the fixed effect of "test" is 2

probablty=exp(linear.predictor)/(1+exp(linear.predictor))
#compute the probability using the linear predictor

y=rbinom(200,1,probablty)
#draw 0/1 dependent variables
#with probability according to the variable probablty 

miss=sample(1:200,20,replace=FALSE)
#some indices for missing variables

y[miss]=NA
#replace the dependent variable with NAs for the "miss"ing indices

##Some additional data preparation

id=factor(id) 
#make sure id is coded as a factor/dummy

mydata=data.frame(y,test,id)
#bind the data you need for estimation in a dataframe

##Run the analysis

library(lme4)
reg=lmer(y~test+(1|id),binomial,data=mydata)
summary(reg)

##And here are the Monte-Carlo simulations

library(lme4)

bootstraps=200
estim=matrix(0,nrow=bootstraps,ncol=2)

for(i in 1:bootstraps){
id=rep(1:100,each=2)
test=rep(0:1,100)
randeff=rep(rnorm(100,0,1),each=2)
linear.predictor=randeff+2*test
probablty=exp(linear.predictor)/(1+exp(linear.predictor))
y=rbinom(200,1,probablty)
miss=sample(1:200,20,replace=FALSE)
y[miss]=NA
id=factor(id)
mydata=data.frame(y,test,id)
reg=lmer(y~test+(1|id),binomial,data=mydata)
estim[i,]=reg at fixef
}

mean(estim[,1]) 
#mean estimate of the intercept from 200 MC-simulations
#should be close to zero (no intercept in the linear.predictor)

mean(estim[,2])
#mean estimate of the 'test' fixed effect from 200 MC-simulations
#should be close to 2 (as in the linear.predictor)

#plot estimated coefficients from 200 MC simulations
par(mfcol=c(1,2))
hist(estim[,1],main="Intercept")
hist(estim[,2],main="Effect of variable 'test'")

HTH,
Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Ursprüngliche Nachricht-----
Von: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Im
Auftrag von Tal Galili
Gesendet: Friday, July 24, 2009 1:23 PM
An: r-help at r-project.org
Betreff: [R] How to test frequency independence (in a 2 by 2 table) withmany
missing values

Hello dear R help group.

My question is statistical and not R specific, yet I hope some of you might
be willing to help.

*Experiment settings*:  We have a list of subjects. each of them went
through two tests with the answer to each can be either 0 or 1.
*Goal*: We want to know if the two experiments yielded different results in
the subjects answers.
*Statistical test (and why it won't work)*: Naturally we would turn to
performing a mcnemar test. But here is the twist: we have missing values in
our data.
For our purpose, let's assume the missingnes is completely at random, and we
also have no data to use for imputation. Also, we have much more missing
data for experiment number 2 then in experiment number 1.

So the question is, under these settings, how do we test for experiment
effect on the outcome?

So far I have thought of two options:
1) To perform the test only for subjects that has both values. But they are
too scarce and will yield low power.
2) To treat the data as independent and do a pearson's chi square test
(well, an exact fisher test that is) on all the non-missing data that we
have. The problem with this is that our data is not fully independent (which
is a prerequisite to chi test, if I understood it correctly). So I am not
sure if that is a valid procedure or not.

Any insights will be warmly welcomed.

p.s: here is an example code producing this scenario.

set.seed(102)

x1 <- rbinom(100, 1, .5)
x2 <- rbinom(100, 1, .3)

X <- data.frame(x1,x2)
tX <- table(X)
margin.table(tX,1)
margin.table(tX,2)
mcnemar.test(tX)

put.missings <- function(x.vector, na.percent) { turn.na <-
rbinom(length(x.vector), 1, na.percent)  x.vector[turn.na == 1] <- NA
return(x.vector)
}

x1.m <- put.missings(x1, .3)
x2.m <- put.missings(x2, .6)

tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m)))
fisher.test(tX.m)

With regards,
Tal

--
----------------------------------------------

My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il

	[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.