[R] Simulating MATCHED Case Control Data

R. W. code.monkey91 at yahoo.com
Sat Dec 15 05:05:50 CET 2007


Dear R-Help-List,

A few days ago I asked for help simulating
case-control data.  I got a great answer to help me
with my code, but I am having trouble modifying it for
1:M matched case-control data.  Does anyone have any
guidance/pointers for simulating 1:M matched data?

Thank you,
-R




> Dear R-Help-List,
>
> I was wondering if anyone had experience simulating
> case-control data in R?

I think the only simple method that allows you to specify any
arbitrary population distribution of predictors and does not rely on
the logistic regression model being true is to simulate cohorts and
then take a case-control sample from each one.

E.g. for a case-control sample of 500 cases and 1000 controls where
there is about a 1% cumulative incidence:
1. Generate all your predictor variables for a cohort of 50,000
   people, from any distributions you want.
2. Specify the disease model. This could be logistic
     logit(p(Y=1)) = eta = b0 + b1*x1 + b2*x2 + ...
     p = exp(eta)/(1+exp(eta))
   or it could be anything else.
3. Now sum(p) gives the expected number of cases. Adjust b0 so that
   this is a bit bigger than your desired number, e.g. 550.
4. Generate Y for the population by rbinom(50000,1,p).
5. Choose 500 cases and 1000 controls using sample().
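
The five steps above could be sketched in R roughly as follows. This is
only an illustration under assumptions that are not in the original
reply: the two predictors x1 and x2, their distributions, the
coefficients b1 and b2, and the use of uniroot() to tune b0 are all
made up for the example. The redraw loop in step 4 is likewise just a
convenience to guarantee enough cases.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

n_cohort   <- 50000              # cohort size from the example
n_cases    <- 500
n_controls <- 1000
target     <- 550                # expected number of cases, a bit above n_cases

# Step 1: predictors (distributions chosen purely for illustration)
x1 <- rnorm(n_cohort)
x2 <- rbinom(n_cohort, 1, 0.3)

# Step 2: logistic disease model; b1 and b2 are assumed values
b1 <- 0.5
b2 <- 1.0
risk <- function(b0) plogis(b0 + b1 * x1 + b2 * x2)  # exp(eta)/(1+exp(eta))

# Step 3: choose b0 so that sum(p) equals the target
b0 <- uniroot(function(b) sum(risk(b)) - target,
              interval = c(-20, 5), tol = 1e-8)$root
p  <- risk(b0)

# Step 4: disease status for the whole cohort
repeat {
  y <- rbinom(n_cohort, 1, p)
  if (sum(y) >= n_cases) break   # redraw in the rare event of too few cases
}

# Step 5: sample cases and controls without replacement
case_idx    <- which(y == 1)
control_idx <- which(y == 0)
keep <- c(sample(case_idx, n_cases), sample(control_idx, n_controls))
cc   <- data.frame(y = y[keep], x1 = x1[keep], x2 = x2[keep])
```

Here cc holds the unmatched case-control sample. Any other disease
model can be substituted by changing risk(), since nothing after step 2
depends on the logistic form.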



