[R] Question on Data Simulation

Göran Broström gb at stat.umu.se
Fri Apr 2 15:48:55 CEST 2004


On Thu, Apr 01, 2004 at 05:11:31PM +0200, Rau, Roland wrote:
> Hello,
> 
> > -----Original Message-----
> > From:	"Jingky P. Lozano" [SMTP:jlozano at apoy.upm.edu.ph]
> > Sent:	Thursday, April 01, 2004 2:15 PM
> > To:	r-help at stat.math.ethz.ch
> > Subject:	[R] Question on Data Simulation 
> > 
> > Dear mailing list,
> > 
> > Do you know a specific package in R where I can create an artificial
> > survival 
> > data with controlled censoring condition?  (For example, I want to
> > simulate a 
> > data set with 5 variables and 20% of the observations are censored.)
> > 
> 	maybe I misunderstood something but why don't you create a variable,
> often denoted by 'status' in survival analysis, where the probabilities are
> 0.2 for censoring and 0.8 for the event and add this to your other data?
> 	The code should like like this (for a hypothetical size of 1000
> individuals):
> 
> 	status = sample(x=c(0,1), size=1000, replace=TRUE, prob=c(0.2,0.8))

I have an objection to this solution: If you take an uncensored sample 
and regard some of the observations as censored, you actually change the
distribution of the censored observations (You are changing a genuine
observation T = t to T > t). Since I happen to be in need of this kind of
simulated data myself, I wrote the following function, which simulates
exponential censored survival times according to the proportional hazards
model (Type I censoring with common censoring time):
----------------------------------------------------------------------
TypeIsurv <- function(covar, beta, cens.prob = 0, scale = 1){
    ## covar     = n x p matrix of covariates
    ## beta      = p-vector of regression coefficients
    ## cens.prob = expected censoring fraction (0 <= cens.prob < 1).

    ## Output: Censored exponential survival times and censoring
    ## indicators for Type I censoring (common censoring time 'tc'). 
    ## Model: proportional hazards,
    ## h(t; covar, beta) = exp(covar %*% beta)) / scale
    
    ## Here we should have some checks of indata.

    if (scale <= 0) stop("'scale' must be positive")
    if (cens.prob >= 1) return(NULL)
    
    covar <- as.matrix(covar)
    n <- NROW(covar)
    score <- exp(covar %*% beta)

    tt <- rexp(n, score) # Uncensored survival times
    if (cens.prob > 0){
        ## Find the common censoring time 'tc':
        type1 <- function(x) mean(exp(-x * score)) - cens.prob
        lower <- 0
        upper <- -log(cens.prob) / min(score) + 1
        tc <- uniroot(type1, c(lower, upper))$root
        ## Calculate event indicator and observed times:
        event <- tt <= tc
        tt <- pmin(tt, tc)
    }else{
        event <- rep(TRUE, n)
    }
    list(time = scale * tt, event = event, tc = scale * tc)
}
--------------------------------------------------------------------
A word of warning: Not well tested; I only noticed that the censoring
fraction seemed to be correct in some experiments.

Göran
-- 
 Göran Broström                    tel: +46 90 786 5223
 Department of Statistics          fax: +46 90 786 6614
 Umeå University                   http://www.stat.umu.se/egna/gb/
 SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se




More information about the R-help mailing list