# [R] Question on Data Simulation

Göran Broström gb at stat.umu.se
Fri Apr 2 15:48:55 CEST 2004

```On Thu, Apr 01, 2004 at 05:11:31PM +0200, Rau, Roland wrote:
> Hello,
>
> > -----Original Message-----
> > From:	"Jingky P. Lozano" [SMTP:jlozano at apoy.upm.edu.ph]
> > Sent:	Thursday, April 01, 2004 2:15 PM
> > To:	r-help at stat.math.ethz.ch
> > Subject:	[R] Question on Data Simulation
> >
> > Dear mailing list,
> >
> > Do you know a specific package in R where I can create an artificial
> > survival
> > data with controlled censoring condition?  (For example, I want to
> > simulate a
> > data set with 5 variables and 20% of the observations are censored.)
> >
> 	maybe I misunderstood something but why don't you create a variable,
> often denoted by 'status' in survival analysis, where the probabilities are
> 0.2 for censoring and 0.8 for the event and add this to your other data?
> 	The code should like like this (for a hypothetical size of 1000
> individuals):
>
> 	status = sample(x=c(0,1), size=1000, replace=TRUE, prob=c(0.2,0.8))

I have an objection to this solution: If you take an uncensored sample
and regard some of the observations as censored, you actually change the
distribution of the censored observations (You are changing a genuine
observation T = t to T > t). Since I happen to be in need of this kind of
simulated data myself, I wrote the following function, which simulates
exponential censored survival times according to the proportional hazards
model (Type I censoring with common censoring time):
----------------------------------------------------------------------
TypeIsurv <- function(covar, beta, cens.prob = 0, scale = 1){
## covar     = n x p matrix of covariates
## beta      = p-vector of regression coefficients
## cens.prob = expected censoring fraction (0 <= cens.prob < 1).

## Output: Censored exponential survival times and censoring
## indicators for Type I censoring (common censoring time 'tc').
## Model: proportional hazards,
## h(t; covar, beta) = exp(covar %*% beta)) / scale

## Here we should have some checks of indata.

if (scale <= 0) stop("'scale' must be positive")
if (cens.prob >= 1) return(NULL)

covar <- as.matrix(covar)
n <- NROW(covar)
score <- exp(covar %*% beta)

tt <- rexp(n, score) # Uncensored survival times
if (cens.prob > 0){
## Find the common censoring time 'tc':
type1 <- function(x) mean(exp(-x * score)) - cens.prob
lower <- 0
upper <- -log(cens.prob) / min(score) + 1
tc <- uniroot(type1, c(lower, upper))\$root
## Calculate event indicator and observed times:
event <- tt <= tc
tt <- pmin(tt, tc)
}else{
event <- rep(TRUE, n)
}
list(time = scale * tt, event = event, tc = scale * tc)
}
--------------------------------------------------------------------
A word of warning: Not well tested; I only noticed that the censoring
fraction seemed to be correct in some experiments.

Göran
--
Göran Broström                    tel: +46 90 786 5223
Department of Statistics          fax: +46 90 786 6614
Umeå University                   http://www.stat.umu.se/egna/gb/
SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se

```