[R] Simulating from the Weibull with right censoring

Mon Mar 24 06:33:59 CET 2014

Hi Dennis,
Thanks for your feedback.
I do apologise, I omitted the lines of code that created the censoring variable, I was focused more on altering the actual
outcome times. So it is actually more like this,

if (outcomeType == 'Weibull'){
 Y[i] <- rweibull(1, shape, shape))               
}
if (Y[i] >  censorT){
Y[i] <- censorT
death[i] <- 0}
else
{Y[i]  <- Y[i]
death[i] <- 1
}
	}

Thankyou for the tip about the random seed.
Lucy

-----Original Message-----
From: Dennis Murphy [mailto:djmuser at gmail.com] 
Sent: Monday, 24 March 2014 4:12 PM
To: Lucy Leigh
Subject: Re: [R] Simulating from the Weibull with right censoring

Hi:

I don't think you're going about this the right way. It appears to me that your (unnecessary) loop is performing truncation instead, and there is a sharp distinction in the survival literature between censoring and truncation.

First of all, let's introduce you to the concept of vectorized calculations in R. To reproduce your code (with a random seed because you didn't set one), you could do something like this:

set.seed(5489)    # set a random seed for reproducibility of results
nSubjects <- 1000

# change these to the values you're using shape <- 1 scale <- 2

# Simulate nSubjects values from a Weibull(shape, scale) dist'n Y <- rweibull(nSubjects, shape, scale)

# Plot a histogram of the generated sample with the Weibull pdf hist(Y, probability = TRUE, main = "Weibull(1, 2) random deviates") v <- seq(min(Y), max(Y), length = 100) lines(v, dweibull(v, 1, 2), col = "blue", lwd = 2)

# How many values of Y are > 5?
sum(Y > 5)       # should be 71

# Truncate the values larger than 5 to 5:
Y1 <- Y                 # Clone Y first
Y1[Y1 > 5] <- 5      # replace values > 5 with 5
sum(Y1 > 5)          # should be 0

# Censor values of Y greater than 5:
censor <- 1 * (Y > 5)        # 1 = Yes, 0 = No
table(Y <= 5, censor)       # cross-tabulation

With censoring, you create a companion vector to Y which contains indicators of whether each value of Y is censored or not according to the censoring criterion. The survival package, for example, uses the
Surv() function to pair a quantitative variable with a corresponding vector of censoring indicators, as in

Surv(Y, censor)

There are a number of good books on survival analysis floating around.
One whose datasets are wrapped up in an R package is by Klein and Moeschberger, who discuss in some detail the distinction between censoring and truncation. You can have left censoring (e.g. in environmetrics where limits of detection are relevant) or right censoring (more common in medical and engineering contexts); similarly for left and right truncation. There are several types of right censoring as well as interval censoring; the code above for defining the censoring vector corresponds to simple type I right censoring with a single, fixed censoring time.

Notice the absence of loops. This is because the vast majority of the functions in base R are vectorized, which means you can operate on the vector as a single entity rather than looping through each element.
This is one of the primary differences between R and conventional programming languages; another is the variety of ways in which to index data objects in R (e.g., look at the way the subset of Y1 was chosen whose values were replaced by 5). A reading of the relevant sections in An Introduction to R, which comes with the software, would appear to be worth your time.

HTH,
Dennis

On Sun, Mar 23, 2014 at 9:14 PM, Lucy Leigh <lucy.leigh at newcastle.edu.au> wrote:
> Hi everyone,
>
> I am currently attempting to simulate some survival data, based on a 
> Weibull model. I basically need to simulate some survival data so that I can then test out a program I'm writing, before applying it to some real data.
>
> I've managed to create failure time data, using the rweibull function. I.e.
> Y[i] <- rweibull(1, shape, scale)
> For a variety of shape and scale parameters etc.
>
> I wish to add in right censoring. If I am considering my failure times 
> as deaths, and the censoring time as a study cut-off date (for 
> example, 5 years from baseline), would it be correct to do something 
> like,
>
> for(i in 1:nSubjects){
> if (Y[i] >  5){
> Y[i] <- 5}
> else
> {Y[i]  <- Y[i]
> }}
>
> I guess my question is, is it statistically sound to impose the right censoring after estimating the distribution with no censoring?
> I am leaning towards yes, because in an ideal world where we followed 
> all participants until they died, the distribution would be as 
> estimated by the rweibull above....and assuming that the right 
> censoring is independent, then cutting of the measurements at a pre-specified time wouldn't affect the outcome at all. But I guess I am just looking for a second opinion?
>
> Thanks in advance, this list has been immensely helpful with some 
> previous questions I have had about the survival package.
> Lucy
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.