[R] How to simulate informative censoring in a Cox PH model?

Greg Snow 538280 at gmail.com
Wed Jul 29 18:37:28 CEST 2015


As models become more complex it becomes harder to distinguish
different parts and their effects.  Even for a straight forward linear
regression model if X1 and X2 are correlated with each other then it
becomes difficult to distinguish between the effects of X1^2, X2^2,
and X1*X2.  In your case the informative censoring and model
misspecification are becoming hard to distinguish (and it could be
argued that having informative censoring is really just a form of
model misspecification).  So I don't think so much that you are doing
things wrong, just that you are learning that the models are complex.

Another approach to simulation that you could try is to simulate the
event time and censoring time using copulas (and therefore they can be
correlated to give informative censoring, but without relying on a
term that you could have included in the model) then consider the
event censored if the censoring time is shorter.

On Fri, Jul 24, 2015 at 10:14 AM, Daniel Meddings <dpmeddings at gmail.com> wrote:
> Hi Greg
>
> Many thanks for taking the time to respond to my query. You are right about
> pointing out the distinction between what variables are and are not included
> in the event times process and in the censoring process. I clearly forgot
> this important aspect. I amended my code to do as you advise and now I am
> indeed getting biased estimates when using the informatively censored
> responses. The problem is now that the estimates from the independently
> censored responses are the same - i.e. they are just as biased. Thus the
> bias seems to be due entirely to model mis-specification and not the
> informative censoring.
>
>
> To give a concrete example I simulate event times T_i and censoring times
> C_i from the following models;
>
>
> T_i~ Weibull(lambda_t(x),v_t),    lambda_t(x)=lambda_t*exp( beta_t_0 +
> (beta_t_1*Treat_i) + (beta_t_2*Z_i) + (beta_t_3*Treat_i*Z_i)  )
>
> C_i~ Weibull(lambda_c(x),v_c),    lambda_c(x)=lambda_c*exp( beta_c_0 +
> (beta_c_1*Treat_i) + (beta_c_2*Z_i) + (beta_c_3*Treat_i*Z_i)  )
>
> D_i~Weibull(lambda_d(x),v_D), lambda_d(x)=lamda_d*exp( beta_d_0)
>
> where ;
>
> beta_t_0 = 1,  beta_t_1 = -1,   beta_t_2 = 1,  beta_t_3 = -2,   lambda_t=0.5
>
> beta_c_0 = 0.2,  beta_c_1 = -2,   beta_c_2 = 2,  beta_c_3 = -2,
> lambda_c=0.5
>
> beta_d_0 = -0.7,  lambda_d=0.1
>
> When I fit the cox model to both the informatively censored responses and
> the independent censored responses I include only the Treatment covariate in
> the model.
>
> I simulate Treatment from a Bernoulli distribution with p=0.5 and Z_i from a
> beta distribution so that Z ranges from 0 to 1 (I like to think of Z as a
> "poor" prognosis probability where Z_i=1 means a subject is 100% certain to
> have a poor prognosis and Z_i=0 means zero chance). These parameter choices
> give approximately 27% and 25% censoring for the informatively censored
> responses (using C_i) and the independent censored responses (using D_i)
> respectively. I use N=2000 subjects and 2000 simulation replications.
>
> The above simulation I get estimates of beta_t_2 of -1.526 and -1.537 for
> the informatively censored responses and the independent censored responses
> respectively.
>
> Furthermore when I fit a cox model to the full responses (no censoring at
> all) I get an estimate of beta_t_2 of -1.542. This represents the best that
> can possibly be done given that Z and Treat*Z are not in the model. Clearly
> censoring is not making much of a difference here - model mis-specification
> dominates.
>
> I still must be doing something wrong but I cannot figure this one out.
>
> Thanks
>
> Dan
>
>
>
> On Thu, Jul 23, 2015 at 12:33 AM, Greg Snow <538280 at gmail.com> wrote:
>>
>> I think that the Cox model still works well when the only information
>> in the censoring is conditional on variables in the model.  What you
>> describe could be called non-informative conditional on x.
>>
>> To really see the difference you need informative censoring that
>> depends on something not included in the model.  One option would be
>> to use copulas to generate dependent data and then transform the
>> values using your Weibul.  Or you could generate your event times and
>> censoring times based on x1 and x2, but then only include x1 in the
>> model.
>>
>> On Wed, Jul 22, 2015 at 2:20 AM, Daniel Meddings <dpmeddings at gmail.com>
>> wrote:
>> > I wish to simulate event times where the censoring is informative, and
>> > to
>> > compare parameter estimator quality from a Cox PH model with estimates
>> > obtained from event times generated with non-informative censoring.
>> > However
>> > I am struggling to do this, and I conclude rather than a technical flaw
>> > in
>> > my code I instead do not understand what is meant by informative and
>> > un-informative censoring.
>> >
>> > My approach is to simulate an event time T dependent on a vector of
>> > covariates x having hazard function h(t|x)=lambda*exp(beta'*x)v*t^{v-1}.
>> > This corresponds to T~ Weibull(lambda(x),v), where the scale parameter
>> > lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is
>> > fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}),
>> > lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I
>> > assume
>> > the regression coefficients are p-dimensional.
>> >
>> > I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C),
>> > lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i)
>> > and
>> > a censored flag delta_inf_i=1 if Y_inf_i <= C_i (an observed event), and
>> > delta_inf_i=0 if Y_inf_i > C_i (informatively censored: event not
>> > observed). I am convinced this is informative censoring because as long
>> > as
>> > beta_T~=0 and beta_C~=0 then for each subject the data generating
>> > process
>> > for T and C both depend on x.
>> >
>> > In contrast I generate non-informative censoring times
>> > D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i)
>> > and a censored flag delta_ninf_i=1 if Y_ninf_i <= D_i (an observed
>> > event),
>> > and delta_ninf_i=0 if Y_ninf_i > D_i (non-informatively censored: event
>> > not
>> > observed). Here beta_D is a scalar. I "scale" the simulation by choosing
>> > the lambda_T, lambda_C and lambda_D parameters such that on average
>> > T_i<C_i
>> > and T_i<D_i to achieve X% of censored subjects for both Y_inf_i and
>> > Y_ninf_i.
>> >
>> > The problem is that even for say 30% censoring (which I think is high),
>> > the
>> > Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when
>> > I
>> > expected the estimates using Y_inf to be biased, and I think I see why:
>> > however different beta_C is from beta_T, a censored subject can
>> > presumably
>> > influence the estimation of beta_T only by affecting the set of subjects
>> > at
>> > risk at any time t, but this does not change the fact that every single
>> > Y_inf_i with delta_inf_i=1 will have been generated using beta_T only.
>> > Thus
>> > I do not see how my simulation can possibly produce biased estimates for
>> > beta_T using Y_inf.
>> >
>> > But then what is informative censoring if not based on this approach?
>> >
>> > Any help would be greatly appreciated.
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> 538280 at gmail.com
>
>



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com



More information about the R-help mailing list