[R] How to simulate informative censoring in a Cox PH model?

Thu Jul 23 01:33:05 CEST 2015

I think that the Cox model still works well when the censoring depends
only on variables that are included in the model.  What you describe
could be called non-informative censoring conditional on x.
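
A minimal sketch of that point, assuming the Weibull PH setup from the
original post with a single covariate.  The parameter values and the
helper rweib_ph() are made up for illustration; coxph() and Surv() come
from the survival package.

```r
library(survival)

set.seed(42)
n <- 5000
x <- rnorm(n)
beta_T <- 0.7    # assumed true log hazard ratio for the event time
beta_C <- -1.0   # assumed censoring coefficient, deliberately very different

# Hypothetical helper: inverse-CDF draw from
# S(t | lp) = exp(-lambda * exp(lp) * t^v)
rweib_ph <- function(n, lambda, v, lp) {
  (-log(runif(n)) / (lambda * exp(lp)))^(1 / v)
}

t_event <- rweib_ph(n, 0.10, 1.5, beta_T * x)
# Censoring depends on x, but is independent of t_event given x
t_cens  <- rweib_ph(n, 0.05, 1.5, beta_C * x)

y     <- pmin(t_event, t_cens)
delta <- as.integer(t_event <= t_cens)   # 1 = event observed

fit       <- coxph(Surv(y, delta) ~ x)
beta_hat  <- unname(coef(fit))
cens_frac <- mean(delta == 0)
print(c(beta_hat = beta_hat, cens_frac = cens_frac))
```

Because x is in the model, beta_hat should land close to beta_T even
though the censoring times depend strongly on x.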

To really see the difference you need informative censoring that
depends on something not included in the model.  One option would be
to use copulas to generate dependent uniform values and then transform
them to event and censoring times using your Weibull distributions.
Or you could generate your event times and censoring times based on
x1 and x2, but then only include x1 in the model.
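
A rough sketch of the copula option, assuming a Gaussian copula (one
choice among many) and made-up parameter values.  The names rweib_ph()
and fit_copula() are hypothetical helpers, not functions from any
package; only coxph() and Surv() come from the survival package.

```r
library(survival)

set.seed(2015)
n <- 5000
lambda_T <- 0.10; v_T <- 1.5; beta_T <- 0.7   # event-time parameters (assumed)
lambda_C <- 0.05; v_C <- 1.5                  # censoring parameters (assumed)

# Hypothetical helper: inverse-CDF transform of a uniform u for
# S(t | lp) = exp(-lambda * exp(lp) * t^v)
rweib_ph <- function(u, lambda, v, lp = 0) {
  (-log(u) / (lambda * exp(lp)))^(1 / v)
}

# Gaussian copula with correlation rho between the latent normals.
# rho = 0 gives event and censoring times that are independent given x.
fit_copula <- function(rho) {
  x  <- rnorm(n)
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # cor(z1, z2) = rho
  t_event <- rweib_ph(pnorm(z1), lambda_T, v_T, beta_T * x)
  t_cens  <- rweib_ph(pnorm(z2), lambda_C, v_C)  # does not depend on x
  d <- data.frame(y = pmin(t_event, t_cens),
                  delta = as.integer(t_event <= t_cens),
                  x = x)
  c(beta_hat = unname(coef(coxph(Surv(y, delta) ~ x, data = d))),
    censored = mean(d$delta == 0))
}

res_indep <- fit_copula(rho = 0)     # non-informative censoring
res_dep   <- fit_copula(rho = 0.8)   # dependence not explained by x
print(rbind(res_indep, res_dep))
```

With rho = 0 the estimate should be close to beta_T.  With rho = 0.8 the
dependence between event and censoring times is not captured by x, so
the estimate will in general differ from beta_T; the size and direction
depend on the parameters, and the censoring fraction also changes with
rho.  If you use the x1/x2 route instead, keep in mind that leaving x2
out of a Cox model changes the x1 coefficient even without any
censoring, so compare against the same reduced model fitted under
independent censoring.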

On Wed, Jul 22, 2015 at 2:20 AM, Daniel Meddings <dpmeddings at gmail.com> wrote:
> I wish to simulate event times where the censoring is informative, and to
> compare parameter estimator quality from a Cox PH model with estimates
> obtained from event times generated with non-informative censoring. However
> I am struggling to do this, and I conclude that, rather than there being a
> technical flaw in my code, I do not understand what is meant by informative
> and non-informative censoring.
>
> My approach is to simulate an event time T dependent on a vector of
> covariates x having hazard function h(t|x)=lambda*exp(beta'*x)*v*t^{v-1}.
> This corresponds to T~ Weibull(lambda(x),v), where the scale parameter
> lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is
> fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}),
> lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I assume
> the regression coefficients are p-dimensional.
>
> I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C),
> lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i) and
> a censoring flag delta_inf_i=1 if T_i <= C_i (an observed event), and
> delta_inf_i=0 if T_i > C_i (informatively censored: event not
> observed). I am convinced this is informative censoring because as long as
> beta_T != 0 and beta_C != 0 then for each subject the data generating
> processes for T and C both depend on x.
>
> In contrast I generate non-informative censoring times
> D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i)
> and a censoring flag delta_ninf_i=1 if T_i <= D_i (an observed event),
> and delta_ninf_i=0 if T_i > D_i (non-informatively censored: event not
> observed). Here beta_D is a scalar. I "scale" the simulation by choosing
> the lambda_T, lambda_C and lambda_D parameters such that on average T_i<C_i
> and T_i<D_i to achieve X% of censored subjects for both Y_inf_i and
> Y_ninf_i.
>
> The problem is that even for say 30% censoring (which I think is high), the
> Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when I
> expected the estimates using Y_inf to be biased, and I think I see why:
> however different beta_C is from beta_T, a censored subject can presumably
> influence the estimation of beta_T only by affecting the set of subjects at
> risk at any time t, but this does not change the fact that every single
> Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. Thus
> I do not see how my simulation can possibly produce biased estimates for
> beta_T using Y_inf.
>
> But then what is informative censoring if not based on this approach?
>
> Any help would be greatly appreciated.

-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com