[R] covariate selection in cox model (counting process)

Wed Jul 28 19:42:01 CEST 2004

On Wed, 28 Jul 2004, Mayeul KAUFFMANN wrote:

> > If you can get the conditional independence (martingaleness) then, yes,
> > BIC is fine.
> >
> > One way to check might be to see how similar the standard errors are
> with
> > and without the cluster(id) term.
>
> (Thank you "again !", Thomas.)
>
> At first look, the values seemed very similar (see below, case 2).
> However, to check this without being too subjective, and without a
> specific test, I needed other values to assess the size of the
> differences: what is similar, what is not?
>

I think the econometricians have theory for this (comparing the whole
covariance matrices).

	-thomas

>
> ==========================================================================
> =====
> CASE 1
> I first estimated the model without modeling dependence:
>
> Call:
> coxph(formula = Surv(start, stop, status) ~ cluster(ccode) +
>     pop + pib + pib2 + crois + instab.x1  + instab.autres, data = xstep)
>
>
>                  coef exp(coef) se(coef) robust se     z       p
> pop            0.3606     1.434   0.0978    0.1182  3.05 2.3e-03
> pib           -0.5947     0.552   0.1952    0.1828 -3.25 1.1e-03
> pib2          -0.4104     0.663   0.1452    0.1270 -3.23 1.2e-03
> crois         -0.0592     0.943   0.0245    0.0240 -2.46 1.4e-02
> instab.x1      2.2059     9.079   0.4692    0.4097  5.38 7.3e-08
> instab.autres  0.9550     2.599   0.4700    0.4936  1.93 5.3e-02
>
> Likelihood ratio test=74  on 6 df, p=6.2e-14  n= 7286
>
> There seems to be a strong linear relationship between standard errors
> (se, or naive se) and robust se.
>
> >      summary(lm(sqrt(diag(cox1$var))~ sqrt(diag(cox1$naive.var)) -1))
> Coefficients:
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.96103    0.04064   23.65 2.52e-06 ***
> Multiple R-Squared: 0.9911, Adjusted R-squared: 0.9894
>
>
> ==========================================================================
> =====
> CASE 2
>
> Then I added a variable (pxcw) measuring the proximity of the previous
> event (1>pxcw>0)
>
> n= 7286
>                  coef exp(coef) se(coef) robust se     z       p
> pxcw           0.9063     2.475   0.4267    0.4349  2.08 3.7e-02
> pop            0.3001     1.350   0.1041    0.1295  2.32 2.0e-02
> pib           -0.5485     0.578   0.2014    0.1799 -3.05 2.3e-03
> pib2          -0.4033     0.668   0.1450    0.1152 -3.50 4.6e-04
> crois         -0.0541     0.947   0.0236    0.0227 -2.38 1.7e-02
> instab.x1      1.9649     7.134   0.4839    0.4753  4.13 3.6e-05
> instab.autres  0.8498     2.339   0.4693    0.4594  1.85 6.4e-02
>
> Likelihood ratio test=78.3  on 7 df, p=3.04e-14  n= 7286
>
>
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.98397    0.02199   44.74 8.35e-09 ***
> Multiple R-Squared: 0.997, Adjusted R-squared: 0.9965
>
> The naive standard errors (se) seem closer to the robust se than they were
> when not modeling for dependence.
> 0.98397 is very close to one, R^2 grew, etc.
> The dependence is high (risk is multiplied by 2.475 the day after an
> event)
> but conditional independence (given covariates) seems hard to reject.
>
>
> ==========================================================================
> =====
> CASE 3
> Finally, I compared these results with those without repeated events
> (which gives a smaller dataset). A country is removed as soon as we
> observe its first event.
> (robust se is still computed, even if naive se should in fact be used here
> to compute the pvalue)
>
> coxph(formula = Surv(start, stop, status) ~ cluster(ccode) +
>     pop + pib + pib2 + crois + instab.x1  + instab.autres, data =
> xstep[no.previous.event, ])
>
>                  coef exp(coef) se(coef) robust se     z       p
> pop            0.4236     1.528   0.1030    0.1157  3.66 2.5e-04
> pib           -0.7821     0.457   0.2072    0.1931 -4.05 5.1e-05
> pib2          -0.3069     0.736   0.1477    0.1254 -2.45 1.4e-02
> crois         -0.0432     0.958   0.0281    0.0258 -1.67 9.5e-02
> instab.x1      1.9925     7.334   0.5321    0.3578  5.57 2.6e-08
> instab.autres  1.3571     3.885   0.5428    0.5623  2.41 1.6e-02
>
> Likelihood ratio test=66.7  on 6 df, p=1.99e-12  n=5971 (2466 observations
> deleted due to missing)
>
>
> >      summary(lm(sqrt(diag(cox1$var))~ sqrt(diag(cox1$naive.var)) -1))
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.86682    0.07826   11.08 0.000104 ***
> Residual standard error: 0.06328 on 5 degrees of freedom
> Multiple R-Squared: 0.9608, Adjusted R-squared: 0.953
>
>
> There seems to be no evidence that robust se is more different from se in
> case 2 than in case 3 (and case 1).
> It even seems closer.
>
> I conclude that conditional independence (martingaleness) cannot be
> rejected in CASE 2, when modeling the dependence between events with a
> covariate.
>
> Mayeul KAUFFMANN
> Univ. Pierre Mendes France
> Grenoble - France
>
>
>
> > > Then, there is still another option. In fact, I already modelled
> > > explicitely the influence of past events with a "proximity of last
> event"
> > > covariate, assuming the dependence on the last event decreases at a
> > > constant rate (for instance, the proximity covariate varies from 1 to
> 0.5
> > > in the first 10 years after an event, then from 0.5 to 0.25 in the
> next
> > > ten years, etc).
> > >
> > > With a well chosen modelisation of the dependence effect, the events
> > > become conditionnaly independent, I do not need a +cluster(id) term,
> and I
> > > can use fit$loglik to make a covariate selection based on BIC, right?
> >
> > If you can get the conditional independence (martingaleness) then, yes,
> > BIC is fine.
> >
> > One way to check might be to see how similar the standard errors are
> with
> > and without the cluster(id) term.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle