[R] coxph and ordinal variables?

David Winsemius dwinsemius at comcast.net
Thu Sep 9 01:02:46 CEST 2010


On Sep 8, 2010, at 6:43 PM, Min-Han Tan wrote:

> Dear R-help members,
>
> Apologies - I am posting on behalf of a colleague, who is a little  
> puzzled
> as STATA and R seem to be yielding different survival estimates for  
> the same
> dataset when treating a variable as ordinal. Ordered() is used  to  
> represent
> an ordinal variable) I understand that R's coxph (by default) uses  
> the Efron
> approximation, whereas STATA uses (by default) the Breslow. but we did
> compare using the same approximations. I am wondering if this is a  
> result of
> how coxph manages an ordered factor?
>
> Essentially, this is a survival dataset using tumor grade (1, 2, 3  
> and 4) as
> the risk factor. This is more of an 'ordinal' variable, rather than a
> continuous variable. For the same data set of 399 patients, when  
> treating
> the vector of tumor grade as a continuous variable (range of 1 to 4),
> testing the Efron and the Breslow approximations yield the same  
> result in
> both R and STATA.
>
> However, when Hist_Grade_4 grp is converted into an ordered factor  
> using
> ordered(), and the same scripts are applied, rather different  
> results are
> obtained, relative to the STATA output. This is tested across the  
> different
> approximations, with consistent results. The comparison using Efron
> approximation and ordinal data is is below.

Are you sure you want an ordered factor? In R this means you will be  
creating linear, quadratic and cubic contrasts. Notice the L, Q and C  
designations on the coefficients. That certainly does not look to be  
comparable to what you are getting from Stata. My suggestion would be  
to create an un-ordered factor in R and see whether you get results  
more in line with Stata's output when applied to your data.

-- 
David.
>
> Your advice is very much appreciated!
>
> Min-Han
>
> Apologies below for the slightly malaligned output.
>
> STATA output
>
> . xi:stcox i.Hist_Grade_4grp, efr
> i.Hist_Grade_~p   _IHist_Grad_1-4     (naturally coded; _IHist_Grad_1
> omitted)
>
>        failure _d:  FFR_censor
>  analysis time _t:  FFR_month
>
> Iteration 0:   log likelihood =  -1133.369
> Iteration 1:   log likelihood = -1129.4686
> Iteration 2:   log likelihood = -1129.3196
> Iteration 3:   log likelihood = -1129.3191
> Refining estimates:
> Iteration 0:   log likelihood = -1129.3191
>
> Cox regression -- Efron method for ties
>
> No. of subjects =          399                     Number of obs   =
> 399
> No. of failures =          218
> Time at risk    =  9004.484606
>                                                  LR chi2(3)      =
> 8.10
> Log likelihood  =   -1129.3191                     Prob > chi2     =
> 0.0440
>
> ------------------------------------------------------------------------------
>         _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf.
> Interval]
> ------------- 
> +----------------------------------------------------------------
> _IHist_Gra~2 |   1.408166   .3166876     1.52   0.128     .9062001
> 2.188183
> _IHist_Gra~3 |    1.69506   .3886792     2.30   0.021     1.081443
> 2.656847
> _IHist_Gra~4 |   2.540278   .9997843     2.37   0.018      1.17455
> 5.49403
>
>
>
> R Output using
>> summary ( coxph( Surv(FFR_month,FFR_censor) ~ Hist_Grade_4grp,
> method=c("breslow")))
>> summary ( coxph( Surv(FFR_month,FFR_censor) ~ Hist_Grade_4grp,
> method=c("exact")))
>> summary ( coxph( Surv(FFR_month,FFR_censor) ~ Hist_Grade_4grp,
> method=c("efron")))
>
>
>
> n=399 (21 observations deleted due to missingness)
>
>                    coef exp(coef) se(coef)     z Pr(>|z|)
> Hist_Grade_4grp.L 0.66685   1.94809  0.26644 2.503   0.0123 *
> Hist_Grade_4grp.Q 0.03113   1.03162  0.20842 0.149   0.8813
> Hist_Grade_4grp.C 0.08407   1.08771  0.13233 0.635   0.5252
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>                 exp(coef) exp(-coef) lower .95 upper .95
> Hist_Grade_4grp.L     1.948     0.5133    1.1556     3.284
> Hist_Grade_4grp.Q     1.032     0.9693    0.6857     1.552
> Hist_Grade_4grp.C     1.088     0.9194    0.8392     1.410
>
> Rsquare= 0.02   (max possible= 0.997 )
> Likelihood ratio test= 8.1  on 3 df,   p=0.044
> Wald test            = 8.02  on 3 df,   p=0.0455
> Score (logrank) test = 8.2  on 3 df,   p=0.04202
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list