[R] Explaining Survival difference between Stata and R

Göran Broström gb at stat.umu.se
Wed May 12 08:54:15 CEST 2004


On Tue, May 11, 2004 at 07:10:36AM -0700, Thomas Lumley wrote:
> On Mon, 10 May 2004, Paul Johnson wrote:
> 
> > Dear Everybody:
> >
> > I'm doing my usual "how does that work in R" thing with some Stata
> > projects.  I find a gross gap between the Stata and R in Cox PH models,
> > and I hope you can give me some pointers about what goes wrong.  I'm
> > getting signals from R/Survival that the model just can't be estimated,
> > but Stata spits out numbers just fine.
> >
> > I wonder if I should specify initial values for coxph?
> 
> It's worth a try. The other question is whether Stata has in fact
> converged -- if the range of rqe is not small then its coefficient may
> actually be infinite.
> 
> > I got a dataset from a student who uses Stata and try to replicate in R.
> > I will share data to you in case you want to see for yourself.  Let me
> > know if you want text or Stata data file.
> 
> I'd like to look at the data.  We should be able to get coxph to converge
> when there is a finite mle -- the log partial likelihood is concave.

Paul and Thomas,

'coxph' or the data (I got it from Paul) indeed has a problem:
------------------------------------------------------------
Call:
coxph(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)


             coef exp(coef) se(coef)     z    p
haz.wst  8.53e-08         1 9.47e-08 0.901 0.37
pol.free       NA        NA 0.00e+00    NA   NA

Likelihood ratio test=0.76  on 1 df, p=0.385  n= 21 
Warning message: 
X matrix deemed to be singular; variable 2 in: 
coxph(Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat) 
---------------------------------------------------------------

Is it the data? Let's try 'coxreg' (eha):
---------------------------------------------------------------
Call:
coxreg(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)

Covariate           Mean       Coef        RR       Wald p
haz.wst           2054901     0.000     1.000        0.372 
pol.free            2.090     0.009     1.009        0.958 

Events                    21 
Total time at risk            78 
Max. log. likelihood      -45.001 
LR test statistic         0.76 
Degrees of freedom        2 
Overall p-value           0.684583
----------------------------------------------------------------

This worked just fine (Paul, same results as in Stata?). But, we 
seem to have a scaling problem; lok at the means of the covariates!
Some rescaling gives:
----------------------------------------------------------------
Call:
coxph(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, 
    data = dat)


                      coef exp(coef) se(coef)      z    p
I(haz.wst * 1e-06) 0.08479      1.09    0.095 0.8920 0.37
pol.free           0.00896      1.01    0.170 0.0526 0.96

Likelihood ratio test=0.76  on 2 df, p=0.685  n= 21 
----------------------------------------------------------------
and now 'coxph' gets the same results as 'coxreg'. I don't know about coxph
for sure, but I do know that coxreg centers all covariates before the NR
procedure starts. Maybe we also should rescale to unit variance? And of
course scale back the coefficients and se:s at the end? 

BTW, Paul's data is heavily tied. Could be an idea to use a discrete time
version of the PH model: you can do that with 'mlreg' (eha): 
--------------------------------------------------------------- 
Call:
mlreg(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, 
    data = dat)

Covariate           Mean       Coef        RR     Wald p
I(haz.wst * 1e-06)  2.055     0.097     1.102      0.320 
pol.free            2.090     0.003     1.003      0.985 

Events                    21 
Total time at risk            78 
Max. log. likelihood      -36.324 
LR test statistic         0.93 
Degrees of freedom        2 
Overall p-value           0.627056
----------------------------------------------------------------
Doesn't make much of a difference, though. Efron's method is quite
robust. (You might check if there are any differences with 'breslow')

Best,

Göran

[...]
-- 
 Göran Broström                    tel: +46 90 786 5223
 Department of Statistics          fax: +46 90 786 6614
 Umeå University                   http://www.stat.umu.se/egna/gb/
 SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se




More information about the R-help mailing list