[R] Coxph with factors

Thomas Lumley tlumley at u.washington.edu
Sat Jul 16 17:07:28 CEST 2005

On Sat, 16 Jul 2005, Kylie-Anne Richards wrote:

> Thank you for your help.
> ____________________________________________________________
>> In any case, to specify f.pom you need it to be a factor with the same set
>> of levels.  You don't say what the lowest level of pom is, but if it is,
>> say, -3:
>> f.pom=factor(-3, levels=seq(-3,2.5, by=0.5))
> ____________________________________________________________
> For this particular model, f.pom goes from -5.5 to 2 in 0.5 increments.
> I seem to have misunderstood your explanation, as R is still producing an
> error.

In the model you showed, there were no factor levels below -2.5.  You need 
to make sure that the levels are the same in the initial data and the data 
supplied to survfit.  Check this with levels().
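A minimal sketch of that check, assuming the survival package and simulated data. The data frame `dat`, the covariate `vo` and the object names `fit`, `new` and `sf` are hypothetical stand-ins for the poster's data; only `pom` running from -5.5 to 2 in steps of 0.5 comes from the thread. The point is to build the new factor from `levels()` of the fitting data rather than from a separately typed `seq()`.

```r
library(survival)

set.seed(1)
# Hypothetical data: every pom value from -5.5 to 2 (step 0.5) appears 12 times
pom_values <- seq(-5.5, 2, by = 0.5)
n <- 12 * length(pom_values)
dat <- data.frame(
  time   = rexp(n),
  status = rbinom(n, 1, 0.8),
  vo     = rnorm(n),
  pom    = rep(pom_values, times = 12)
)
dat$f.pom <- factor(dat$pom)

fit <- coxph(Surv(time, status) ~ vo + f.pom, data = dat)

# Data for survfit(): take the levels from the fitting data itself,
# so the two sets of levels cannot disagree
new <- data.frame(vo = 0,
                  f.pom = factor(-3, levels = levels(dat$f.pom)))

# Compare the levels directly, as suggested above
stopifnot(identical(levels(new$f.pom), levels(dat$f.pom)))

sf <- survfit(fit, newdata = new)
```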

> ____________________________________________________________
>> I would first note that the survival function at zero covariates is not a 
>> very useful thing and is usually numerically unstable, and it's often more 
>> useful to get the survival function at some reasonable set of covariates.
> ____________________________________________________________
> Please correct me if I'm wrong, but I was under the impression that the survival
> function at zero covariates gave the baseline distribution, i.e. given the
> baseline probability S_0 at time t, one could calculate the survival probability for
> specified covariates by 
> S_0^exp(beta(vo)*specified(vo)+beta(po)*specified(po)+beta(f.pom at the level 
> of interest)) for time t.
> Since I was unable to get survfit to work with specified covariates, I was 
> using the survival probs of the 'avg' covariates, S(t), to determine the 
> baseline at time t, i.e. 
> S(t)^(1/exp(beta(vo)*mean(vo)+beta(po)*mean(po)+beta(f.pom-5.5)*mean(f.pom-5.5)+beta(f.pom-5.0)*mean(f.pom-5.0)+........). 
> Then I proceeded as mentioned in the paragraph above (clearly not an
> efficient way of doing things).

Yes, but you don't need to go via the baseline.  The survival curves for 
any two covariate vectors z1 and z2 are related by

S(t; z1) = S(t; z2)^exp(beta'(z1 - z2))

where beta is the vector of Cox regression coefficients.
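Under the Cox proportional hazards model the relation between two covariate vectors follows in two lines. Here Λ_0(t), the cumulative hazard at z = 0, is notation introduced only for this derivation:

```latex
\begin{aligned}
S(t; z) &= \exp\{-\Lambda_0(t)\, e^{\beta^\top z}\} \\
S(t; z_1) &= \exp\{-\Lambda_0(t)\, e^{\beta^\top z_2}\, e^{\beta^\top (z_1 - z_2)}\}
           = \bigl[\exp\{-\Lambda_0(t)\, e^{\beta^\top z_2}\}\bigr]^{e^{\beta^\top (z_1 - z_2)}}
           = S(t; z_2)^{\exp\{\beta^\top (z_1 - z_2)\}}
\end{aligned}
```

Setting z_2 = 0 gives the familiar S(t; z_1) = S_0(t)^exp(beta'z_1), the formula in the quoted question.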

For convenience of mathematical notation, mathematical statisticians write 
everything in terms of z2=0, and call this "the baseline". In the real 
world, though, you are better off with a baseline defined at a covariate 
value somewhere in the vicinity of the actual data. If, as is often the
case, the zero covariate value is a long way from the observed data, both
the computation of the survival curve at zero and the transformation to 
the covariates you want are numerically ill-conditioned.
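A small numerical illustration of the ill-conditioning, using made-up numbers. Suppose the curve at the covariate means is 0.5 at some time t, and the linear predictor beta'z at the means is 50 (for instance a covariate such as calendar year, which is nowhere near zero). Moving to z = 0 and back loses everything to rounding in double precision:

```r
s_mean  <- 0.5   # hypothetical S(t; z = means)
lp_mean <- 50    # hypothetical beta'(means), far from zero

# "Baseline" at z = 0: S_0(t) = S(t; means)^exp(-beta'means)
s0 <- s_mean^exp(-lp_mean)
s0 == 1          # TRUE: 0.5^(about 2e-22) rounds to exactly 1

# Transforming back to the means cannot recover 0.5
s_back <- s0^exp(lp_mean)
s_back           # 1
```

Working directly from a curve computed near the data avoids this round trip.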

So you can use the "baseline" curve returned by survfit(fit), which is
computed at z2 = fit$means, to do anything you can do with the baseline at
z = 0, and the computations will be more accurate.
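A sketch of this approach with the survival package, using the lung data that ships with it rather than the poster's data. The covariates age and ph.ecog and the target vector z1 are illustrative choices. It assumes survfit.coxph's default of computing the curve as exp(-cumulative hazard), for which the power relation holds exactly. The transformed curve is compared with a direct survfit(fit, newdata = ...) prediction.

```r
library(survival)

# Complete cases for two numeric covariates from the lung data
lung2 <- na.omit(lung[, c("time", "status", "age", "ph.ecog")])
fit   <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung2)
beta  <- coef(fit)

z2 <- fit$means                   # point at which survfit(fit) is computed
z1 <- c(age = 70, ph.ecog = 2)    # illustrative covariates of interest
z1 <- z1[names(beta)]             # same order as the coefficients

# "Baseline" curve from survfit() with no newdata: S(t; z2)
s2 <- as.vector(survfit(fit)$surv)

# Move it to z1 via S(t; z1) = S(t; z2)^exp(beta'(z1 - z2))
s1_from_means <- s2^exp(sum(beta * (z1 - z2)))

# Direct prediction at z1, for comparison
s1_direct <- as.vector(survfit(fit, newdata = as.data.frame(t(z1)))$surv)

all.equal(s1_from_means, s1_direct)
```

Neither curve passes through z = 0, so the ill-conditioned step never occurs.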

