[R] Predicted Cox survival curves - factor coding problems..

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon May 7 15:45:59 CEST 2007

On Mon, 7 May 2007, Terry Therneau wrote:

>  The combination of survfit, coxph, and factors is getting confused.  It is
> not smart enough to match a new data frame that contains a numeric for sitenew
> to a fit that contained that variable as a factor.  (Perhaps it should be smart
> enough to at least die gracefully -- but it's not).
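The mismatch described above (a numeric supplied in new data for a variable that was a factor at fit time) is the kind of check that base R's standard modelling functions perform. The sketch below uses lm() and predict() on hypothetical toy data with a 0/1/2 site code. It assumes a reasonably recent R, where predict.lm compares variable classes against those recorded when the model was fitted and stops with an error rather than silently producing wrong predictions.

```r
# Hypothetical toy data: 'site' is coded 0/1/2 but stored as a factor.
set.seed(1)
d <- data.frame(site = factor(rep(0:2, each = 10)), y = rnorm(30))
fit <- lm(y ~ site, data = d)

# Passing 'site' as a plain numeric: predict.lm compares variable classes
# with those recorded at fit time and stops with an informative error.
# (model.frame also warns that 'site' is not a factor; suppressed here.)
bad <- tryCatch(
  suppressWarnings(predict(fit, newdata = data.frame(site = c(0, 1)))),
  error = function(e) conditionMessage(e)
)
print(bad)

# Passing 'site' as a factor with the same levels as the fit works.
new_site <- factor(c(0, 1), levels = levels(d$site))
good <- predict(fit, newdata = data.frame(site = new_site))
print(good)
```

Building the new factor with `levels = levels(d$site)` is the key step. It ensures that the dummy coding used for prediction matches the coding used for fitting.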

The 'standard' model-fitting functions in R do make an attempt to match 
the new data to that used for fitting, or die gracefully.  Perhaps Thomas 
could look into adding this to survfit and coxph (see

>   The simple solution is to not use factors.
> site1 <- 1*(coxsnps$sitenew==1)
> site2 <- 1*(coxsnps$sitenew==2)
> test1 <- coxph(Surv(time, censor) ~ snp1 + sex + site1 + site2 + gene +
> 	  eth.self + strata(edu), data= coxsnps)
> profile1 <- data.frame(snp1=c(0,1), sex=c(0,0), site1=c(0,0),
> 	               site2=c(0,0), gene=c(0,0), eth.self=c(0,0))
> plot(survfit(test1, newdata=profile1))
> Note that you do not have to explicitly make "edu" a factor.  Since it is
> included in a strata statement, the coxph routine must treat it as discrete
> groups.
> 	Terry Therneau
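The dummy-variable workaround quoted above can be tried on data that ships with the survival package. This is a minimal sketch, not the poster's analysis. The `lung` data stands in for the hypothetical `coxsnps`, `ph.ecog` (restricted to 0-2) plays the role of `sitenew`, and `sex` in `strata()` plays the role of `edu`. The names `ecog1`, `ecog2`, and `profile` are illustrative only.

```r
library(survival)

# Stand-in for 'coxsnps': lung data, keeping ECOG scores 0-2 (drops NA too).
dat <- subset(lung, !is.na(ph.ecog) & ph.ecog <= 2)

# Manual 0/1 indicators, as in the post; level 0 is the reference.
dat$ecog1 <- 1 * (dat$ph.ecog == 1)
dat$ecog2 <- 1 * (dat$ph.ecog == 2)

# sex is numeric (1/2), but strata() treats it as discrete groups anyway.
fit <- coxph(Surv(time, status) ~ age + ecog1 + ecog2 + strata(sex),
             data = dat)

# newdata needs a column for every non-strata covariate, with matching names.
# The strata variable is omitted, so curves are produced for each stratum.
profile <- data.frame(age = c(60, 60), ecog1 = c(0, 1), ecog2 = c(0, 0))
sf <- survfit(fit, newdata = profile)
plot(sf, lty = 1:2)
```

Because every covariate is an ordinary numeric column, there is no factor coding for survfit to mismatch. The cost is that the indicator columns have to be kept consistent with the original variable by hand.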

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
