[R] survival survfit with newdata

David Winsemius dwinsemius at comcast.net
Thu May 17 14:15:37 CEST 2012


On May 17, 2012, at 2:20 AM, Damjan Krstajic wrote:

>
> Thanks, David, for the prompt reply. I agree with you. However, I still
> fail to get the survfit function to work with newdata. In my
> previous example I changed the column names of the testX matrix and it
> still fails.
>
>> colnames(testX)<-names(coxph.model$coefficients)
>> sfit<- survfit(coxph.model,newdata=data.frame(testX))
> Error in model.frame.default(formula = Surv(trainTime, trainStatus)  
> ~  :
>   variable lengths differ (found for 'trainX')

I don't get that error when I run this, although I do get better results
when I pass a data argument to the coxph call. You should get predicted
survival curves for the 10 test cases, estimated at the same time points
that were present in the original training data.

  coxph.model <- coxph(Surv(trainTime, trainStatus) ~ .,
                       data = data.frame(trainX))
  colnames(testX)<-names(coxph.model$coefficients)
  sfit<- survfit(coxph.model,newdata=data.frame(testX))
  plot(sfit)  # 10 curves

The coxph help page does not describe a matrix as a valid data input,
so perhaps you should follow the help page more closely?
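
For reference, the data-frame approach can be sketched end to end as
below. This is a minimal, self-contained variant of the example in this
thread, not the exact session: the seed and the names train, test, and
fit are assumptions added for illustration, and the column names
X1..X20 are simply what data.frame() assigns to an unnamed matrix.

```r
library(survival)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
x      <- matrix(rnorm(100 * 20), 100, 20)
time   <- runif(100, min = 0, max = 7)
status <- sample(c(0, 1), 100, replace = TRUE)

# Keep outcome and covariates together in one data frame for fitting.
# data.frame() names the unnamed matrix columns X1..X20.
train <- data.frame(time = time[11:100], status = status[11:100], x[11:100, ])

# The new cases get the same covariate names (X1..X20) automatically.
test <- data.frame(x[1:10, ])

fit  <- coxph(Surv(time, status) ~ ., data = train)
sfit <- survfit(fit, newdata = test)

ncol(sfit$surv)  # one column (curve) per row of newdata
```

Because the model is fitted with a data argument, survfit only needs
newdata to carry the same column names as the training covariates.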

-- 
David.

>
> What would be the solution in my simple example to get the survival
> curves for testX? Thanks in advance. DK
>
>> CC: r-help at r-project.org
>> From: dwinsemius at comcast.net
>> To: dkrstajic at hotmail.com
>> Subject: Re: [R] survival survfit with newdata
>> Date: Thu, 17 May 2012 00:52:55 -0400
>>
>>
>> On May 16, 2012, at 5:08 PM, Damjan Krstajic wrote:
>>
>>>
>>> Dear all,
>>>
>>> I am confused with the behaviour of survfit with newdata option.
>>
>> Yes. It has the same behavior as any other newdata/predict from
>> regression. You need to supply a dataframe with the same names as in
>> the original formula. Doesn't look as though that strategy is being
>> followed. The name of the column needs to be 'trainX' since that was
>> what was its name on the RHS of the formula, and you may want to
>> specify times. If you fail to follow those rules, the function falls
>> back on offering estimates from the original data.
>>
>>>
>>> I am using the latest version R-2-15-0. In the simple example below
>>> I am building a coxph model on 90 patients and trying to predict 10
>>> patients. Unfortunately the survival curve at the end is for 90
>>> patients.
>>
>> As is proper with a malformed newdata argument.
>>
>>> Could somebody please from the survival package confirm that this
>>> behaviour is as expected or not - because I cannot find a way of
>>> using 'newdata' with really new data. Thanks in advance. DK
>>>
>>>> x<-matrix(rnorm(100*20),100,20)
>>>> time<-runif(100,min=0,max=7)
>>>> status<-sample(c(0,1), 100, replace = TRUE)
>>>> trainX<-x[11:100,]
>>>> trainTime<-time[11:100]
>>>> trainStatus<-status[11:100]
>>>> testX<-x[1:10,]
>>>> coxph.model<-coxph(Surv(trainTime,trainStatus)~ trainX)
>>>> sfit<- survfit(coxph.model,newdata=data.frame(testX))
>>>> dim(sfit$surv)
>>> [1] 90 90
>>>
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>

David Winsemius, MD
West Hartford, CT


