[R] val.surv

Frank Harrell f.harrell at vanderbilt.edu
Sun Aug 21 19:01:15 CEST 2011


I welcome a trivial self-contained set of code, not using frailties, that
generates its own data and which fails.  Then I'll debug.
Frank

David Winsemius wrote:
> 
> On Aug 21, 2011, at 7:03 AM, Salvo Mac wrote:
> 
>>  I've   attached a sample of the data sets, this is my full code.
>>
>> #R 2.13
>> #library(survival)
>> #library(Hmisc)
>> #library(splines)
>> library(rms)
>> train<-as.data.frame( train<-read.csv("G:\\train.txt", header=T,  
>> sep="\t"))
>> test<-as.data.frame( test<-read.table("G:\\test.txt", header=T,  
>> sep="\t"))
>> f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train)
>> val.surv(f.1, newdata=test, u=10)
>>
>>
>> #plot(calibrate(f.1, u=30, B=20))
> 
> Salvo;
> 
> I'm not sure where the error is coming from (although I will share my  
> speculation below). I modified your code somewhat. I was under the  
> impression that read.csv had sep="," hardcoded and the read.csv( ...  
> seo="\t") would throw and error. The as.data.fame around either of the  
> read.* statements is completely unnecessary:
> 
> Input code:
> train<-read.table("~/train.txt", header=T, sep="\t")
>   test<-read.table("~/test.txt", header=T, sep="\t")
> 
> I thought the  newdata=test argument should succeed, but it does throw  
> the error that you report:
> 
>   f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train)
>   val.surv(f.1, newdata=test, u=10)
> Error in val.surv(f.1, newdata = test, u = 10) :
>    dims [product 210] do not match the length of object [314]
> In addition: Warning message:
> In est.surv + S[, 1] :
>    longer object length is not a multiple of shorter object length
> 
> I tried sampling from test with replacement to generate a dataframe of  
> equal extent and with that object I do not get the same errors:
> 
>  > extend <- test[sample(1:nrow(test), 314, replace=TRUE), ]
>  >  f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train)
>  > val.surv(f.1, newdata=extend, u=10)
> 
> Validation of Predicted Survival at Time= 10 	n= 314 , events= 64
> 
> hare fit:
> 
> dim A/D   loglik       AIC        penalty
>                                  min    max
>    1 Add   -485.24    976.22  271.33     Inf
>    2 Del   -349.57    710.64    5.67  271.33
>    3 Del   -346.74    710.72    0.96    5.67
>    4 Del   -346.26    715.51    0.01    0.96
>    5 Add   -346.25    721.25    0.00    0.01
> 
> the present optimal number of dimensions is 2.
> penalty(AIC) was 5.75, the default (BIC), would have been 5.75.
> 
>    dim1           dim2           beta        SE         Wald
> Constant                             -10       0.55  -18.14
> Time        43                      0.15      0.015   10.30
> 
> Function used to transform predictions:
> function (p)  log(-log(p))
> 
> Mean absolute error in predicted probabilities: 0.0331
> 0.9 Quantile of absolute errors               : 0.0613
> 
> I have also tried looking at the code and adding   
> options(error=utils::recover) to see if I can identify the point where  
> the length mismatch is being generated, (but I am NOT an ace  
> debugger). I can see that est.surv is created. I can also get the  
> predict(fit, newdata, type = "lp") call to run with train and give  
> sensible numbers. You did not create (or at least say so) a datadist.  
> I tried that when I saw that "lim" was dependent on datadist limits.
> 
> ddT <- datadist(train); options(datadist="ddT")
> 
> Seeing the usehare was set to TRUE; I submitted the code section that  
> was intended for that situation to a browser session and the the first  
> line throws the error that I see at the console:
> 
> Enter a frame number, or 0 to exit
> 
> 1: val.surv(f.1, newdata = test, u = 10)
> 
> Selection: 1
> Called from: top level
> Browse[1]> i <- !is.na(est.surv + S[, 1] + S[, 2])
> Error during wrapup: dims [product 210] do not match the length of  
> object [314]
> 
> As I read this that line is supposed to create an index of valid rows  
> in newdata and the fitted values but is failing in the situation where  
> the nrows of the newdata does not match that of the fit.
> 
> At this point one option might be trying to generate a structure that  
> matches the val.surv output by going through the code and building it  
> up bit by bit:
> 
>   w <- structure(list(harefit = f, p = est.surv, actual = actual,
>              pseq = pseq, actualseq = actualseq, u = u, fun = fun,
>              n = nrow(S), d = sum(S[, 2]), units = units), class =  
> "val.survh")
>          return(w)
> 
> But a much better option would be to report the error to Frank  
> Harrell. I'm copying him since I think your .txt files probably  
> reached the list as well as my mailbox.
> 
> -- 
> David.
> 
>>
>>
>>
>>
>>
>> ________________________________
>> From: David Winsemius <dwinsemius at comcast.net>
>> To: Salvo Mac <salvo_mac at yahoo.com>
>> Cc: "r-help at R-project.org" <r-help at r-project.org>
>> Sent: Sunday, August 21, 2011 5:29 AM
>> Subject: Re: [R] val.surv
>>
>>
>> On Aug 20, 2011, at 10:25 PM, Salvo Mac wrote:
>>
>>> The test and train are like split data sets, contain similar  
>>> variables but from different countries so the two sets are somehow   
>>> independent. And yes it is a data frame.
>>
>> What is a data.frame? test and train may be dataframes, but test[,  
>> "age"] is not a dataframe.
>>
>>> So I extracted age, time and event.
>>
>> Code? The code you offered before would have created a newdata  
>> object (yes, a data.frame) with a single column bearing the same  
>> name as the vector argument ,,,,, "test1". Not named "age". Try it.  
>> do str on such an object:
>>
>> str(dataframe(test1))
>>
>>
>>> So test is data frame,(age, time, event). does that suffice?
>>
>> It certainly does not allow me to reproduce the error you got (which  
>> I still think is probably related to the structure of your argument  
>> to newdata.)
>>
>> That's all I can say without data and code.
>>
>> --David.
>>
>>
>>>
>>>
>>>
>>> ________________________________
>>> From: David Winsemius <dwinsemius at comcast.net>
>>>
>>> Cc: "r-help at R-project.org" <r-help at r-project.org>
>>> Sent: Sunday, August 21, 2011 3:19 AM
>>> Subject: Re: [R] val.surv
>>>
>>>
>>> On Aug 20, 2011, at 8:08 PM, Salvo Mac wrote:
>>>
>>>> Thanks David
>>>>
>>>> However, I tried your trick on val.surv with newdata=test['age']  
>>>> but still didn't work.
>>>> Still gives the same error message:
>>>>
>>>> Error in val.surv(f.1, newdata = test1["age"], u = 10) :
>>>>     dims [product 1797] do not match the length of object [2496]
>>>> In addition: Warning message:
>>>> In est.surv + S[, 1] :
>>>>     longer object length is not a multiple of shorter object length
>>>>
>>>
>>> As I said (and you did
>>> not act  upon):
>>>
>>> The fundamental thing you are doing wrong for q1  is failing to  
>>> unambiguously describe the test object.
>>>
>>> I said it was a guess. Now stop wasting our time and offer what is  
>>> needed.
>>>
>>> --david.
>>>>
>>>> Salvo
>>>> ________________________________
>>>> From: David Winsemius <dwinsemius at comcast.net>
>>>>
>>>> Cc: "r-help at R-project.org" <r-help at r-project.org>
>>>> Sent: Sunday, August 21, 2011 12:55 AM
>>>> Subject: Re: [R] val.surv
>>>>
>>>>
>>>> On Aug 20, 2011, at 3:32 PM, Salvo Mac wrote:
>>>>
>>>>>     Dear R-users,
>>>>>
>>>>> I  have two questions regarding
>>> validation and calibration of Survival regression models.
>>>>>
>>>>> 1.  I am trying to calibrate and validate a cox model using  
>>>>> val.surv.
>>>>> here is my code:
>>>>>     f.1<-cph(Surv(time,event)~age, x=T, y=T, data=train)
>>>>>     test1<-test[,"age"]
>>>>>     val.surv(f.1, newdata=data.frame(test1), u=10)
>>>>>
>>>>>     but I get an error message:
>>>>>
>>>>>     Error in val.surv(f.1, newdata = data.frame(testi), u = 10) :
>>>>>      dims [product 1797] do not match the length of object [2496]
>>>>>     In addition: Warning message:
>>>>> In est.surv + S[, 1] :
>>>>>      longer object length is not a multiple of shorter object  
>>>>> length
>>>>>
>>>>>     I ran the example in the r-documentation but couldn't  
>>>>> extract  dxy from result.
>>>>>
>>>>>     What am I doing wrong?
>>>>
>>>> The fundamental thing you are doing wrong for q1  is failing to  
>>>> unambiguously describe the test object. I would think that if test  
>>>> were a dataframe then wrapping data.frame around a vector might  
>>>> not get it named correctly as 'age'. You might try newdata=  
>>>> test['age']. Just a guess.
>>>>
>>>>>
>>>>>     2.  In validate and calibrate cph functions. If it is frailty  
>>>>> fit, does the the bootstrap resample clusters or just individuals
>>>>
>>>> The code above appears to be dependent on the rms package. The  
>>>> frailty function is part of the underlying survival package and I  
>>>> do not see it mentioned in the index for Harrell's RMS text. You  
>>>> will probably need to wait until Frank comes across this. He is  
>>>> generally very good about correction my errors and knowledge gaps.
>>>>
>>>>>
>>>>
>>
>> David Winsemius, MD
>> West Hartford,  
>> CT<train.txt><test.txt>______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius, MD
> West Hartford, CT
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/val-surv-tp3757712p3758717.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list