[R] survfit & number of variables != number of variable names

David Winsemius dwinsemius at comcast.net
Tue Nov 20 03:33:03 CET 2012


On Nov 19, 2012, at 5:33 PM, Georges Dupret wrote:

> Hi David,
> 
> Sorry for the signature files... this is automatic. I should disable that.
> 
> Please find in attachment a copy of small.csv.gz

I found it, but I suspect nobody else will. I think Terry Therneau already got a copy when you attached it earlier, but the rest of R-help did not, since .gz files get scrubbed by the list server.


> Best,
> 
> ge
> 
> On 11/19/2012 02:37 PM, David Winsemius wrote:
>> 
>> On Nov 19, 2012, at 2:23 PM, David Winsemius wrote:
>> 
>>> 
>>> On Nov 19, 2012, at 11:07 AM, Georges Dupret wrote:
>>> 
>>>> Hi!
>>>> 
>>>> In answer to:
>>>> 
>>>> --------
>>>> I noticed that you were using what might be called an "externally  
>>>> created Surv object". I have a memory that Terry Therneau has  
>>>> criticized that practice. I cannot remember if it was in exactly this  
>>>> situation but I might ask if setting up the model as:
>>>> 
>>>> cox = coxph(Surv(stime, event) ~ bucket*(today + accor + both) +  
>>>> activity, data = data)
>>>> 
>>>> ... might give the survival machinery a better handle on where  
>>>> everything might be found. 
>>>> ------------
>>>> 
>>>> I tried to create the Surv object "internally" but I face the same issue:
>>>> 
>>>>> (cox.s = coxph(Surv(time=absence, event=(censored==FALSE)) ~ 
>>>>> bucket*(today) + strata(activity), data = small))
>>>> Call:
>>>> coxph(formula = Surv(time = absence, event = (censored == FALSE)) ~ 
>>>>  bucket * (today) + strata(activity), data = small)

All of your 'censored' values were FALSE, so all of your events were TRUE. My guess is that you are having problems because you end up with different model designs in the different strata:

> with( small, table(activity, today))
                today
activity         FALSE TRUE
  (100,121]          1   13
  (121,149]          2    8
  (149,196]          0    4
  (196,1.33e+03]     1    8
  (30,42]            1    8
  (42,55]            4   12
  (55,68]            2    9
  (68,83]            2    9
  (83,100]           2    6
  [11,30]            0    8
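
A minimal sketch of how one might flag such strata programmatically. The data frame `toy` and its values are made up for illustration; only the column names `activity` and `today` come from the thread.

```r
# Hypothetical toy data standing in for 'small'; the values are invented.
toy <- data.frame(
  activity = factor(c("a", "a", "a", "b", "b", "c", "c", "c")),
  today    = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Cross-tabulate stratum by covariate, as in the table above.
tab <- with(toy, table(activity, today))

# Strata in which at least one level of 'today' never occurs.
empty_strata <- rownames(tab)[apply(tab == 0, 1, any)]
print(empty_strata)
```

Run against the real data, the same check would list the (149,196] and [11,30] rows, the two with a zero count in the table above.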


I do not think it matters that the levels of your factor variable are not in the expected order:

> table(small$activity)

     (100,121]      (121,149]      (149,196] (196,1.33e+03]        (30,42]        (42,55]        (55,68]        (68,83] 
            14             10              4              9              9             16             11             11 
      (83,100]        [11,30] 
             8              8 
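
If the alphabetical ordering is a nuisance, the levels can be re-sorted by their numeric lower bound. This is only a sketch: the level labels are copied from the output above, while the helper names `lev`, `lower` and `ordered_lev` are illustrative.

```r
# Level labels as they appear in the table above (alphabetical order).
lev <- c("(100,121]", "(121,149]", "(149,196]", "(196,1.33e+03]",
         "(30,42]", "(42,55]", "(55,68]", "(68,83]", "(83,100]", "[11,30]")

# Extract the lower bound of each interval label and sort by it.
lower <- as.numeric(sub("^[[(]([^,]+),.*$", "\\1", lev))
ordered_lev <- lev[order(lower)]
print(ordered_lev)

# The factor could then be rebuilt with, e.g.:
#   small$activity <- factor(small$activity, levels = ordered_lev)
```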


But I do also wonder whether the small numbers in each stratum might be causing problems. Is it really necessary to stratify so finely?
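
As a rough illustration of coarser stratification, here is a sketch that bins a numeric variable into three quantile groups instead of ten. The vector `score` is simulated and purely hypothetical; the thread does not show how 'activity' was actually constructed.

```r
set.seed(1)
# Hypothetical numeric variable standing in for whatever 'activity' was cut from.
score <- runif(100, min = 11, max = 1330)

# Ten quantile bins (similar in spirit to the table above) versus three.
fine <- cut(score, breaks = quantile(score, probs = seq(0, 1, by = 0.1)),
            include.lowest = TRUE)
coarse <- cut(score, breaks = quantile(score, probs = seq(0, 1, length.out = 4)),
              include.lowest = TRUE)

# Compare the number of observations per stratum.
print(table(fine))
print(table(coarse))
```

With fewer strata, each one holds more observations, so it is less likely that a covariate level is missing entirely within a stratum.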

-- 
David.

>>>> 
>>>>                     coef exp(coef) se(coef)      z    p
>>>> bucket575            0.4526     1.572    0.740  0.612 0.54
>>>> todayTRUE           -0.0886     0.915    0.676 -0.131 0.90
>>>> bucket575:todayTRUE -0.1670     0.846    0.794 -0.210 0.83
>>>> 
>>>> Likelihood ratio test=2.32  on 3 df, p=0.509  n= 100, number of events= 100 
>>>>> fit = survfit(cox.s, newdata=small[1:50,])
>>>> Error in model.frame.default(data = small[1:50, ], formula = ~bucket +  : 
>>>> number of variables != number of variable names
>>> 
>>> OK. Thanks for doing that. You might want to know that the only attachment that made it through to the mailing list was a file named small.csv.gz.sig. That's not a format that my system knows how to decompress (I tried downloading GnuPG and compiling it but 
>>> 
>> 
>> (Hit the send button too soon.) ... was unable to figure out how to decompress with GnuPG either. (It's hard to imagine this needed to be encrypted.)
>> 
> <small.csv.gz>

David Winsemius, MD
Alameda, CA, USA



