[R] Trouble Computing Type III SS in a Cox Regression

Thu Apr 25 16:56:16 CEST 2013

Please take this discussion offlist. It is **not** about R.

-- Bert

On Thu, Apr 25, 2013 at 5:59 AM, Paul Miller <pjmiller_57 at yahoo.com> wrote:
> Hi Dr. Therneau,
>
> Thanks for your reply to my question. I'm aware that many on the list do not like type III SS. I'm not particularly attached to the idea of using them but often produce output for others who see value in type III SS.
>
> You mention the problems with type III SS when testing interactions. I don't think we'll be doing that here though. So my type III SS could just as easily be called type II SS I think. If the SS I'm calculating are essentially type II SS, is that still problematic for a Cox model?
>
> People using type III SS generally want a measure of whether or not a variable is contributing something to their model or if it could just as easily be discarded. Is there a better way of addressing this question than by using type III (or perhaps type II) SS?
>
> A series of model comparisons using a LRT might be the answer. If it is, is there an efficient way of implementing this approach when there are many predictors? Another approach might be to run models through step or stepAIC in order to determine which predictors are useful and to discard the rest. Is that likely to be any good?
>
> Thanks,
>
> Paul
>
> --- On Wed, 4/24/13, Terry Therneau <therneau at mayo.edu> wrote:
>
>> From: Terry Therneau <therneau at mayo.edu>
>> Subject: Re:  Trouble Computing Type III SS in a Cox Regression
>> To: r-help at r-project.org, "Paul Miller" <pjmiller_57 at yahoo.com>
>> Received: Wednesday, April 24, 2013, 5:55 PM
>> I should hope that there is trouble,
>> since "type III" is an undefined concept for a Cox
>> model.  Since SAS Inc fostered the cult of type III
>> they have recently added it as an option for phreg, but I am
>> not able to find any hints in the phreg documentation of
>> what exactly they are doing when you invoke it.  If you
>> can unearth this information, then I will be happy to tell
>> you whether
>>    a. using the test (whatever it is) makes
>> any sense at all for your data set
>>    b. if "a" is true, how to get it out of R
>>
>> I use the word "cult" on purpose -- an entire generation of
>> users who believe in the efficacy of this incantation
>> without having any idea what it actually does.  In many
>> particular instances the SAS type III corresponds to a
>> survey sampling question, i.e., reweight the data so that it
>> is balanced wrt factor A and then test factor B in the new
>> sample.  The three biggest problems with type III are
>> that
>> 1: the particular test has been hyped as "better" when in
>> fact it sometimes is sensible and sometimes not, 2: SAS
>> implemented it as a computational algorithm which
>> unfortunately often works even when the underlying rationale
>> does not hold and
>> 3: they explain it using a notation that completely obscures
>> the actual question.  This last leads to the nonsense
>> phrase "test for main effects in the presence of
>> interactions".
>>
>> There is a "survey reweighted" approach for Cox models, very
>> closely related to the work on causal inference ("marginal
>> structural models"), but I'd bet dollars to donuts that this
>> is not what SAS is doing.
>>
>> (Per 2 -- type III was a particular order of operations of
>> the sweep algorithm for linear models, and for backwards
>> compatability that remains the core definition even as
>> computational algorthims have left sweep behind.  But
>> Cox models can't be computed using the sweep algorithm).
>>
>> Terry Therneau
>>
>>
>> On 04/24/2013 12:41 PM, r-help-request at r-project.org
>> wrote:
>> > Hello All,
>> >
>> > Am having some trouble computing Type III SS in a Cox
>> Regression using either drop1 or Anova from the car package.
>> Am hoping that people will take a look to see if they can
>> tell what's going on.
>> >
>> > Here is my R code:
>> >
>> > cox3grp<- subset(survData,
>> > Treatment %in% c("DC", "DA", "DO"),
>> > c("PTNO", "Treatment", "PFS_CENSORED", "PFS_MONTHS",
>> "AGE", "PS2"))
>> > cox3grp<- droplevels(cox3grp)
>> > str(cox3grp)
>> >
>> > coxCV<- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~
>> AGE + PS2, data=cox3grp, method = "efron")
>> > coxCV
>> >
>> > drop1(coxCV, test="Chisq")
>> >
>> > require(car)
>> > Anova(coxCV, type="III")
>> >
>> > And here are my results:
>> >
>> > cox3grp<- subset(survData,
>> > +
>>      Treatment %in% c("DC", "DA",
>> "DO"),
>> > +
>>      c("PTNO", "Treatment",
>> "PFS_CENSORED", "PFS_MONTHS", "AGE", "PS2"))
>> >> >  cox3grp<- droplevels(cox3grp)
>> >> >  str(cox3grp)
>> > 'data.frame':    227 obs. of  6
>> variables:
>> >   $ PTNO        :
>> int  1195997 104625 106646 1277507 220506 525343 789119
>> 817160 824224 82632 ...
>> >   $ Treatment   : Factor
>> w/ 3 levels "DC","DA","DO": 1 1 1 1 1 1 1 1 1 1 ...
>> >   $ PFS_CENSORED: int  1 1 1 0 1 1
>> 1 1 0 1 ...
>> >   $ PFS_MONTHS  : num  1.12
>> 8.16 6.08 1.35 9.54 ...
>> >   $ AGE
>>    : num  72 71 80 65 72 60 63 61 71 70
>> ...
>> >   $ PS2
>>    : Ord.factor w/ 2 levels "Yes"<"No": 2
>> 2 2 2 2 2 2 2 2 2 ...
>> >> >  >  coxCV<-
>> coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2,
>> data=cox3grp, method = "efron")
>> >> >  coxCV
>> > Call:
>> > coxph(formula = Surv(PFS_MONTHS, PFS_CENSORED == 1) ~
>> AGE + PS2,
>> >      data = cox3grp, method = "efron")
>> >
>> >
>> >            coef exp(coef)
>> se(coef)      z     p
>> > AGE    0.00492
>>    1.005  0.00789  0.624 0.530
>> > PS2.L -0.34523     0.708
>> 0.14315 -2.412 0.016
>> >
>> > Likelihood ratio test=5.66  on 2 df,
>> p=0.0591  n= 227, number of events= 198
>> >> >  >  drop1(coxCV, test="Chisq")
>> > Single term deletions
>> >
>> > Model:
>> > Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2
>> >         Df
>> AIC    LRT Pr(>Chi)
>> > <none>     1755.2
>> > AGE     1 1753.6 0.3915
>> 0.53151
>> > PS2     1 1758.4 5.2364
>> 0.02212 *
>> > ---
>> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
>> '.' 0.1 ' ' 1
>> >> >  >  require(car)
>> >> >  Anova(coxCV,  type="III")
>> > Analysis of Deviance Table (Type III tests)
>> >      LR Chisq Df Pr(>Chisq)
>> > AGE   0.3915  1
>> 0.53151
>> > PS2   5.2364  1
>> 0.02212 *
>> > ---
>> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
>> '.' 0.1 ' ' 1
>> >> >
>> > Both drop1 and Anova give me a different p-value than I
>> get from coxph for both my two-level ps2 variable and for
>> age. This is not what I would expect based on experience
>> using SAS to conduct similar analyses. Indeed SAS
>> consistently produces the same p-values. Namely the ones I
>> get from coxph.
>> >
>> > My sense is that I'm probably misusing R in some way
>> but I'm not sure what I'm likely to be doing wrong. SAS
>> prodcues Wald Chi-Square results for its type III tests.
>> Maybe that has something to do with it. Ideally, I'd like to
>> get type III values that match those from coxph. If anyone
>> could help me understand better, that would be greatly
>> appreciated.
>> >
>> > Thanks,
>> >
>> > Paul
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm