[R] Presenting Hazard ratios for interacting variables in a Cox model

David Winsemius dwinsemius at comcast.net
Mon Nov 21 20:59:45 CET 2016

> On Nov 21, 2016, at 4:21 AM, Stuart Patterson <stuartjpatterson at googlemail.com> wrote:
> Dear David, 
> Thank you for your reply. Your suggestions on how better to write the command are very useful, and I can see how the simplification would help. I hadn't realised that the lower order variables would be included if I simply wrote in teh interaction terms. Thank you
> The table below is how I feel that I ought to present my results. If anyone has any suggestions on how to obtain these values I would be very grateful!

You should be able to fill in the first empty column using the results of coef(model) and 

expand.grid( Year=levels(Year), Prev_TB=levels(Prev_TB), Age=levels(Age) )

 Calculating the CI's by hand would require knowing the covariance matrix and you should be able to get the needed components from vcov(model). I wondered if using the confint function would be a more economical approach, however, when I look at the code of confint.default, it does not appear that it would handle contrasts involving interactions terms in a manner that I would think was totally correct. It appears to ignore the off-diagonal elements of the vcov matrix. Perhaps using the confint function in the multcomp package would yield more correct estimates. It's documented as "averaging over the interactions", however the code  appears to use only the diagonal elements as well.

I think the "right way" is outlined by Dalgaard and Laviolette in a recent Rhelp thread: 


You would set up the D matrix to represent levels in the grid of values to match the contrasts being considered. I did find a phia package that might automate this process, but I don't have experience with it.

Your table is probably only readable by me. The rest of the Rhelp world got a single column because you posting in HTML. Do learn the post to Rhelp in plain text.

I dropped it into OpenOffice.org, saved as csv/txt, and read it into R:

> read.csv("~/Documents/Untitled 1.csv")
    Variable        X          X.1 HRs ci.95. p.Value
1       Year Prev. TB          Age                   
2  2002-2005       No  < 24 months                   
3                     24-48 months                   
4                       >48 months                   
5                 Yes  < 24 months                   
6                     24-48 months                   
7                       >48 months                   
8  2006-2010       No  < 24 months                   
9                     24-48 months                   
10                      >48 months                   
11                Yes  < 24 months                   
12                    24-48 months   a      b       c
13                      >48 months                   
14 2011-2015       No  < 24 months                   
15                    24-48 months                   
16                      >48 months                   
17                Yes  < 24 months                   
18                    24-48 months                   
19                      >48 months                   


> On 18 November 2016 at 19:21, David Winsemius <dwinsemius at comcast.net> wrote:
> > On Nov 18, 2016, at 6:56 AM, Stuart Patterson via R-help <r-help at r-project.org> wrote:
> >
> > I have a time-dependent cox model with three variables, each of which
> > interacts with the other two. So my final model is:
> >
> > fit12<-coxph(formula = Surv(data$TimeIn, data$Timeout, data$Status) ~ data$
> > Year+data$Life_Stg+data$prev.tb +data$prev.tb*data$Life_Stg + data$Year*data
> > $Life_Stg + data$Year*data$prev.tb + frailty(data$Natal_Group), data = data)
> It seems fairly likely that you are shooting yourself in the foot by using the `data$variate` inside the formula. It will prevent the regression result from having correctly assembled references to variables. And that will become evident when you try to do any predictions. Try instead:
> fit12<-coxph(formula = Surv( TimeIn,  Timeout,  Status) ~
>             prev.tb * Life_Stg +  Year *Life_Stg +  Year * prev.tb + frailty( Natal_Group),
>             data = data)
> The `*` in a formula automatically includes the lower order individual variates in the estimates. Your model RHS could have also been written (more clearly in my opinion):
>          ~ (prev.tb + Life_Stg +  Year)^2
> ... since R formulas interpret the `(.)^N` operation as "all base effects and interactions up to order N".
> >
> > For my variables, there are 3 categories of year, three of year, and
> > prev.tb is a binary variable. Because of the interactions, when I present
> > the results, I want to present the Hazard ratio, 95% CI, and p value for
> > each combination of the three variables. How do I get R to give me these
> > values please?
> >
> > I think that the contrast function does this for other models but does not
> > work for coxph?
> The usual method would be to use `predict` on a 'newdata' dataframe with all the combinations generated from `expand.grid`. The combination of the reference values of all three variables should yield a 1.0 hazard ratio. But time-dependent model predictions need a complete specification of a sequence of values over the time course of the study (as well as specification of the frailty term. So I'm not in a position to comment on feasibility for this situation.
> >
> > Grateful for any suggestions
> >
> > Best wishes
> >
> > Stuart Patterson, Royal Veterinary College, University of London
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> David Winsemius
> Alameda, CA, USA

David Winsemius
Alameda, CA, USA

More information about the R-help mailing list