[R] stratified variables in a cox regression

Sat Mar 28 14:53:58 CET 2009

On Sat, 28 Mar 2009, Bob Green wrote:

>
>> Hello,
>
> I am hoping for assistance in regards to examining the contribution of 
> stratified variables in a cox regression. A previous post by Terry Therneau 
> noted that "That is the point of a strata; you are declaring a variable to NOT 
> be proportional hazards, and thus there is no single "hazard ratio" that 
> describes it". Given this purpose of stratification, in the process of building 
> and testing a model, is there a way to test if the stratified variables do add 
> anything to a model?

I'm not aware of any formal test for whether stratification helps. It's difficult because you are adding an infinite-dimensional parameter to the model (a separate baseline hazard for each stratum), and this parameter doesn't even appear in the partial likelihood. Nothing simple is going to work.

In principle one could compare the two stratum baseline cumulative hazards to see whether they are proportional to each other, e.g., by checking whether the difference in log cumulative baseline hazard is constant over time. The bootstrap is valid for the baseline cumulative hazards, so one could get confidence intervals on a suitable summary statistic that way.
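
A rough sketch of that idea, not a worked-out test: it assumes the survival package and its bundled 'lung' data, with sex as the stratifying variable and age as the only covariate. The summary statistic here (the slope of the log cumulative hazard difference against log time) is just one possible choice, and the helper names fit_strat and log_diff_slope are made up for illustration.

```r
library(survival)

# Stratified fit; 'lung', age and sex are illustrative choices only.
fit_strat <- function(d) {
  coxph(Surv(time, status) ~ age + strata(sex), data = d,
        model = TRUE, x = TRUE)
}

# Slope of the between-stratum difference in log cumulative baseline hazard
# against log(time); roughly zero if the two baselines are proportional.
log_diff_slope <- function(fit, times) {
  bh <- basehaz(fit, centered = TRUE)        # columns: hazard, time, strata
  lev <- unique(as.character(bh$strata))
  H <- lapply(lev, function(s) {
    b <- bh[bh$strata == s, ]
    stepfun(b$time, c(0, b$hazard))(times)   # right-continuous step function
  })
  d <- log(H[[2]]) - log(H[[1]])
  ok <- is.finite(d)                         # drop times before a stratum's first event
  coef(lm(d[ok] ~ log(times[ok])))[[2]]
}

# Grid over the middle of follow-up, where both strata should have events
times <- as.numeric(quantile(lung$time[lung$status == 2],
                             seq(0.2, 0.8, by = 0.05)))

slope_hat <- log_diff_slope(fit_strat(lung), times)

# Bootstrap, resampling subjects within each stratum
set.seed(1)
B <- 200
boot <- replicate(B, {
  idx <- unlist(lapply(split(seq_len(nrow(lung)), lung$sex),
                       function(i) sample(i, replace = TRUE)))
  log_diff_slope(fit_strat(lung[idx, ]), times)
})

slope_hat
quantile(boot, c(0.025, 0.975), na.rm = TRUE)  # percentile interval
```

An interval that sits well away from zero would suggest the two baselines are not proportional on this grid; one that covers zero says only that this particular summary did not detect a departure.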

> Two variables were stratified because it was considered that the proportional 
> hazards assumption was not met (via inspection of log-log plots where the 
> curves crossed. I have examined. There were no cox.zph values that were 
> statistically significant. I did produce plots but found these difficult to 
> interpret).

There isn't much information loss in stratifying, as long as it's not overdone, which is probably why there hasn't been much work on tests.  The main loss is that the model becomes more complicated and harder to summarize.

> The statistician I have been consulting said that in SPSS when 
> variables are stratified a model is produced for each different strata (e.g., a 
> separate analysis for male and female if a gender variable were stratified). 
> I have not seen this approach used in R examples I have seen.

Fitting a completely separate model for each stratum is equivalent to stratifying *and* adding an interaction with stratum to each predictor variable. This does result in a loss of information, and is usually overkill. You can add stratum interactions just to the variables where they are needed.
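
To illustrate the equivalence, here is a small sketch using the survival package's 'lung' data, again with sex as the stratum and age as the single predictor (both arbitrary choices). The columns age_m and age_f are hypothetical helpers that spell out the age-by-stratum interaction by hand.

```r
library(survival)

# One copy of age per stratum: zero outside its own stratum
d <- transform(lung,
               age_m = age * (sex == 1),
               age_f = age * (sex == 2))

# Stratified model with a stratum-specific age coefficient
pooled <- coxph(Surv(time, status) ~ age_m + age_f + strata(sex), data = d)

# Completely separate analyses, one per stratum
males   <- coxph(Surv(time, status) ~ age, data = d, subset = sex == 1)
females <- coxph(Surv(time, status) ~ age, data = d, subset = sex == 2)

# The two approaches should give the same coefficients
cbind(pooled   = coef(pooled),
      separate = c(coef(males), coef(females)))

# Ordinary stratified model: one age coefficient shared across strata
shared <- coxph(Surv(time, status) ~ age + strata(sex), data = d)

# Likelihood ratio comparison: is the stratum interaction needed for age?
anova(shared, pooled)
```

With several predictors, the same construction lets you give stratum-specific coefficients only to the variables that seem to need them and keep shared coefficients for the rest.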

This may be related to the collision in terminology where epidemiologists say 'stratify' to mean 'do a completely separate analysis' and statisticians say 'stratify' to mean 'pool the stratum-specific analyses to get an overall estimate'.

        -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle