[R] what is the difference between survival analysis and (...)

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Mar 28 19:47:10 CEST 2007


That is not the only way to apply logistic regression to this problem 
(although it is a common error in the analysis of cancer studies).

One can discretize time and apply logistic regression to survival over 
each short time period (jointly): doing so comes pretty close to what the 
Cox proportional hazards models do, but can be harder to interpret.  But
that is a far more sophisticated analysis than looking at crude relative 
risks for subgroups, and my understanding of Sir David Cox's motivation 
was to be able to do regression modelling of prognostic factors, not just 
comparison of groups.
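The discrete-time approach can be sketched in Python using only the standard library. The function names, cut points and toy data below are hypothetical. Each subject is expanded into one row per interval in which it is at risk, with a binary outcome that is 1 only in the interval where the event occurs. A logistic regression is then fitted by Newton-Raphson, with one indicator per interval (playing the role of the baseline hazard) plus covariates. It assumes, as a common simplification, that a subject censored inside an interval counts as at risk for the whole of that interval.

```python
import math


def person_period(times, events, covariates, cut_points):
    """Expand (time, event, covariates) records into person-period rows.

    Interval k covers (cut_points[k-1], cut_points[k]]; interval 0 starts at 0.
    Each subject contributes one row (k, x, y) per interval it enters, with
    y = 1 only in the interval where its event occurs. A subject censored
    inside an interval is treated as at risk for that whole interval, and
    follow-up beyond the last cut point is treated as censored there.
    """
    rows = []
    for t, d, x in zip(times, events, covariates):
        for k, upper in enumerate(cut_points):
            ends_here = t <= upper
            rows.append((k, list(x), 1 if (ends_here and d == 1) else 0))
            if ends_here:
                break
    return rows


def design_row(k, x, n_intervals):
    """One indicator per interval (baseline logit-hazard) followed by the covariates."""
    return [1.0 if j == k else 0.0 for j in range(n_intervals)] + [float(v) for v in x]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (no intercept added)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for a in range(p):
                score[a] += (yi - mu) * xi[a]
                for c in range(p):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical toy data: follow-up times, event indicators (1 = died) and
# one binary covariate per subject; the cut points define three intervals.
times = [5, 12, 20, 25, 33, 40, 47, 55]
events = [1, 0, 1, 1, 0, 1, 1, 0]
group = [[1], [1], [0], [1], [0], [0], [1], [0]]
cuts = [20, 40, 60]

rows = person_period(times, events, group, cuts)
X = [design_row(k, x, len(cuts)) for k, x, _ in rows]
y = [outcome for _, _, outcome in rows]
beta = fit_logistic(X, y)
print("interval log-odds:", [round(b, 3) for b in beta[:3]])
print("group log-odds ratio:", round(beta[3], 3))
```

With only the interval indicators in the design, the fitted hazards reduce to the life-table proportions d_k/n_k (here 2/8, 2/5 and 1/2). Adding covariates is what turns the tabulation into regression. The covariate coefficient is a log odds ratio for the per-interval hazard, which approximates a log hazard ratio when intervals are short and per-interval hazards small.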

I would claim that the analysis of the Australian AIDS data in MASS needs 
regression-like methods to extract all the information from what is a 
limited and expensive set of data (and I happen to know Cox agrees).


On Wed, 28 Mar 2007, Christos Hatzis wrote:

> On the same point, transforming time-to-event data to binary outcomes so
> that contingency-table analysis (odds ratios etc) or logistic regression can
> be applied will result in loss of information that could lead to misleading
> conclusions.
>
> For example, suppose that there is a good-prognosis (low-risk) group and a
> poor-prognosis (high-risk) group to be compared.  By definition,
> patients in the good-prognosis group are those who have been followed up
> for a longer time in the study, whereas patients with poor prognosis will
> tend to die earlier.  Therefore censoring will occur later in the
> good-prognosis group, and the two groups will not have a homogeneous
> censoring pattern. In this case, a naïve analysis could be misleading.
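The censoring point can be made concrete with a small Python sketch (standard library only; the function name and toy data are hypothetical). It compares the naive proportion of deaths, d/n, with the Kaplan-Meier estimate of the probability of death by the end of follow-up. The naive proportion counts a subject censored early as a survivor, so it depends on how much follow-up each group happened to get. Kaplan-Meier removes a censored subject from the risk set only from the time it is censored. The toy data are assumed, and chosen so that most censoring happens early.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    Returns a list of (t, S(t)) at each distinct event time. Subjects censored
    at time t are still counted in the risk set at t (deaths first), the usual
    convention for ties.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical group with heavy early censoring (event 1 = died, 0 = censored).
times = [2, 3, 4, 5, 6, 8, 10, 12]
events = [0, 0, 0, 1, 0, 1, 1, 0]

naive = sum(events) / len(events)                    # deaths / subjects
km_death = 1.0 - kaplan_meier(times, events)[-1][1]  # 1 - S(t) after the last death
print("naive proportion dead:", naive)
print("Kaplan-Meier P(death by end of follow-up):", round(km_death, 3))
```

Here the naive proportion is 3/8 = 0.375, while the Kaplan-Meier estimate is 11/15 (about 0.73), because the three subjects censored before the first death are not counted as survivors.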
>
> For more details and a simulation example take a look at
>
> http://jnci.oxfordjournals.org/cgi/data/99/2/147/DC1/3
>
> HTH
>
> -Christos
>
> Christos Hatzis, Ph.D.
> Nuvera Biosciences, Inc.
> 400 West Cummings Park
> Suite 5350
> Woburn, MA 01801
> Tel: 781-938-3830
> www.nuverabio.com
>
>
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Lucke, Joseph F
>> Sent: Wednesday, March 28, 2007 12:10 PM
>> To: Eric Elguero; R-help at stat.math.ethz.ch
>> Subject: Re: [R] what is the difference between survival
>> analysis and (...)
>>
>> You can fit survival data with logistic regression (and I
>> have). Agresti (1990, pp. 189-196) has an introductory
>> discussion.
>>
>> The issue is whether the occurrence of the event is of
>> interest or whether the time-to-event is of interest. If the
>> study lasts 180 days (as in my case), logistic regression
>> treats an event at 1 day the same as an event at 179 days.
>> Similarly, non-occurrence censored at 5 days is treated the
>> same as non-occurrence censored at 180 days. These
>> assumptions only make sense if the hazard rate is constant
>> and (therefore) the time-to-failure distribution is exponential.
>>
>> One can include exposure time as an offset (non-estimated
>> covariate) to handle non-constant hazard rates. One can also
>> model the hazard rate directly as a log-linear model.
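The exposure-time idea can be illustrated with a piecewise-constant (piecewise exponential) hazard. Within each time interval the hazard is estimated as events divided by person-time at risk. These ratios are the maximum-likelihood estimates of a Poisson log-linear model with one parameter per interval and log(exposure) as the offset. The sketch below is minimal Python using only the standard library; the function name, cut points and toy data are hypothetical.

```python
def piecewise_hazards(times, events, cut_points):
    """Piecewise-constant (piecewise exponential) hazard estimates.

    Interval k covers (cut_points[k-1], cut_points[k]]; interval 0 starts at 0.
    Returns (deaths_k, exposure_k, deaths_k / exposure_k) for each interval,
    where exposure_k is the person-time actually observed in the interval.
    These ratios are the maximum-likelihood estimates of a Poisson log-linear
    model with one parameter per interval and log(exposure_k) as the offset.
    Follow-up beyond the last cut point is ignored.
    """
    n = len(cut_points)
    deaths = [0] * n
    exposure = [0.0] * n
    for t, d in zip(times, events):
        lower = 0.0
        for k, upper in enumerate(cut_points):
            exposure[k] += min(t, upper) - lower
            if t <= upper:
                deaths[k] += d
                break
            lower = upper
    return [
        (deaths[k], exposure[k], deaths[k] / exposure[k] if exposure[k] > 0 else float("nan"))
        for k in range(n)
    ]


# Hypothetical toy data: follow-up time (days) and event indicator (1 = died).
times = [5, 12, 20, 25, 33, 40, 47, 55]
events = [1, 0, 1, 1, 0, 1, 1, 0]
for k, (d, pt, rate) in enumerate(piecewise_hazards(times, events, [20, 40, 60])):
    print(f"interval {k}: deaths={d}, person-time={pt}, hazard={rate:.4f}")

# A single constant (exponential) hazard would instead be sum(events) / sum(times).
print("constant hazard:", sum(events) / sum(times))
```

If the per-interval rates differ markedly, the single constant-hazard estimate is summarising a hazard that is not constant.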
>>
>> Based on what he said (number of events/sample size, or cumulative
>> follow-up time in the denominator), the hostile medical epidemiologist was
>> implicitly assuming the survival time followed an exponential
>> distribution. This assumption is often incorrect.   His
>> arrogance was exceeded only by his ignorance.
>>
>> Joe
>>
>> @BOOK{Agresti1990,
>>   author = {Agresti, Alan},
>>   title = {Categorical data analysis},
>>   year = {1990},
>>   publisher = {John Wiley \& Sons},
>>   address = {New York, NY},
>>   series = {Wiley Series in Probability and Mathematical Statistics},
>>   keywords = {loglinear; logistic}
>> }
>>
>>
>>
>>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Eric Elguero
>> Sent: Wednesday, March 28, 2007 8:40 AM
>> To: R-help at stat.math.ethz.ch
>> Subject: Re: [R] what is the difference between survival
>> analysis and (...)
>>
>> Hi everybody,
>>
>> Recently I had to teach a course on the Cox model, in which I am
>> not a specialist, to an audience of medical epidemiologists.
>> Not a good idea, you might say. Anyway, someone in the
>> audience was very hostile. At some point, he said that the Cox
>> model was useless, since all you have to do is count who dies
>> and who survives, divide by the sample sizes and compute a
>> relative risk, and, if there was significant censoring, use
>> cumulative follow-up instead of sample sizes, and that's it!
>> I began arguing that in the Cox model you could introduce several
>> variables, interactions, etc., then I remembered logistic
>> models ;-) The only (and poor) argument I could think of was
>> that if Mr Cox took pains to devise his model, there should
>> be some reason...
>>
>> But the story doesn't end here. When I came back to my
>> office, I tried these two methods on a couple of data sets,
>> and, true enough, the crude RRs were very close to those from the Cox model.
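The two crude estimates described by the epidemiologist can be written down directly. The sketch below is minimal Python using only the standard library; the function names and counts are hypothetical. The risk ratio uses deaths over subjects. The rate ratio uses deaths over cumulative follow-up time, and d/t is the maximum-likelihood hazard under a constant (exponential) hazard in each group, which is the assumption Joe points out in his message. The confidence interval is an approximate Wald interval on the log scale.

```python
import math


def risk_ratio(d1, n1, d0, n0):
    """Crude risk ratio (d1/n1) / (d0/n0), ignoring when deaths and censoring occur."""
    return (d1 / n1) / (d0 / n0)


def rate_ratio(d1, t1, d0, t0, z=1.96):
    """Person-time rate ratio (d1/t1) / (d0/t0) with an approximate 95% Wald CI.

    Under a constant hazard in each group, d/t is the maximum-likelihood
    hazard estimate, and the standard error of log(rate ratio) is
    sqrt(1/d1 + 1/d0).
    """
    rr = (d1 / t1) / (d0 / t0)
    se = math.sqrt(1.0 / d1 + 1.0 / d0)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Hypothetical counts: deaths, subjects and person-years in each group.
d1, n1, t1 = 30, 100, 400.0   # higher-risk group
d0, n0, t0 = 15, 100, 600.0   # lower-risk group
print("risk ratio:", round(risk_ratio(d1, n1, d0, n0), 3))
print("rate ratio and 95% CI:", [round(v, 3) for v in rate_ratio(d1, t1, d0, t0)])
```

In this made-up example the two crude measures differ (2.0 versus 3.0) because the higher-risk group accrues less person-time. Neither measure allows for a hazard that changes over time, or for adjustment by several covariates at once.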
>>
>> Hence this question: could someone provide me with a dataset
>> (preferably real) where there is a striking difference
>> between estimated RRs and/or between P-values? And of course
>> I am also interested in theoretical arguments and references.
>>
>> Sorry that this question has nothing to do with R, and thank
>> you in advance for your leniency.
>>
>> Eric Elguero
>> GEMI-UMR 2724 IRD-CNRS,
>> Équipe "Évolution des Systèmes Symbiotiques"
>> 911 avenue Agropolis, BP 64501,
>> 34394 Montpellier cedex 5 FRANCE
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

