[R] Validating a Cox model on an external set

Tue Sep 28 20:45:35 CEST 2004

Thank you all for your very insightful comments! And thank you for the
directions to the packages!

Re: non-statistical issues, yes, I was looking through Altman and
Royston's Stat Med 2000 article "What do we mean by validating a
prognostic model?" last night and it was very interesting. I'm working
on expression profiling in tumor samples, and there are several
difficulties in designing an experiment along those 'ideal'
guidelines.

A. Small sample size is of course the most common recurring problem,
and the number of events is lower still.

B. Patient recruitment is another issue. Many samples degrade over
time: the older the sample, the more degraded it usually is, which
biases selection towards recent cases. Different centres also use
different storage techniques, so samples from one centre are
extensively degraded while samples from others are relatively intact.
There is therefore no choice but to perform "data-driven selection" of
cases, i.e. to use only samples which have good RNA.

Other problems I've encountered include:

C. Computational time. Using a training sample size of 50 arrays,
running a full internal cross-validation of a model derived using
pamr.cv.cox took my computer about one and a half hours (with no other
process running; P4, 3 GHz, 2 GB RAM, R 1.9.1, Windows XP). And
that's just *one* randomization!
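
To put a number on that: if each randomization refits the model once
per fold, B randomizations of K-fold cross-validation need B x K fits,
and at about 1.5 hours per randomization the wall-clock time grows
linearly with B. The sketch below is only an illustration in Python,
not what pamr does internally: stratified_folds, the 12-event split and
the choice of 20 randomizations of 10 folds are all made up for the
example. It spreads the few events evenly across folds, so that no fold
ends up without events, and then tallies the cost.

```python
import random


def stratified_folds(events, k, seed):
    """Assign each sample to one of k folds, spreading events evenly.

    events: list of 0/1 flags (1 = event observed, 0 = censored).
    Returns a list of fold labels in 0..k-1, one per sample.
    """
    rng = random.Random(seed)
    folds = [None] * len(events)
    offset = 0
    for status in (1, 0):  # deal out the events first, then the censored cases
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[i] = (j + offset) % k
        offset += len(idx)  # continue the round-robin so fold sizes stay balanced
    return folds


# Hypothetical training set: 50 arrays, of which 12 had an event.
events = [1] * 12 + [0] * 38
n_random, k = 20, 10          # assumed: 20 randomizations of 10-fold CV
hours_per_randomization = 1.5  # the timing reported above

splits = [stratified_folds(events, k, seed) for seed in range(n_random)]
print("model fits needed:", n_random * k)                        # 200
print("estimated hours:", n_random * hours_per_randomization)    # 30.0
```

Fixing the seed per randomization keeps each split reproducible, so a
long run can be resumed or spread over several machines.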

Min-Han
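
On the original question of how to judge the model on the external
set: one summary that is often reported is Harrell's concordance index
(C), the fraction of usable patient pairs in which the patient with the
higher predicted risk is the one who died first. Below is a minimal
sketch in plain Python, assuming each test patient has a risk score
from the frozen training-set model (for example the Cox linear
predictor). The function name and the six-patient data set are invented
for illustration; this is not the Design package's implementation, and
it ignores tied survival times for brevity.

```python
def concordance_index(times, events, risk):
    """Harrell's C for right-censored data (tied times are skipped).

    times:  observed follow-up time per patient
    events: 1 if death was observed at that time, 0 if censored
    risk:   predicted risk score, higher = worse prognosis
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is a death
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier death had the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs in the data")
    return concordant / usable


# Made-up external test set: six patients, three observed deaths.
times = [5, 8, 12, 20, 25, 30]
events = [1, 1, 0, 1, 0, 0]
risk = [2.1, 1.4, 1.6, 0.3, 0.9, -0.5]

print(round(concordance_index(times, events, risk), 3))  # 0.818 (9 of 11 pairs)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect
ordering. C measures discrimination only; whether the predicted
survival probabilities are calibrated on the external set needs a
separate check.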

On Tue, 28 Sep 2004 10:55:50 -0700, Berton Gunter
<gunter.berton at gene.com> wrote:
> 
> But note that there may be deeper, non-statistical, issues of what you mean
> by "validation" here: how good must the predictions be on the validation
> data? How similar or dissimilar should the validation data be to the
> "training" data? To what end/population is the fitted model to be applied?
> For example, AFAIK in most scientific research, a model is not considered
> "validated" unless results can be substantively reproduced (??) in different
> labs, sometimes with alternative methods.
> 
> Think of the 1919 measurements of star positions during a
> total solar eclipse to "validate" Einstein's Theory of General Relativity.
> My point is not to say that this kind of "validation" is appropriate for a
> Cox model, but only that the issues are worth thinking about.
> 
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
> 
> "The business of the statistician is to catalyze the scientific learning
> process."  - George E. P. Box
> 
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Frank
> > E Harrell Jr
> > Sent: Tuesday, September 28, 2004 10:11 AM
> > To: Min-Han Tan
> > Cc: r-help at stat.math.ethz.ch
> > Subject: Re: [R] Validating a Cox model on an external set
> >
> > Min-Han Tan wrote:
> > > Good morning,
> > >
> > > Sorry to trouble the list.
> > >
> > > I have a problem I hope to seek your advice on.
> > >
> > > Essentially, I am trying to 'validate' a multivariate Cox
> > > proportional hazards model built in a training set, by testing it
> > > on an external test set. I have performed a survfit using the Cox
> > > model to predict survival for the test set, and obtained individual
> > > predictions for survival time, with standard error for each test
> > > sample. Each of these cases has an actual survival time, some
> > > censored.
> > >
> > > How can we decide whether the Cox model has been validated or not?
> >
> > This is what the Design package and its cph and validate.cph and
> > calibrate.cph functions are for.
> >
> > >
> > > survdiff in the survival package was suggested to me, but survdiff
> > > compares curves; I am not sure how I could use it (I have a predicted
> > > curve for each case, but no 'observed curve' - the only observation
> > > is death or censoring at time x)
> > >
> > > Thank you all so much!
> > >
> > > Min-Han Tan
> > > Van Andel Institute
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> > >
> >
> >
> > --
> > Frank E Harrell Jr   Professor and Chair           School of Medicine
> >                       Department of Biostatistics
> > Vanderbilt University
> > 