[R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

Fri Oct 1 21:36:38 CEST 2010

I have this discussion fairly often with doctors that I work with.  The issue is that you can certainly predict from a model, but you can predict on different scales.  Let's consider the simpler case of just 2 outcomes (disease yes/no):

Let's say you have 4 patients whose disease status you want to predict from their symptoms using a model.  On the probability scale, patient A is predicted to have a 5% chance of yes, patient B 49%, patient C 51%, and patient D 95%.  If we collapse this to just a yes/no prediction, then we treat A and B the same (a prediction of NO) and C and D the same (a prediction of YES).  But does it really make sense to treat B and C so differently (they are only 2 percentage points apart) while treating B the same as A and C the same as D?

If I were one of the patients, I would want to know whether my probability of disease was 51% or 95%, not just a yes.
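Here is a minimal Python sketch of that collapse. It assumes a 0.5 cutoff (implied by B being called NO and C being called YES) and uses the four example probabilities directly rather than output from a fitted model:

```python
# The four example probabilities of disease (yes) from the text above;
# in practice these would come from a fitted model.
probs = {"A": 0.05, "B": 0.49, "C": 0.51, "D": 0.95}

# Collapse to yes/no with an assumed 0.5 cutoff.
calls = {p: ("YES" if pr >= 0.5 else "NO") for p, pr in probs.items()}

for p in probs:
    print(f"{p}: P(yes) = {probs[p]:.2f} -> {calls[p]}")

# B and C are only 0.02 apart on the probability scale but get different
# calls, while A and B are 0.44 apart and get the same call.
print(f"|B - C| = {abs(probs['B'] - probs['C']):.2f}")
print(f"|A - B| = {abs(probs['A'] - probs['B']):.2f}")
```

The yes/no column discards exactly the information that distinguishes B from A and C from D.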

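With 3 groups, wouldn't you want to know the difference between predicted probabilities of 33%, 33%, 34% and of 2%, 8%, 90%?  If you simply pick the most likely group, both are assigned to the third one.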
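The same point for 3 groups, as a small Python sketch. It assumes "assigning" means picking the group with the highest predicted probability, and the helper name `most_likely` is hypothetical:

```python
# The two sets of predicted probabilities for a 3-group outcome from the text.
flat = [0.33, 0.33, 0.34]
sharp = [0.02, 0.08, 0.90]


def most_likely(probs):
    """Return the 1-based label of the group with the highest probability."""
    return max(range(len(probs)), key=lambda k: probs[k]) + 1


for name, probs in [("flat", flat), ("sharp", sharp)]:
    g = most_likely(probs)
    print(f"{name}: assigned group {g}, P(group {g}) = {probs[g - 1]:.2f}")
```

Both patients get the same label, but one assignment rests on a 34% probability and the other on 90%.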

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of peterfrancis at me.com
> Sent: Friday, October 01, 2010 8:23 AM
> To: Frank Harrell
> Cc: r-help at r-project.org
> Subject: Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
> 
> The reason I am trying to assign them is that I have a data set
> where I have arrived at the most likely model that describes the data,
> and now I have another data set where I know the factors but not the
> response.
> 
> Therefore, surely I need to assign the predicted values to a response
> in order to say something like:
> 
> Based on the model, I believe unknown 1 is good, whereas unknown 2 is
> very good, etc.?
> 
> Maybe I am missing something or using the wrong approach but I thought
> the main purpose of using the predict function on new data was to
> "predict" the response?
> 
> Peter
> 
> On 1 Oct 2010, at 14:51, Frank Harrell <f.harrell at vanderbilt.edu> wrote:
> 
> >
> > Why assign them at all?  Is this a "forced choice at gunpoint" problem?
> > Remember what probabilities mean.
> >
> > Frank
> >
> > -----
> > Frank Harrell
> > Department of Biostatistics, Vanderbilt University
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.