[R] Logistic regression model + precision/recall

Wed Jan 24 15:59:44 CET 2007

nitin jindal wrote:
> On 1/24/07, Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:
> 
>> Why 0.5?
> 
> 
> The probability cutoff has to be adjusted by trial and error. I just
> mentioned it as an example.

Using a cutoff is not a good idea unless the utility (loss) function is 
discontinuous and is the same for every subject (in the medical field 
utilities are almost never constant).  And if you are using the data to 
find the cutoff, this will require bootstrapping to penalize for the 
cutoff not being pre-specified.
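
As an illustration of why one cutoff rarely suits every subject, here is a minimal Python sketch. It is not from the original exchange. It assumes a simple two-action decision in which a false positive costs cost_fp and a false negative costs cost_fn (both names and all the numbers are hypothetical). Under that assumption, acting has lower expected loss than not acting when the predicted probability exceeds cost_fp / (cost_fp + cost_fn), so the cutoff moves whenever the costs do.

```python
def optimal_cutoff(cost_fp, cost_fn):
    """Probability above which acting has lower expected loss than not acting.

    Acting costs (1 - p) * cost_fp in expectation; not acting costs p * cost_fn.
    Acting is better when p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp < 0 or cost_fn < 0 or cost_fp + cost_fn == 0:
        raise ValueError("costs must be non-negative and not both zero")
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical subjects: same predicted probability, different costs.
# Each tuple is (label, predicted probability, cost_fp, cost_fn).
subjects = [
    ("A", 0.30, 1.0, 1.0),  # symmetric costs
    ("B", 0.30, 1.0, 9.0),  # a missed event is very costly
    ("C", 0.30, 5.0, 1.0),  # an unnecessary action is very costly
]

decisions = {}
for label, prob, cost_fp, cost_fn in subjects:
    cutoff = optimal_cutoff(cost_fp, cost_fn)
    decisions[label] = prob > cutoff
    print(f"subject {label}: cutoff={cutoff:.3f}, act={decisions[label]}")
```

All three subjects have the same predicted risk, yet the decision differs because each has a different cutoff. A single fixed threshold such as 0.5 cannot reproduce that.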

> 
>> Those are improper scoring rules that can be tricked.  If the outcome is
>> rare (say 0.02 incidence) you could just predict that no one will have
>> the outcome and be correct 0.98 of the time.  I suggest validating the
>> model for discrimination (e.g., AUC) and calibration.
> 
> 
> I just have to calculate precision/recall for rare outcome. If the positive
> outcome is rare ( say 0.02 incidence) and I predict it to be negative all
> the time, my recall would be 0, which is bad. So, precision and recall can
> take care of skewed data.

No, that is not clear.  The overall classification error would only be 
0.02 in that case.  It is true, though, that one of the two conditional 
probabilities (recall) would be poor.
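
The arithmetic behind this point can be checked with a short Python sketch. It assumes a made-up sample of 1000 subjects with 20 events (0.02 incidence) and a rule that predicts "negative" for everyone. The sample size and the helper names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


# Hypothetical data: 1000 subjects, 20 with the outcome (0.02 incidence).
n_subjects, n_events = 1000, 20
y_true = [1] * n_events + [0] * (n_subjects - n_events)
y_pred = [0] * n_subjects  # predict that no one has the outcome

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / n_subjects
error_rate = (fp + fn) / n_subjects
recall = safe_ratio(tp, tp + fn)     # P(predicted positive | actual positive)
precision = safe_ratio(tp, tp + fp)  # P(actual positive | predicted positive)

print(f"accuracy={accuracy}, error={error_rate}, "
      f"recall={recall}, precision={precision}")
```

With these assumed numbers the classification error is 0.02 while recall is 0, and precision is undefined because no positive predictions are made.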

> 
> Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University