[R] ROC curve

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Mon May 24 14:35:20 CEST 2010


On 05/24/2010 02:14 AM, Claudia Beleites wrote:
> Dear Changbin,
>
>> How do I select the optimal decision threshold from the ROC
>> curve?
> Depends on what optimal means. I think there are a bunch of different
> criteria used:
>
> - point closest to the ideal model
> - point furthest from the "guessing" model
> - these criteria may include costs, i.e. a FP/FN cost ratio != 1
> - ...
>
> More practically:
> If you use ROCR: the help page for the performance class explains the
> slots of the object. They hold the data of the curve, including the
> thresholds.
>
>> What threshold will give the highest accuracy?
> To find out, optimize the accuracy as a function of the threshold.
>
> Remember: finding the optimal threshold from a ROC curve is a
> data-driven optimization. You need to validate the resulting model with
> independent test data afterwards.

That point is excellent.  In addition, such decision analysis assumes 
that (1) a forced yes/no decision is acceptable, i.e., a predicted 
probability in the middle is forced to be categorized as "low" or "high" 
as opposed to "no decision; get more data", and (2) the 
utility/cost/loss function is identical across subjects (which it almost 
never is).
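
For illustration only, here is a minimal base-R sketch of the points above: tune the cutoff for accuracy on one part of the data, check it on independent data, and allow a "no decision" zone instead of a forced yes/no. The simulated data, the logistic model, the helper accuracy() and the gray-zone limits 0.35 and 0.65 are all assumptions made up for this example, not anything from the original question.

```r
set.seed(1)                        # simulated data, purely illustrative
n     <- 400
x     <- rnorm(n)
y     <- rbinom(n, 1, plogis(1.5 * x))
train <- seq_len(n) <= n / 2       # first half for tuning, second half held out

# Fit on the training half only, then predict probabilities for everyone
fit <- glm(y ~ x, family = binomial, subset = train)
p   <- predict(fit, newdata = data.frame(x = x), type = "response")

# Accuracy of the rule "predict 1 when the probability is >= cutoff"
# (hypothetical helper, defined here for the example)
accuracy <- function(cutoff, prob, truth) mean((prob >= cutoff) == truth)

# Data-driven optimization: try every observed training probability as a cutoff
cutoffs   <- sort(unique(p[train]))
acc_train <- sapply(cutoffs, accuracy, prob = p[train], truth = y[train])
best      <- cutoffs[which.max(acc_train)]

# Validation on independent data; the training accuracy is optimistic
acc_test <- accuracy(best, p[!train], y[!train])

# Alternative to a forced yes/no: leave the middle probabilities undecided
lower <- 0.35                      # arbitrary placeholder limits
upper <- 0.65
decision <- cut(p[!train],
                breaks = c(-Inf, lower, upper, Inf),
                labels = c("low", "no decision", "high"))

print(c(cutoff = best, train = max(acc_train), test = acc_test))
print(table(decision))
```

Note that the sketch still applies a single cutoff and a single gray zone to every subject, so it does nothing about point (2); in practice the limits would have to come from the utilities that apply to each subject.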

Frank

-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


