[R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Tue Oct 14 05:06:14 CEST 2008
Robert W. Baer, Ph.D. wrote:
>
> ----- Original Message ----- From: "Frank E Harrell Jr"
> <f.harrell at vanderbilt.edu>
> To: "John Sorkin" <jsorkin at grecc.umaryland.edu>
> Cc: <r-help at r-project.org>; <dieter.menne at menne-biomed.de>;
> <p.dalgaard at biostat.ku.dk>
> Sent: Monday, October 13, 2008 2:09 PM
> Subject: Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
>
>
>> John Sorkin wrote:
>>> Frank,
>>> Perhaps I was not clear in my previous email message. Sensitivity and
>>> specificity do tell us about the quality of a test in that given two
>>> tests the one with higher sensitivity will be better at identifying
>>> subjects who have a disease in a pool of people who have a disease, and
>>> the more specific test will be better at identifying subjects who do
>>> not have a disease in a pool of people who do not have a disease. It is
>>> true that positive predictive and negative predictive values are of
>>> greater utility to a clinician, but as you know these two measures
>>> are functions of sensitivity, specificity and disease prevalence. All
>>> other things being equal, given two tests one would select the one
>>> with greater sensitivity and specificity, so in a sense they do
>>> measure the "quality" of a clinical test - but not, as I tried to
>>> explain, the quality of a statistical model.
>>
>> That is not very relevant John. It is a function of all those things
>> because those quantities are all deficient.
>>
>> I would select the test that can move the pre-test probability a great
>> deal in one or both directions.
>
> Of course, this quantity is known as a likelihood ratio and is a
> function of sensitivity and specificity. For 2 x 2 data one often
> speaks of a positive likelihood ratio and a negative likelihood ratio,
> but for a multi-row contingency table one can define likelihood ratios
> for a series of cut-off points. This has become a popular approach in
> evidence-based medicine when diagnostic tests have continuous rather
> than binary outputs.
This approach leaves much to be desired. I hope that its practitioners
start gauging it by the mean squared error of predicted probabilities.
Likelihood ratios are "half" of odds ratios (odds ratio = LR+ divided by
LR-) but in a practical sense they are not equivalent because the
vast majority of likelihood ratios provided in the literature are crude,
marginal, unadjusted likelihood ratios. Odds ratios from easy-to-fit
logistic models are conditional or partial odds ratios and so are
patient specific and not population averaged.
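
To make the connection between likelihood ratios, odds ratios and
pre-test/post-test probabilities concrete, here is a minimal Python
sketch. It assumes the usual definitions LR+ = sens / (1 - spec) and
LR- = (1 - sens) / spec, under which the crude odds ratio equals LR+
divided by LR-. The sensitivity and specificity are the figures quoted
further down in this thread; the 20% pre-test probability and the
function names are assumptions chosen purely for illustration.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a binary test with the given sens/spec."""
    lr_pos = sens / (1.0 - spec)
    lr_neg = (1.0 - sens) / spec
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, lr):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Figures quoted in the thread (SENS = 89.97%, SPEC = 74.38%).
sens, spec = 0.8997, 0.7438
lr_pos, lr_neg = likelihood_ratios(sens, spec)

# Crude (marginal, unadjusted) odds ratio implied by the 2 x 2 table.
odds_ratio = lr_pos / lr_neg

# Hypothetical pre-test probability, not taken from the thread.
pre = 0.20
p_after_pos = post_test_probability(pre, lr_pos)
p_after_neg = post_test_probability(pre, lr_neg)

print(f"LR+ = {lr_pos:.3f}, LR- = {lr_neg:.3f}, OR = {odds_ratio:.2f}")
print(f"post-test probability after a positive result: {p_after_pos:.3f}")
print(f"post-test probability after a negative result: {p_after_neg:.3f}")
```

Note that this odds ratio is exactly the crude, unadjusted kind; a
covariate-adjusted odds ratio would have to come from a fitted model.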
Frank
>
>>> You are of course correct that sensitivity and specificity are not
>>> truly "inherent" characteristics of a test as their values may change
>>> from population to population, but practically speaking, they don't
>>> change all that much, certainly not as much as positive and negative
>>> predictive values.
>>
>> They change quite a bit, and mathematically must change if the disease
>> is not all-or-nothing.
>>
>>>
>>
>>> I guess we will disagree about the utility of sensitivity and
>>> specificity as simplifying concepts.
>>>
>>> Thank you as always for your clear thoughts and stimulating comments.
>>
>> And thanks for yours John.
>> Frank
>>
>>> John
>>>
>>>
>>>
>>>
>>> among those subjects with a disease and the one with greater
>>> specificity will be better at identifying
>>>
>>> John David Sorkin M.D., Ph.D.
>>> Chief, Biostatistics and Informatics
>>> University of Maryland School of Medicine Division of Gerontology
>>> Baltimore VA Medical Center
>>> 10 North Greene Street
>>> GRECC (BT/18/GR)
>>> Baltimore, MD 21201-1524
>>> (Phone) 410-605-7119
>>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>>
>>>>>> Frank E Harrell Jr <f.harrell at vanderbilt.edu> 10/13/2008 2:35 PM >>>
>>> John Sorkin wrote:
>>>> Jumping into a thread can be like jumping into a den of lions but
>>>> here goes . . .
>>>> Sensitivity and specificity are not designed to determine the
>>>> quality of a fit (i.e. whether your model is good), but rather are
>>>> characteristics of a test. A test that has high sensitivity will
>>>> properly identify a large proportion of people with a disease (or a
>>>> characteristic) of interest. A test with high specificity will
>>>> properly identify a large proportion of people without a disease (or
>>>> characteristic) of interest. Sensitivity and specificity inform the
>>>> end user about the "quality" of a test. Other metrics have been
>>>> designed to determine the quality of the fit; none that I know of
>>>> are completely satisfactory. The pseudo R squared is one such measure.
>>>> For a given diagnostic test (or classification scheme), different
>>>> cut-off points for identifying subjects who have disease can be
>>>> examined to see how they influence sensitivity and 1-specificity
>>>> using ROC curves.
>>>> I await the flames that will surely come my way.
>>>>
>>>> John
>>>
>>> John this has been much debated but I fail to see how backwards
>>> probabilities are that helpful in judging the usefulness of a test.
>>> Why not condition on what we know (the test result and other baseline
>>> variables) and quit conditioning on what we are trying to find out
>>> (disease status)? The data collected in most studies (other than
>>> case-control) allow one to use logistic modeling with the correct
>>> time order.
>>>
>>> Furthermore, sensitivity and specificity are not constants but vary
>>> with subjects' characteristics. So they are not even useful as
>>> simplifying concepts.
>>>
>>> Frank
>>>>
>>>>
>>>>
>>>> John David Sorkin M.D., Ph.D.
>>>> Chief, Biostatistics and Informatics
>>>> University of Maryland School of Medicine Division of Gerontology
>>>> Baltimore VA Medical Center
>>>> 10 North Greene Street
>>>> GRECC (BT/18/GR)
>>>> Baltimore, MD 21201-1524
>>>> (Phone) 410-605-7119
>>>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>>>
>>>>>>> Frank E Harrell Jr <f.harrell at vanderbilt.edu> 10/13/2008 12:27 PM >>>
>>>> Maithili Shiva wrote:
>>>>> Dear Mr Peter Dalgaard and Mr Dieter Menne,
>>>>>
>>>>> I sincerely thank you for helping me out with my problem. The thing
>>>>> is that I already have calculated SENS = Gg / (Gg + Bg) = 89.97%
>>>>> and SPEC = Bb / (Bb + Gb) = 74.38%.
>>>>>
>>>>> Now I have values of SENS and SPEC, which are absolute in nature.
>>>>> My question was how do I interpret these absolute values. How do
>>>>> these values help me to find out whether my model is good?
>>>>>
>>>>> With regards
>>>>>
>>>>> Ms Maithili Shiva
>>>> I can't understand why you are interested in probabilities that are
>>>> in backwards time order.
>>>>
>>>> Frank
>>>>
>>>>> ________________________________________________________________________
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Subject: [R] Logistic regresion - Interpreting (SENS) and (SPEC)
>>>>>> To: r-help at r-project.org Date: Friday, October 10, 2008, 5:54 AM
>>>>>> Hi
>>>>>>
>>>>>> I am working on a credit scoring model using logistic
>>>>>> regression. I have a main sample of 42500 clients and, based on
>>>>>> their status as defaulted / non-defaulted, I
>>>>>> have generated the probability of default.
>>>>>>
>>>>>> I have a hold-out sample of 5000 clients. I have calculated
>>>>>> (1) number of correctly classified goods (Gg), (2) number of
>>>>>> correctly classified bads (Bb), (3) number of wrongly classified
>>>>>> bads (Gb) and (4) number of wrongly classified goods (Bg).
>>>>>>
>>>>>> My problem is how to interpret these results. What I have
>>>>>> arrived at are the absolute figures.
>>>>>>
>>>
>>>
>>
>>
>> --
>> Frank E Harrell Jr Professor and Chair School of Medicine
>> Department of Biostatistics Vanderbilt University
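
For readers following the arithmetic in the original question, the
sketch below contrasts the "backwards" probabilities (SENS, SPEC, which
condition on true status) with the "forwards" ones (which condition on
the classification), using the thread's notation: Gg and Bb are
correctly classified goods and bads, Bg is goods classified as bad, and
Gb is bads classified as good. The four counts are hypothetical, not the
poster's data, and the function names are assumptions for illustration.

```python
def sens_spec(Gg, Bg, Bb, Gb):
    """SENS and SPEC as defined in the thread (condition on true status)."""
    sens = Gg / (Gg + Bg)   # P(classified good | truly good)
    spec = Bb / (Bb + Gb)   # P(classified bad  | truly bad)
    return sens, spec


def predictive_values(Gg, Bg, Bb, Gb):
    """Forward probabilities (condition on the classification)."""
    ppv = Gg / (Gg + Gb)    # P(truly good | classified good)
    npv = Bb / (Bb + Bg)    # P(truly bad  | classified bad)
    return ppv, npv


def ppv_from_prevalence(sens, spec, prev):
    """Bayes' rule: the same PPV written via sens, spec and prevalence."""
    return sens * prev / (sens * prev + (1.0 - spec) * (1.0 - prev))


# Hypothetical hold-out counts summing to 5000 (NOT the poster's data).
Gg, Bg, Bb, Gb = 3600, 400, 750, 250

sens, spec = sens_spec(Gg, Bg, Bb, Gb)
ppv, npv = predictive_values(Gg, Bg, Bb, Gb)
prev = (Gg + Bg) / (Gg + Bg + Bb + Gb)   # share of truly good clients

print(f"SENS = {sens:.4f}, SPEC = {spec:.4f}")
print(f"PPV  = {ppv:.4f}, NPV  = {npv:.4f}")
print(f"PPV via Bayes' rule = {ppv_from_prevalence(sens, spec, prev):.4f}")

# Same SENS and SPEC, different share of good clients: PPV shifts.
for p in (0.5, 0.8, 0.95):
    print(f"prevalence {p:.2f} -> PPV {ppv_from_prevalence(sens, spec, p):.4f}")
```

The loop at the end holds SENS and SPEC fixed, which is itself a
simplification given the point made above that they vary with subjects'
characteristics.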