[R] How to assess the accuracy of fitted logistic regression using glm

Xiaobo Gu guxiaobo1982 at gmail.com
Fri Jun 10 08:54:10 CEST 2011


Hi Professor Ripley,

Thanks for your reply.

I think there are many statisticians here, and the question is somewhat
R-related, so I hope someone can help me.

I have done a simple test using a sample CSV file, which I can post if needed.

donut <- read.csv(file = "D:/donut.csv", header = TRUE)
# Treat the response (color) and the categorical columns as factors;
# for a factor response, glm() models the probability of the second level
for (v in c("color", "shape", "k", "k0", "bias")) {
  donut[[v]] <- as.factor(donut[[v]])
}

# Binary logistic regression of color on shape, x and y
lr <- glm(color ~ shape + x + y, family = binomial, data = donut)
summary(lr)

Call:
glm(formula = color ~ shape + x + y, family = binomial, data = donut)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1079  -0.9476   0.5086   0.7518   1.4079

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.53010    1.65500   1.529   0.1263
shape22      0.05628    1.54990   0.036   0.9710
shape23     -0.74568    1.44813  -0.515   0.6066
shape24     -2.61896    1.38016  -1.898   0.0578 .
shape25     -2.07648    1.32818  -1.563   0.1180
x           -0.45885    1.52863  -0.300   0.7640
y           -0.59311    1.46999  -0.403   0.6866
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 50.446  on 39  degrees of freedom
Residual deviance: 42.473  on 33  degrees of freedom
AIC: 56.473

Number of Fisher Scoring iterations: 4
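As a rough check on how the Coefficients table is built: "z value" is Estimate / Std. Error, and Pr(>|z|) is the two-sided p-value of that Wald z statistic against a standard normal distribution, i.e. a test that the coefficient is zero given the other terms in the model. The stars are just the p-value binned at the cut-points listed on the "Signif. codes" line. The minimal sketch below recomputes both from the printed (rounded) numbers only, so the last digit can differ slightly from summary(), and the cut() call is an approximation of the legend rather than R's own printing code.

```r
# Estimates and standard errors copied from the printed summary(lr) output
est <- c(2.53010, 0.05628, -0.74568, -2.61896, -2.07648, -0.45885, -0.59311)
se  <- c(1.65500, 1.54990, 1.44813, 1.38016, 1.32818, 1.52863, 1.46999)
names(est) <- c("(Intercept)", "shape22", "shape23", "shape24",
                "shape25", "x", "y")

# Wald statistic: estimate divided by its standard error
z <- est / se

# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(-abs(z))

# Bin the p-values using the cut-points of the "Signif. codes" legend
stars <- cut(p, breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
             labels = c("***", "**", "*", ".", " "),
             include.lowest = TRUE)

print(data.frame(z = round(z, 3), p = round(p, 4), code = stars))
```

Only shape24 has p below 0.1, which is why it is the only row marked with ".".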

In the Coefficients section, is Pr(>|z|) the p-value for each variable? I
also have a few other questions:
1. How do I determine the predictive power of each variable?
2. How do I assess the overall performance of the fitted model? In particular,
what is the difference between "Deviance Residuals" and "Residual deviance"?
3. How should I compare "Null deviance" and "Residual deviance"?
4. What does AIC mean, and how is it used?
5. What does the Signif. codes section mean?
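Questions 3 and 4 can be partly answered from the printed numbers alone. The sketch below assumes the response is 0/1 with one row per observation (the printed AIC is consistent with this). The drop from the null deviance to the residual deviance is a likelihood-ratio statistic for all the predictors together, referred to a chi-squared distribution on 39 - 33 = 6 degrees of freedom. For 0/1 data the deviance is -2 times the log-likelihood, so AIC is the residual deviance plus twice the number of estimated coefficients; it is only meaningful for comparing models fitted to the same data, with smaller values preferred. With the fitted object itself, `anova(update(lr, . ~ 1), lr, test = "Chisq")` and `AIC(lr)` compute the same quantities without copying numbers by hand.

```r
# Values copied from the printed summary(lr) output
null_dev  <- 50.446; null_df  <- 39
resid_dev <- 42.473; resid_df <- 33
n_coef    <- 7   # (Intercept), shape22 to shape25, x, y

# Likelihood-ratio (chi-squared) test of the fitted model against the
# intercept-only model: the drop in deviance is compared with a
# chi-squared distribution on the drop in degrees of freedom
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For 0/1 responses: AIC = residual deviance + 2 * number of coefficients
aic <- resid_dev + 2 * n_coef

cat(sprintf("LR statistic %.3f on %d df, p = %.3f\n", lr_stat, lr_df, lr_p))
cat(sprintf("AIC %.3f\n", aic))
```

With these numbers the test gives a p-value of about 0.24, and the recomputed AIC matches the printed 56.473.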

Regards,

Xiaobo Gu



On Mon, Jun 6, 2011 at 9:59 PM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Mon, 6 Jun 2011, Xiaobo Gu wrote:
>
>> Hi,
>>
>> I am trying glm with family = binomial to do binary logistic
>> regression, but how can I assess the accuracy of the fitted model? The
>> summary method prints a lot of information about the returned
>> object, such as the coefficients, but statistics is not my speciality,
>> so could you share some rules of thumb for examining the fitted model
>> from a practical perspective?
>
> It depends entirely on why you did the fit.  People have written whole books
> on assessing the performance of classification procedures such as binary
> logistic regression.  For example, the residual deviance is closely related
> to log-probability scoring: for some purposes that is a good performance
> measure, for others (e.g. when you are going to threshold the predicted
> probabilities) it can be very misleading.
>
> In short, you need statistical advice, not R advice (the purpose of this
> list).
>
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
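The point above about the residual deviance and log-probability scoring, together with question 2, can be illustrated with a small sketch. It uses simulated data (the names x1, x2, dat and fit are hypothetical and not taken from donut.csv) to check that the residual deviance equals the sum of the squared deviance residuals and, for 0/1 responses, -2 times the summed log of the predicted probability of each observed outcome. It also computes an error rate after thresholding the predicted probabilities at 0.5, which is a different measure and can rank models differently, and uses drop1() for a likelihood-ratio test of dropping each term, one common (not the only) way to gauge each variable's contribution.

```r
set.seed(1)

# Hypothetical simulated data standing in for donut.csv (not the real file)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x1))  # x2 has no effect
dat <- data.frame(y, x1, x2)

fit   <- glm(y ~ x1 + x2, family = binomial, data = dat)
p_hat <- fitted(fit)  # predicted probabilities P(y = 1)

# "Residual deviance" is the sum of the squared "Deviance Residuals"
dev_from_resid <- sum(residuals(fit, type = "deviance")^2)

# For 0/1 data it is also -2 times the log-probability score
# (log of the predicted probability of the outcome actually observed)
log_score      <- sum(dbinom(dat$y, size = 1, prob = p_hat, log = TRUE))
dev_from_score <- -2 * log_score

# A different measure: misclassification rate after thresholding at 0.5
pred_class <- as.integer(p_hat > 0.5)
conf_mat   <- table(observed = dat$y, predicted = pred_class)
error_rate <- mean(pred_class != dat$y)

# Likelihood-ratio test for dropping each term in turn
term_tests <- drop1(fit, test = "Chisq")

print(c(deviance = deviance(fit), from_resid = dev_from_resid,
        from_score = dev_from_score))
print(conf_mat)
print(error_rate)
print(term_tests)
```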


