[R] Logistic regression model returns lower than expected logit

d_chall dannychallis at gmail.com
Mon May 16 17:37:53 CEST 2011


Hi all,
I'm using a logistic regression model (fitted with 'glm') with 3 variables
to separate true positives from errors in a data set. On the whole it seems
to perform quite well, but for some reason the logit values are much lower
than I'd expect: to get ~90% sensitivity and ~90% precision I have to set my
logit cutoff at around -1 or 0. From my (very limited) understanding, a
logit cutoff of 0 should give around 50% precision (half of the final data
set is TP, half is FP). I see this effect even when I run the model on the
same data it was trained on. My only idea for a cause so far is that my
training data set had roughly 10x as many true-negative data points as
true-positive ones, but evening them out didn't seem to fix the problem
much.
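For what it's worth, the effect can be reproduced on simulated data: with
roughly 10:1 negatives to positives, precision at a logit cutoff of 0 is not
pinned at 50% -- it depends on how the fitted logits are distributed within
each class, and the intercept absorbs the log prior odds of the classes. A
minimal sketch (all data and counts here are made up, not from my actual
problem):

```r
## Hypothetical imbalanced data: ~10x as many negatives as positives,
## with one informative predictor.
set.seed(1)
n_pos <- 200; n_neg <- 2000
x <- c(rnorm(n_pos, mean = 2), rnorm(n_neg, mean = 0))
y <- c(rep(1, n_pos), rep(0, n_neg))

fit <- glm(y ~ x, family = binomial)
logit <- predict(fit, type = "link")  # linear predictor on the logit scale

## Sensitivity and precision at logit cutoffs 0 and -1.
res <- sapply(c(`0` = 0, `-1` = -1), function(cut) {
  pred <- logit > cut
  c(sensitivity = sum(pred & y == 1) / sum(y == 1),
    precision   = sum(pred & y == 1) / sum(pred))
})
round(res, 2)
```

On this toy data the cutoff-0 precision comes out well above 50%, and
lowering the cutoff to -1 trades precision for sensitivity, which is the
same qualitative pattern I'm seeing.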

Here is my model summary with output from R's glm
=====================================
Deviance Residuals: 
     Min       1Q   Median       3Q      Max  
-4.48817 -0.17130 -0.10221 -0.05374  3.36833  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.85666    0.33868  -2.529 0.011425 *  
var1         1.08770    0.15364   7.080 1.45e-12 ***
var2         0.67537    0.08003   8.439  < 2e-16 ***
var3        -1.25332    0.33595  -3.731 0.000191 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1230.63 on 2034 degrees of freedom
Residual deviance:  341.81 on 2031 degrees of freedom

=====================================

thanks in advance!

--
View this message in context: http://r.789695.n4.nabble.com/Logistic-regression-model-returns-lower-than-expected-logit-tp3526542p3526542.html
Sent from the R help mailing list archive at Nabble.com.
