[R] logistic regression

Simon Blomberg Simon.Blomberg at anu.edu.au
Fri May 27 06:37:08 CEST 2005


By default, predict.glm returns predictions on the scale of the linear
predictor, which for a logistic regression is the log-odds and is not
bounded by 0 and 1. To get predictions on the response (probability)
scale, [0, 1], use

x <- predict(logistic.model, medians, type="response")

for example. See ?predict.glm for details.
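As an illustration, here is a minimal sketch on simulated data. The data frame `sim`, its columns `x1`, `x2` and `y`, the coefficients and the seed are all hypothetical, not taken from the utterance corpus. With the default logit link, the response-scale prediction is simply the inverse logit (`plogis()`) of the link-scale prediction, so the two calls differ only in the scale they report.

```r
# Hypothetical simulated data standing in for the real corpus
set.seed(1)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.2 * sim$x1 - 0.8 * sim$x2))

fit <- glm(y ~ x1 + x2, family = binomial, data = sim)

# Default type = "link": log-odds, unbounded
eta <- predict(fit, newdata = sim)
# type = "response": fitted probabilities in [0, 1]
p <- predict(fit, newdata = sim, type = "response")

# The two scales are related by the inverse logit
all.equal(as.vector(p), as.vector(plogis(eta)))   # TRUE
range(eta)   # extends outside [0, 1]
range(p)     # stays inside [0, 1]

# A one-row data frame of predictor medians, analogous to `medians` above
medians <- as.data.frame(lapply(sim[c("x1", "x2")], median))
p.med <- predict(fit, newdata = medians, type = "response")
p.med
```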

Cheers,

Simon.



>Hi
>
>I am working on corpora of automatically recognized utterances, looking
>for features that predict error in the hypothesis the recognizer is
>proposing. 
>
>I am using the glm functions to do logistic regression.  I do this type
>of thing:
>
>    logistic.model = glm(formula = similarity ~ ., family = binomial, data = data)
>
>and end up with a model:
>
>>  summary(logistic.model)
>
>Call:
>glm(formula = similarity ~ ., family = binomial, data = data)
>
>Deviance Residuals:
>     Min       1Q   Median       3Q      Max 
>-3.1599   0.2334   0.3307   0.4486   1.2471 
>
>Coefficients:
>                         Estimate Std. Error z value Pr(>|z|)   
>(Intercept)           11.1923783  4.6536898   2.405  0.01617 * 
>length                -0.3529775  0.2416538  -1.461  0.14410   
>meanPitch             -0.0203590  0.0064752  -3.144  0.00167 **
>minimumPitch           0.0257213  0.0053092   4.845 1.27e-06 ***
>maximumPitch          -0.0003454  0.0030008  -0.115  0.90838   
>meanF1                 0.0137880  0.0047035   2.931  0.00337 **
>meanF2                 0.0040238  0.0041684   0.965  0.33439   
>meanF3                -0.0075497  0.0026751  -2.822  0.00477 **
>meanF4                -0.0005362  0.0007443  -0.720  0.47123   
>meanF5                -0.0001560  0.0003936  -0.396  0.69187   
>ratioF2ToF1            0.2668678  2.8926149   0.092  0.92649   
>ratioF3ToF1            1.7339087  1.7655757   0.982  0.32607   
>jitter                -5.2571384 10.8043359  -0.487  0.62656   
>shimmer               -2.3040826  3.0581950  -0.753  0.45120   
>percentUnvoicedFrames  0.1959342  1.3041689   0.150  0.88058   
>numberOfVoiceBreaks   -0.1022074  0.0823266  -1.241  0.21443   
>percentOfVoiceBreaks  -0.0590097  1.2580202  -0.047  0.96259   
>meanIntensity         -0.0765124  0.0612008  -1.250  0.21123   
>minimumIntensity       0.1037980  0.0331899   3.127  0.00176 **
>maximumIntensity      -0.0389995  0.0430368  -0.906  0.36484   
>ratioIntensity        -2.0329346  1.2420286  -1.637  0.10168   
>noSyllsIntensity       0.1157678  0.0947699   1.222  0.22187   
>startSpeech            0.0155578  0.1343117   0.116  0.90778   
>speakingRate          -0.2583315  0.1648337  -1.567  0.11706   
>---
>Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
>(Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 2462.3  on 4310  degrees of freedom
>Residual deviance: 2209.5  on 4287  degrees of freedom
>AIC: 2257.5
>
>Number of Fisher Scoring iterations: 6
>
>
>I have seen models where almost all the features are significant at the
>0.001 level, but I accept that I could improve my model by normalizing
>some of the features (some are right-skewed, and I understand that I will
>get a better fit by taking their logs, for example).
>
>What really worries me is that the logistic function produces
>predictions that appear to fall well outside 0 to 1.
>
>If I make a dataset of the medians of the above features and use my
>logistic.model on it, it produces a figure of:
>
>>  x = predict(logistic.model, medians)
>>  x
>[1] 2.82959
>>
>
>which is well outside the range of 0 to 1.
>
>The actual distribution of all the predictions is:
>
>>  summary(pred)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  -1.516   2.121   2.720   2.731   3.341   6.387
>>
>
>I can get the model to give some sort of prediction by doing this:
>
>>  pred = predict(logistic.model, data)
>>  pred[pred <= 1.5] = 0
>>  pred[pred > 1.5] = 1
>>  t = table(pred, data[,24])
>>  t
>    
>pred    0    1
>   0  102  253
>   1  255 3701
>>
>>  classAgreement(t)
>$diag
>[1] 0.8821619
>
>$kappa
>[1] 0.2222949
>
>$rand
>[1] 0.7920472
>
>$crand
>[1] 0.1913888
>
>>
>
>but as you can see I am using a break point well outside the range 0 to
>1 and the kappa is rather low (I think).
>
>I am a bit of a novice in this, and the results worry me. 
>
>Can anyone comment if the results look strange, or if they know I am
>doing something wrong?
>
>Stephen
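On the thresholding question: because `plogis()` is monotone increasing, cutting the link-scale predictions at 1.5 is the same rule as cutting the response-scale predictions at `plogis(1.5)`, about 0.82. Likewise, the median-profile value of 2.82959 corresponds to a predicted probability of about 0.94. The sketch below checks the equivalence on hypothetical simulated data (`sim`, `x1` and `y` are illustrative names, not Stephen's variables); `classAgreement()` from the e1071 package could then be applied to the resulting table as in the original post.

```r
# Hypothetical simulated data, only to illustrate the cut-off equivalence
set.seed(2)
n <- 400
sim <- data.frame(x1 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(2 + sim$x1))
fit <- glm(y ~ x1, family = binomial, data = sim)

# Probabilities implied by the link-scale numbers quoted in the post
plogis(2.82959)   # about 0.944 (the median profile)
plogis(1.5)       # about 0.818 (the cut-off that was used)

eta <- predict(fit, newdata = sim)                     # log-odds
p   <- predict(fit, newdata = sim, type = "response")  # probabilities

# The same classification rule expressed on the two scales
class.link <- as.integer(eta > 1.5)
class.resp <- as.integer(p > plogis(1.5))
identical(class.link, class.resp)

table(predicted = class.resp, observed = sim$y)
```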


-- 
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Visiting Fellow
School of Botany & Zoology
The Australian National University
Canberra ACT 0200
Australia

T: +61 2 6125 8057  email: Simon.Blomberg at anu.edu.au
F: +61 2 6125 5573
