[R] problems with glm

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.be
Tue Oct 2 09:13:46 CEST 2007


You could try the following piece of code:

form$finished <- factor(form$finished)
glmFit <- glm(finished ~ ., family = binomial, data = form[1:150000, ])
preds <- predict(glmFit, newdata = form[150001:200000, ],
                 type = "response")
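
To check the predictions against the held-out outcomes, something along 
these lines might work. It is only a sketch on simulated data: the 
objects `sim', `train', `test' and the 0.5 cutoff are my own assumptions 
for illustration, not taken from your data.

```r
## Simulated stand-in for `form` (hypothetical data, not the poster's)
set.seed(1)
n <- 2000
sim <- data.frame(first   = rbinom(n, 1, 0.3),
                  barrier = sample(1:14, n, replace = TRUE))
sim$finished <- factor(rbinom(n, 1,
    plogis(-1 + 1.5 * sim$first - 0.1 * sim$barrier)))

## Split by row index, as in the code above
train <- sim[1:1500, ]
test  <- sim[1501:2000, ]

## Fit on the training rows only; "." picks up the columns of `train`
fit   <- glm(finished ~ ., family = binomial, data = train)

## Predicted probabilities for the held-out rows only
probs <- predict(fit, newdata = test, type = "response")

## Tabulating raw probabilities gives one row per distinct value,
## so classify first; the 0.5 cutoff is an arbitrary assumption
predClass <- factor(as.integer(probs > 0.5), levels = 0:1)
confusion <- table(predicted = predClass, observed = test$finished)
print(confusion)
```

Because `probs' now has one entry per row of `test', table() no longer 
complains about arguments of different lengths.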

Note also the following:

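* since you supply the `data' argument of glm(), you do not need to 
specify the `formula' argument as "data$y ~ data$x"; just use "y ~ x", 
etc. Variables written as form$x are taken from the full data frame, 
not from the subset passed via `data'. That is why your first call 
fails (the response has 215214 values, but the predictors picked up by 
"." have only 150000), and why your second call runs but is fitted on 
all 215214 rows.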

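* for predict.glm() the argument is `newdata', not `data'. An unknown 
`data' argument is silently ignored, so you get the fitted values for 
the rows used in the fit, which is why length(pred) is 215214. Also, 
`type = "response"' gives you the predicted probabilities; look at 
?predict.glm for more info.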
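
The two points above can be reproduced on a small simulated data set. 
This is a sketch under my own assumptions: the data frame `sim' and its 
columns `x' and `y' are made up for illustration and only mimic the 
structure of your problem.

```r
## Hypothetical data: 1000 rows, binary outcome y, one predictor x
set.seed(42)
n <- 1000
sim <- data.frame(x = rnorm(n))
sim$y <- factor(rbinom(n, 1, plogis(0.5 * sim$x)))

## Mixing `sim$y` (1000 values) with "." (600 rows from `data`) fails
## with an error about differing variable lengths
res <- try(glm(as.factor(sim$y) ~ ., family = binomial,
               data = sim[1:600, ]), silent = TRUE)
inherits(res, "try-error")

## `sim$` on both sides bypasses `data`, so all 1000 rows are used
fitAll   <- glm(sim$y ~ sim$x, family = binomial, data = sim[1:600, ])
## Bare names are looked up in `data`, so only 600 rows are used
fitTrain <- glm(y ~ x, family = binomial, data = sim[1:600, ])
nobs(fitAll)
nobs(fitTrain)

## `data` is not an argument of predict.glm(); it is swallowed by `...`
## and the fitted values for the training rows come back instead
pWrong <- predict(fitTrain, data = sim[601:1000, ], type = "response")
pRight <- predict(fitTrain, newdata = sim[601:1000, ], type = "response")
length(pWrong)
length(pRight)
```

Here nobs() should report 1000 and 600 observations for the two fits, 
and the two predict() calls should return 600 and 400 values 
respectively.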


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


----- Original Message ----- 
From: <stephenc at ics.mq.edu.au>
To: <r-help at stat.math.ethz.ch>
Sent: Tuesday, October 02, 2007 5:34 AM
Subject: [R] problems with glm


> I am having a couple of problems someone may be able to cast some 
> light on.
>
>
> Question 1:
>
> I am making a logistic model but when I do this:
>
> glm.model = glm(as.factor(form$finished) ~ ., family=binomial,
> data=form[1:150000,])
>
> I get this:
>
>
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames,  :
>        variable lengths differ (found for 'barrier')
>
>
> which is very strange because when I name the predictive factors 
> like this:
>
> glm.model = glm(as.factor(form$finished) ~ form$first + form$second +
> form$third + form$barrier, family=binomial, data=form[1:150000,])
>
> It produces a model:
>
> Call:
> glm(formula = as.factor(form$finished) ~ form$first + form$second +
>    form$third + form$barrier, family = binomial, data = form[1:150000,
>    ])
>
> Deviance Residuals:
>    Min       1Q   Median       3Q      Max
> -3.0884  -0.4932  -0.3951  -0.3006   2.7135
>
> Coefficients:
>              Estimate Std. Error  z value Pr(>|z|)
> (Intercept)  -2.957831   0.021446 -137.920  < 2e-16 ***
> form$first    0.624463   0.078036    8.002 1.22e-15 ***
> form$second   0.754057   0.080787    9.334  < 2e-16 ***
> form$third    7.718261   0.078532   98.281  < 2e-16 ***
> form$barrier -0.058185   0.002175  -26.751  < 2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>    Null deviance: 144850  on 215213  degrees of freedom
> Residual deviance: 133292  on 215209  degrees of freedom
> AIC: 133302
>
> Number of Fisher Scoring iterations: 5
>
> Any idea why the first glm call doesn't work?
>
> Second Question:
>
> Now I want to predict so I do this:
>
> pred <- predict(glm.model,data=form[150001:200000,],type="response")
>
> but when I try to use it I get this:
>
>> pred <- predict(glm.model,data=form[150001:200000,],type="response")
>> t = table(pred,form$finished[150001:200000])
> Error in table(pred, form$finished[150001:2e+05]) :
>        all arguments must have the same length
>
> and when I do this it confirms my pred is not 50000 long!
>
>> length(pred)
> [1] 215214
>
> It doesn't look as though my selection of rows has worked at all. 
> Anyone know why?
>
> Stephen

