[R] Using predict.glm for classification

Sun Oct 29 15:18:15 CET 2006

Dear R users,

I'm trying to understand how to derive the actual predictions (in terms of
class) using predict.glm. Consider this example:

# factor() keeps the response a factor whatever the stringsAsFactors default is
mydf=data.frame(A=sample(rnorm(1000), size=1000, replace=TRUE),
                B=sample(rnorm(5), size=1000, replace=TRUE),
                C=sample(rnorm(10), size=1000, replace=TRUE),
                class=factor(sample(c("a", "b"), size=1000, replace=TRUE)))
# hold out a random half of the rows for prediction
ind=sample(1:nrow(mydf), size=0.5*nrow(mydf), replace=FALSE)
# main effects plus all two-way interactions, fitted on the training half only
mydf.glm=glm(class ~ .^2, data=mydf[ind,], family=binomial)
mydf.pred=predict(mydf.glm, newdata=mydf[-ind,], type="response", se.fit=TRUE)

My question is: what does the vector mydf.pred$fit represent? If an element has
a value of, say, 0.42, does that mean the probability that the response is "a"
is 0.42 and the probability that it is "b" is 1-0.42 (so that with a threshold
of 0.5 the predicted class would be "b")?
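
One way to check this, as a minimal sketch: the R documentation for binomial families states that when the response is a factor, the first level denotes failure and all other levels success. Under that convention the fitted values from type="response" would be the probability of the second level ("b" here), not of "a". The toy data, the names toy, fit, p_b and pred_class, and the 0.5 cut-off below are illustrative assumptions, not taken from the example above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Toy data where larger x makes class "b" more likely
toy <- data.frame(x = rnorm(200))
toy$class <- factor(ifelse(toy$x + rnorm(200) > 0, "b", "a"))

fit <- glm(class ~ x, data = toy, family = binomial)

# First factor level is treated as "failure", so the fitted
# probability refers to the remaining level, here "b"
lev <- levels(toy$class)
p_b <- predict(fit, newdata = toy, type = "response")

# Assumed 0.5 cut-off: predict "b" when P("b") > 0.5, otherwise "a"
pred_class <- factor(ifelse(p_b > 0.5, lev[2], lev[1]), levels = lev)

# Cross-tabulate observed against predicted classes
table(observed = toy$class, predicted = pred_class)
```

If the fitted values were instead read as the probability of "a", the cross-tabulation would show most cases misclassified, which gives a quick sanity check on the direction of the coding.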

I would appreciate any comments or help on this.

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK