[R] glm predict on new data

Brian Diggs diggsb at ohsu.edu
Thu Apr 7 00:28:01 CEST 2011


On 4/6/2011 2:17 PM, dirknbr wrote:
> I am aware this has been asked before but I could not find a resolution.
>
> I am doing a logit
>
> lg<- glm(y[1:200] ~ x[1:200,1],family=binomial)

glm (and most modeling functions) are designed to work with data frames, 
not raw vectors.

> Then I want to predict a new set
>
> pred<- predict(lg,x[201:250,1],type="response")
>
> But I get varying error messages or warnings about the different number of
> rows. I  have tried data/newdata and also to wrap in data.frame() but cannot
> get to work.

I'll make up some data, show the way you approached it, show where it 
went wrong, and then show how it works more easily.

# data like what I think you had:
y <- rbinom(200, 1, prob=.8)
x <- data.frame(x=rnorm(250))

# your glm call:
lg <- glm(y[1:200]~x[1:200,1],family=binomial)

# Take a look at print(lg).  Notice that your independent variable
# is named "x[1:200, 1]". predict() looks for that exact name in
# newdata, and a bare vector like x[201:250,1] does not supply it.
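
To see that name directly, here is a minimal sketch. It assumes simulated data of the same shape as above; the set.seed() call is an addition for reproducibility and is not part of the original example.

```r
# Simulated data of the same shape as in the example (seed is an assumption).
set.seed(1)
y <- rbinom(200, 1, prob = 0.8)
x <- data.frame(x = rnorm(250))

# Fit using subscripted objects directly in the formula.
lg <- glm(y[1:200] ~ x[1:200, 1], family = binomial)

# The coefficient is labelled with the deparsed expression, not a plain "x".
coef_names <- names(coef(lg))
print(coef_names)  # "(Intercept)" "x[1:200, 1]"

# The formula only refers to the objects y and x, with the subscripts baked in.
vars <- all.vars(formula(lg))
print(vars)        # "y" "x"
```

Because the subscript 1:200 is part of the term itself, there is no clean way to hand predict() a different set of rows for it.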

# Make data.frames of the training and testing data.
DF <- data.frame(y=y, x=x[1:200,1])
DF.new <- data.frame(x=x[201:250,1])
# Notice DF.new uses the same column name (x) as DF.

lg <- glm(y~x, data=DF, family=binomial)
pred <- predict(lg, newdata=DF.new, type="response")
summary(pred)
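
As a quick sanity check on the result, the sketch below confirms that there is one prediction per row of the new data and that type="response" is the inverse logit of type="link". The seed and the name eta are assumptions added for illustration.

```r
# Rebuild the example end to end (seed is an assumption for reproducibility).
set.seed(42)
y <- rbinom(200, 1, prob = 0.8)
x <- data.frame(x = rnorm(250))

DF     <- data.frame(y = y, x = x[1:200, 1])   # training rows
DF.new <- data.frame(x = x[201:250, 1])        # held-out rows, same column name

lg   <- glm(y ~ x, data = DF, family = binomial)
pred <- predict(lg, newdata = DF.new, type = "response")

# One fitted probability per row of DF.new.
n_ok <- length(pred) == nrow(DF.new)

# On the link scale predict() returns the linear predictor;
# plogis() maps it back to probabilities.
eta     <- predict(lg, newdata = DF.new, type = "link")
same_ok <- isTRUE(all.equal(plogis(eta), pred))

print(c(n_ok, same_ok))
```

If the length check fails, predict() most likely did not find a column named x in newdata and fell back to the fitted values of the training data.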

> Help would be appreciated.
>
> Dirk.

-- 
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University


