[R] formatting data for predict()

Ista Zahn izahn at psych.rochester.edu
Sun Sep 26 16:26:34 CEST 2010


Hi Andrew,
My inclination would be to put all the variables in a data.frame
instead of putting the predictors in a matrix. But if you want to
continue down this road, newdata needs to be a data.frame with a
column named dat that itself contains a matrix. I couldn't figure out
how to do that in a single data.frame() call, so I created the column
in a separate step:

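# Placeholder column y only sets the number of rows; predict() does not use it
newdat <- data.frame(y=rep(NA, length(unique(x1))))
# Store the predictor matrix as one column named dat, matching mod2's formula
newdat$dat <- cbind(x1=unique(x1), x2=0)
p2a <- predict(mod2, type="response", newdata=newdat)
p2a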
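
For what it's worth, wrapping the matrix in I() appears to let
data.frame() keep it as a single matrix column, which would make the
separate step unnecessary. The following is only a sketch under that
assumption; the seed and the comparison against mod are additions for
illustration and are not part of the original exchange:

```r
# Simulated data mirroring the original post (the seed is an assumption)
set.seed(1)
y  <- rbinom(100, 1, .3)
x1 <- rbinom(100, 1, .5)
x2 <- rnorm(100, 3, 2)

# Model fitted with the predictors supplied as one matrix
dat  <- cbind(x1, x2)
mod2 <- glm(y ~ dat, family=binomial)

# I() stops data.frame() from splitting the matrix into separate columns
newdat <- data.frame(dat=I(cbind(x1=unique(x1), x2=0)))
p2a <- predict(mod2, type="response", newdata=newdat)

# Same model fitted on separate variables, for comparison
mod <- glm(y ~ x1 + x2, family=binomial)
p2  <- predict(mod, type="response",
               newdata=data.frame(x1=unique(x1), x2=0))

# The two sets of predicted probabilities are expected to agree
all.equal(unname(p2a), unname(p2))
```

The columns of the matrix in newdat need to be in the same order as in
the matrix used to fit the model.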

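As for the data.frame route mentioned at the top, here is a minimal
sketch of what that might look like inside a function. The wrapper
fit_logit, its arguments, and the seed are hypothetical names made up
for this example; reformulate() from stats builds the formula from the
variable names:

```r
# Simulated data mirroring the original post (the seed is an assumption)
set.seed(1)
d <- data.frame(y  = rbinom(100, 1, .3),
                x1 = rbinom(100, 1, .5),
                x2 = rnorm(100, 3, 2))

# Hypothetical wrapper: predictor names are passed as a character vector
fit_logit <- function(data, response, predictors) {
  f <- reformulate(predictors, response=response)
  glm(f, family=binomial, data=data)
}

mod3 <- fit_logit(d, "y", c("x1", "x2"))

# newdata only needs columns carrying the original variable names
newdat <- data.frame(x1=c(0, 1), x2=0)
p3 <- predict(mod3, type="response", newdata=newdat)
p3
```

Because the model then knows each predictor by name, newdata can be an
ordinary data.frame with no matrix column involved.
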
Hope it helps,
Ista

On Sun, Sep 26, 2010 at 4:38 AM, Andrew Miles <rstuff.miles at gmail.com> wrote:
> I'm trying to get predicted probabilities out of a regression model, but am
> having trouble with the "newdata" option in the predict() function.  Suppose
> I have a model with two independent variables, like this:
>
> y=rbinom(100, 1, .3)
> x1=rbinom(100, 1, .5)
> x2=rnorm(100, 3, 2)
> mod=glm(y ~ x1 + x2, family=binomial)
>
> I can then get the predicted probabilities for the two values of x1, holding
> x2 constant at 0 like this:
>
> p2=predict(mod, type="response", newdata=as.data.frame(cbind(x1, x2=0)))
> unique(p2)
>
> However, I am running regressions as part of a function I wrote, which feeds
> in the independent variables to the regression in matrix form, like this:
>
> dat=cbind(x1, x2)
> mod2=glm(y ~ dat, family=binomial)
>
> The results are the same as in mod.  Yet I cannot figure out how to input
> information into the "newdata" option of predict() in order to generate the
> same predicted probabilities as above.  The same code as above does not
> work:
>
> p2a=predict(mod2, type="response", newdata=as.data.frame(cbind(x1, x2=0)))
> unique(p2a)
>
> Nor does creating a data frame that has the names "datx1" and "datx2," which
> is how the variables appear if you run a summary() on mod2.  Looking at the
> model frame of mod2 shows that the fitted model contains only two variables,
> the dependent variable y and one independent variable called "dat."  It is
> as if my two variables x1 and x2 have become two levels in a factor variable
> called "dat."
>
> names(mod2$model)
>
> My question is this:  if I have a fitted model like mod2, how do I use the
> "newdata" option in the predict function so that I can get the predicted
> values I am after?  That is, how do I recreate a data frame with one variable
> called "dat" that contains two levels which represent my (modified)
> variables x1 and x2?
>
> Thanks in advance!
>
> Andrew Miles
> Department of Sociology
> Duke University



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


