[R] User error in calling predict/model.frame

Joshua Wiley jwiley.psych at gmail.com
Sat Jan 29 03:56:53 CET 2011


Dear Russell,

On Fri, Jan 28, 2011 at 5:59 PM, Russell Pierce <rpier001 at ucr.edu> wrote:
> Ista & r-help list,
>
> I guess I left out the most important part of any question, my reason
> for doing this.  I am interested in the predicted y value at the mean,
> 1 SD above of the mean, and 1 SD below the mean for each predictor.
> Since I conducted my analysis with my IVs in the scale of Z (i.e. my
> formula for lm.obj was out ~ scale(xxA)*scale(xxB)), I expected to be
> able to define my predictors in newdata in terms of Z.  It seems like
> predict should be the right function to achieve these aims.

predict() is the right choice.  The problem lies with your model
object (or your new data if you prefer to think of it that way), not
predict().

> I made several (major) errors in my initial example, though the nature
> of the error that R produces is the same, this may have added
> considerably to the confusion.  I apologize. A more concise example of
> code that (I think) should work, but doesn't is:
>   set.seed(10)
>   dat <- data.frame(xxA = rnorm(20,10), xxB = rnorm(20,20))
>   dat$out <- with(dat,xxA+xxB+xxA*xxB+rnorm(20,20))
>   lm.res.scale <- lm(out ~ scale(xxA)*scale(xxB),data=dat)
>   my.data <- lm.res.scale$model #load the data from the lm object

Just pause for a moment here, and look at what you actually have:

class(my.data[, "scale(xxA)"])
str(my.data[, "scale(xxA)"])

>   newdata <- expand.grid(X1=c(-1,0,1),X2=c(-1,0,1))
>   names(newdata) <- c("scale(xxA)","scale(xxB)")

Now let's look at the data you made:

class(newdata[, "scale(xxA)"])
str(newdata[, "scale(xxA)"])

Notice that this differs from the data in the model object.  Of course
it would be possible to work around this using clever techniques, but
this way leads to pain.  Fortunately, there are alteRnatives.
Consider instead:

##########################
set.seed(10)
dat <- data.frame(xxA = rnorm(20, 10), xxB = rnorm(20, 20))
d2 <- as.data.frame(lapply(dat, scale))
names(d2) <- paste("s", names(dat), sep = ".")
dat <- cbind(dat, d2, out = with(dat, xxA + xxB + xxA * xxB + rnorm(20, 20)))
lm.res.scale <- lm(out ~ s.xxA * s.xxB, data = dat)
newdata <- expand.grid(s.xxA = -1:1, s.xxB = -1:1)
(newdata$Y <- predict(lm.res.scale, newdata))
###########################


>   newdata$Y <- predict(lm.res.scale,newdata)
>
> Using your solution:
>   names(newdata) <- names(dat)[1:2]
>
> As you say, your solution is doing something other than:
>
>   coef(lm.res.scale)[1]+coef(lm.res.scale)[2]*newdata[,1]+coef(lm.res.scale)[3]*newdata[,2]+coef(lm.res.scale)[4]*newdata[,1]*newdata[,2]
>
> I really want a solution that, in one step will provide values like
> the above code. I think that should be exactly what predict() should
> do (now that I fixed newdata so that it doesn't have the sd() scaling
> factors).  I think my example code should be equivalent to:
>   dat <- data.frame(xxA = rnorm(20,10), xxB = rnorm(20,20))
>   dat$out <- with(dat,xxA+xxB+xxA*xxB+rnorm(20,20))
>   #rescaling outside of lm
>   X1 <- with(dat,as.vector(scale(xxA)))
>   X2 <- with(dat,as.vector(scale(xxB)))
>   y  <- with(dat,out)
>   lm.res.correct <- lm(y~X1*X2)
>   my.data <- lm.res.correct$model #load the data from the lm object
>   newdata <- expand.grid(X1=c(-1,0,1),X2=c(-1,0,1))
>   #No need to rename newdata as it matches my lm object already
>   newdata$Y <- predict(lm.res.correct,newdata)
>
> Notably, adjusting my formula to include as.vector() does not solve
> the problem with my attempt to use predict() directly with newdata.

Not quite sure which formula you adjusted with "as.vector()".  But in
general, the ability to transform data directly in the model call is
for quick 'n dirty convenience.  You can quickly get into a situation
where your variable names are unwieldy (possibly containing special
characters like spaces, and ",").

Cheers,

Josh

P.S. I did my undergraduate in psychology at Riverside.

>
> Thanks again,
>
> Russell Pierce
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



More information about the R-help mailing list