[R] problem with predict() for gam() models

Sat Jun 7 04:16:33 CEST 2003

I run the following code in R 1.6.2 on Windows:

xxx <- rnorm(100)
yyy <- .5 * rnorm(100) + sqrt(1-.5^2) * rnorm(100)
ord <- order(xxx)
xxx <- xxx[ord] # for
yyy <- yyy[ord] #    convenience in reading printout
rm(ord)
reg.gam <- gam(yyy ~ s(xxx, k=8))

f <- function(x, reg.gam, target.y) {
    cat("inside f() called by optimize():\n")
    cat("arg x=", x, "\n")
    cat("arg target.y=", target.y, "\n", sep="")
    pred.y <- predict(reg.gam, newdata=data.frame(constant=1, xxx=x))
    cat("predicted y=\n")
    print(pred.y)
    sq.diff <- (pred.y - target.y)^2
    cat("returned value=\n")
    print(sq.diff)
    return( sq.diff )
}

temp <- optimize(f, interval=c(xxx[1], xxx[100]), reg.gam=reg.gam, 
target.y=mean(range(yyy)))

as a try-out of some code.  I need to interpolate in the output of a gam() 
model, to find a certain x-value corresponding to a chosen y-value.  I 
thought it would be better to interpolate using predict() from the gam() 
model, rather than use approx() since I don't expect linearity, or anything 
that's necessarily very close to it, in the part of the regression where 
I'm going to need to interpolate.  Since the regression itself is on 
splines, I didn't think I needed to call spline interpolation; I thought

In my real-life application the regression isn't linear, like the one I 
constructed here (it looks somwhat, but not quite enough, like a logistic 
regression), but I thought it would simplify my example to construct do 
things this way.

Alas, with these data (or my real, nonlinear, data) I get an error from 
optimize() (either with these fake data, or with my real data) because f() 
delivers a vector-valued result.  The debugging print statements inside f() 
show that sometimes (but not nearly always) even though a scalar argument x 
is submitted and put into the newdata to predict(), I get a vector result 
pred.y[].  Note:  most of the time, I get a scalar valued result, but often 
enough I get a vector output from f().  I don't at this time see what 
differentiates arguments that yield scalar result for f(), from ones that 
yield a vector value for f().  x the argument to f() is always scalar.

The problem occurs whether I use newdata=data.frame(xxx=x) (i.e., minus the 
"constant=1" part, or newdata as shown, as an argument to predict().

Can anyone help me understand why this is happening, and how to ensure that 
the result of f() is scalar, so I get the interpolation I want, instead of 
having optimize() blow up?

Thanks in advance for any advice.  Regards,