[R] Problems with mars in R in the case of nonlinear functions

Stephen Milborrow milbo at sonic.net
Fri Jun 13 14:34:38 CEST 2008


| I'm trying to use mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis function and it underfits very badly.

Try the "earth" package which extends the mars function in the mda package.

Your example becomes

library(earth) # was mda
f <- function(x,y) { x^2-y^2 }
x <- seq(-1,1,length.out=10)
x <- outer(x*0,x,FUN="+") # 10x10 grid of x values
y <- t(x)                 # matching grid of y values
X <- cbind(as.vector(x),as.vector(y))
z <- f(x,y)
fit <- earth(X, as.vector(z))
summary(fit)
plotmo(fit) # note better fit than before
# your original plotting code could be used too

For this kind of data, you could possibly use the minspan parameter.  By
default, MARS does not allow every observation to be used as a knot in the
generated basis functions.  This strategy increases resistance to runs of
correlated noise in the data.  For non-noisy data, you can set minspan=1 to
allow MARS to consider every observation as a potential knot.  If your data
were noisy, then minspan=1 could overfit the data.  With earth, you can use
trace=2 to see the calculated minspan value.
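
As a rough sketch of the minspan suggestion (assuming the earth package is
installed; the names fit.default and fit.minspan1 are only for illustration),
the two fits can be put side by side like this:

```r
library(earth)

# Same noise-free test data as in the example above
f <- function(x, y) { x^2 - y^2 }
x <- seq(-1, 1, length.out = 10)
x <- outer(x * 0, x, FUN = "+")
y <- t(x)
X <- cbind(as.vector(x), as.vector(y))
z <- as.vector(f(x, y))

fit.default  <- earth(X, z)               # minspan calculated by earth
fit.minspan1 <- earth(X, z, minspan = 1)  # every observation may be a knot

# Number of terms kept after pruning, and the GRSq, for each model
print(c(default  = length(fit.default$selected.terms),
        minspan1 = length(fit.minspan1$selected.terms)))
print(c(default  = fit.default$grsq,
        minspan1 = fit.minspan1$grsq))
```

Adding trace=2 to either call should also print the minspan value that earth
actually used.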

If you run the above example with the earth parameter trace=1, you will see
that the stopping condition for the forward pass is:

Reached delta RSq threshold (DeltaRSq 0.00030214 < 0.001)

To make the forward pass continue further, change the "delta RSq threshold"
by using the thresh parameter:

fit <- earth(X, as.vector(z), thresh=1e-6)

The resulting model "looks" better when plotted, but note that using thresh
here makes almost no change to the GRSq.  That is, with the lower threshold
the model is more complicated (has more terms) but does not have greater
predictive power.  The threshold is just one of the reasons that the forward
pass can terminate (reaching the maximum number of terms nk is another).
AFAIK Friedman's code (that you ran from Matlab) does not use the threshold
but instead just continues forward stepping until nk is reached.  In this
case the Matlab model is arguably more complicated than it need be.  I
believe the forward threshold for MARS was an innovation of Hastie and
Tibshirani, but I could be wrong.
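
To check the effect of thresh on your own machine, something along these
lines may help.  It is only a sketch: it assumes the earth package is
installed, and fit.default, fit.thresh and compare are names made up for
this illustration.

```r
library(earth)

# Same noise-free test data as in the example above
f <- function(x, y) { x^2 - y^2 }
x <- seq(-1, 1, length.out = 10)
x <- outer(x * 0, x, FUN = "+")
y <- t(x)
X <- cbind(as.vector(x), as.vector(y))
z <- as.vector(f(x, y))

fit.default <- earth(X, z)                 # default thresh is 0.001
fit.thresh  <- earth(X, z, thresh = 1e-6)  # forward pass continues further

# One row per model: number of selected terms, GRSq and RSq
compare <- rbind(
  default = c(nterms = length(fit.default$selected.terms),
              GRSq   = fit.default$grsq,
              RSq    = fit.default$rsq),
  thresh  = c(nterms = length(fit.thresh$selected.terms),
              GRSq   = fit.thresh$grsq,
              RSq    = fit.thresh$rsq))
print(compare)
```

If the GRSq column barely moves while nterms grows, the extra terms are
adding complexity rather than predictive power.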

To reduce mailing list traffic, let's continue this discussion off-line, i.e.
by direct mail to each other, and if necessary I will summarize the results
of our discussions in the earth documentation.

Regards
Steve

| Message: 76
| Date: Thu, 12 Jun 2008 13:35:35 -0700
| From: Janne Huttunen <jmhuttun at stat.berkeley.edu>
| Subject: [R] Problems with mars in R in the case of nonlinear
| functions
| To:
| Message-ID: <48518897.7080804 at stat.berkeley.edu>
| Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
| Hi,
|
| I'm trying to use mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis function and
| it underfits very badly.
|
| For example, I have tried the following code to test mars:
|
| require("mda")
|
| f <- function(x,y) { x^2-y^2 };
| #f <- function(x,y) { x+2*y };
|
| # Grid
| x <- seq(-1,1,length=10);
| x <- outer(x*0,x,FUN="+"); y <- t(x);
| X <- cbind(as.vector(x),as.vector(y));
|
| # Data
| z <- f(x,y);
|
| fit <- mars(X,as.vector(z),nk=200,penalty=2,thresh=1e-3,degree=2);
|
| # Plotting
| par(mfrow=c(1,2),pty="s")
| lims <- c(min(c(min(z),min(fit$fitted))),max(c(max(z),max(fit$fitted))))
| persp(z=z,ticktype='detailed',col='lightblue',shade=.75,ltheta=50,
|       xlab='x',ylab='y',zlab='z',main='true',phi=25,theta=55,zlim=lims)
|
| persp(z=matrix(fit$fitted.values,nrow=nrow(x),byrow=F),ticktype='detailed',
|        col='lightblue',
| xlab='x',ylab='y',zlab='z',shade=.75,ltheta=50,main='MARS',
|        phi=25,theta=55,zlim=lims)
|
| (the code is also here if someone wants to try it:
| http://venda.uku.fi/~jmhuttun/R/marstest.R)
|
| The results are here: http://venda.uku.fi/~jmhuttun/R/R-10.pdf . The
| fitted model contains only
| 5 terms which is not enough in this case. Adjusting parameters like nk,
| thresh, penalty and degree
| seems only have minor effect or no effect at all. It's also strange that
| when I increase
| the number of points in the grid, the results are ever worse:
| see e.g. http://venda.uku.fi/~jmhuttun/R/R-20.pdf for a 20x20 grid.
| However Mars seems to work well with linear functions (e.g. with the
| function which
| is commented in the above code).
|
| Do anyone know what is wrong in this case? Do I miss something is there
| something
| wrong in my code?
|
| This seems not to be a problem with MARS method in general. For example,
| Friedman's MARS implementation (ran in Matlab) gives a rather good fit:
| see http://venda.uku.fi/~jmhuttun/R/Matlab.pdf .
|
| Thank you
|
| Janne
|
| -- 
| Janne Huttunen
| University of California
| Department of Statistics
| 367 Evans Hall Berlekey, CA 94720-3860
| email: jmhuttun at stat.berkeley.edu
| phone: +1-510-502-5205
| office room: 449 Evans Hall


