[R] Problems with mars in R in the case of nonlinear functions
Stephen Milborrow
milbo at sonic.net
Fri Jun 13 14:34:38 CEST 2008
 I'm trying to use the mars function in R to interpolate nonlinear
 multivariate functions.
 However, it seems that mars gives me a fit which uses only very few
 basis functions and it underfits very badly.
Try the "earth" package which extends the mars function in the mda package.
Your example becomes
library(earth) # was mda
f <- function(x,y) { x^2-y^2 }
x <- seq(-1,1,length=10)
x <- outer(x*0,x,FUN="+")
y <- t(x)
X <- cbind(as.vector(x),as.vector(y))
z <- f(x,y)
fit <- earth(X, as.vector(z))
summary(fit)
plotmo(fit) # note better fit than before
# your original plotting code could be used too
For this kind of data, you could possibly use the minspan parameter. MARS
by default does not allow every observation to be used as a knot in the
generated basis functions. This strategy increases resistance to runs of
correlated noise in the data. For non-noisy data, you can set minspan=1 to
allow MARS to consider every observation as a potential knot. If your data
were noisy then minspan=1 could overfit the data. With earth, you can use
trace=2 to see the calculated minspan value.
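As a rough sketch of the minspan suggestion (assuming the earth package is installed; the grid is the same one as in the example above, rebuilt here so the snippet runs on its own, and the name fit.span1 is only for illustration):

```r
library(earth)

# Same noise-free test surface and 10x10 grid as in the example above
f <- function(x, y) { x^2 - y^2 }
x <- seq(-1, 1, length = 10)
x <- outer(x * 0, x, FUN = "+")
y <- t(x)
X <- cbind(as.vector(x), as.vector(y))
z <- as.vector(f(x, y))

# minspan=1 lets every observation be considered as a potential knot;
# trace=2 prints details of the forward pass while the model is built
fit.span1 <- earth(X, z, minspan = 1, trace = 2)
summary(fit.span1)
```

This is only sensible here because the data are noise free. With noisy data the same call could overfit.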
If you run the above example with the earth parameter trace=1, you will see
that the stopping condition for the forward pass is:
Reached delta RSq threshold (DeltaRSq 0.00030214 < 0.001)
To make the forward pass continue further, change the "delta RSq threshold"
by using the thresh parameter:
fit <- earth(X, as.vector(z), thresh=1e-6)
The resulting model "looks" better when plotted, but note that using thresh
here makes almost no change to the GRSq. That is, with the lower threshold
the model is more complicated (has more terms) but does not have a greater
predictive power. The threshold is just one of the reasons that the forward
pass can terminate (reaching the maximum number of terms nk is another).
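One way to check this yourself is to fit both models and put the term counts next to RSq and GRSq. This is a minimal sketch, assuming the earth package is installed. The names fit.default, fit.lowthresh and compare are made up for the illustration, and no output is shown because the exact figures depend on the earth version.

```r
library(earth)

# Same noise-free test surface and 10x10 grid as in the example above
f <- function(x, y) { x^2 - y^2 }
x <- seq(-1, 1, length = 10)
x <- outer(x * 0, x, FUN = "+")
y <- t(x)
X <- cbind(as.vector(x), as.vector(y))
z <- as.vector(f(x, y))

fit.default   <- earth(X, z)                 # default thresh (0.001)
fit.lowthresh <- earth(X, z, thresh = 1e-6)  # lets the forward pass go further

# Number of terms kept after pruning, plus RSq and GRSq, for each model
compare <- data.frame(
  model  = c("default thresh", "thresh=1e-6"),
  nterms = c(length(fit.default$selected.terms),
             length(fit.lowthresh$selected.terms)),
  rsq    = c(fit.default$rsq, fit.lowthresh$rsq),
  grsq   = c(fit.default$grsq, fit.lowthresh$grsq)
)
print(compare)
```

If the grsq column barely moves while nterms grows, the extra terms are adding complexity rather than predictive power.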
AFAIK Friedman's code (that you ran from Matlab) does not use the threshold
but instead just continues forward stepping until nk is reached. In this
case the Matlab model is arguably more complicated than it need be. I
believe the forward threshold for MARS was an innovation of Hastie and
Tibshirani, but I could be wrong.
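To see why a larger model can be more complicated than it needs to be without predicting any better, it may help to look at how GRSq is built from the GCV. The sketch below is plain R and does not call earth. The formulas are the ones given in the earth documentation as I understand them, the function names gcv and grsq are made up here, and the RSS values are invented purely for illustration.

```r
# GCV as described in the earth documentation:
#   effective number of parameters = nterms + penalty * (nterms - 1) / 2
#   GCV = RSS / (ncases * (1 - enp / ncases)^2)
# (earth's default penalty is 2 for additive models, 3 when degree > 1)
gcv <- function(rss, nterms, ncases, penalty = 2) {
  enp <- nterms + penalty * (nterms - 1) / 2
  rss / (ncases * (1 - enp / ncases)^2)
}

# GRSq normalizes the GCV by the GCV of an intercept-only model
grsq <- function(rss, rss.null, nterms, ncases, penalty = 2) {
  1 - gcv(rss, nterms, ncases, penalty) / gcv(rss.null, 1, ncases, penalty)
}

# Invented numbers: 100 cases, intercept-only RSS of 20.
# The larger model has a slightly lower RSS but many more terms.
small <- grsq(rss = 0.40, rss.null = 20, nterms = 7,  ncases = 100)
large <- grsq(rss = 0.38, rss.null = 20, nterms = 15, ncases = 100)
print(c(small = small, large = large))
```

With these invented numbers the larger model ends up with the lower GRSq, because the penalty on the extra terms outweighs the small reduction in RSS.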
To reduce mailing list traffic, let's continue this discussion offline, i.e.
by direct mail to each other, and if necessary I will summarize the results
of our discussions in the earth documentation.
Regards
Steve
 Message: 76
 Date: Thu, 12 Jun 2008 13:35:35 -0700
 From: Janne Huttunen <jmhuttun at stat.berkeley.edu>
 Subject: [R] Problems with mars in R in the case of nonlinear
 functions
 To:
 Message-ID: <48518897.7080804 at stat.berkeley.edu>
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed

 Hi,

 I'm trying to use the mars function in R to interpolate nonlinear
 multivariate functions.
 However, it seems that mars gives me a fit which uses only very few
 basis functions and it underfits very badly.

 For example, I have tried the following code to test mars:

 require("mda")

 f <- function(x,y) { x^2-y^2 };
 #f <- function(x,y) { x+2*y };

 # Grid
 x <- seq(-1,1,length=10);
 x <- outer(x*0,x,FUN="+"); y <- t(x);
 X <- cbind(as.vector(x),as.vector(y));

 # Data
 z <- f(x,y);

 fit <- mars(X,as.vector(z),nk=200,penalty=2,thresh=1e-3,degree=2);

 # Plotting
 par(mfrow=c(1,2),pty="s")
 lims <- c(min(c(min(z),min(fit$fitted))),max(c(max(z),max(fit$fitted))))
 persp(z=z,ticktype='detailed',col='lightblue',shade=.75,ltheta=50,
 xlab='x',ylab='y',zlab='z',main='true',phi=25,theta=55,zlim=lims)

persp(z=matrix(fit$fitted.values,nrow=nrow(x),byrow=F),ticktype='detailed',
 col='lightblue',
 xlab='x',ylab='y',zlab='z',shade=.75,ltheta=50,main='MARS',
 phi=25,theta=55,zlim=lims)

 (the code is also here if someone wants to try it:
 http://venda.uku.fi/~jmhuttun/R/marstest.R)

 The results are here: http://venda.uku.fi/~jmhuttun/R/R10.pdf . The
 fitted model contains only 5 terms, which is not enough in this case.
 Adjusting parameters like nk, thresh, penalty and degree seems to have
 only a minor effect or no effect at all. It's also strange that when I
 increase the number of points in the grid, the results are even worse:
 see e.g. http://venda.uku.fi/~jmhuttun/R/R20.pdf for a 20x20 grid.
 However, MARS seems to work well with linear functions (e.g. with the
 function which is commented out in the above code).

 Does anyone know what is wrong in this case? Am I missing something, or
 is there something wrong in my code?

 This does not seem to be a problem with the MARS method in general. For
 example, Friedman's MARS implementation (run in Matlab) gives a rather
 good fit: see http://venda.uku.fi/~jmhuttun/R/Matlab.pdf .

 Thank you

 Janne

 
 Janne Huttunen
 University of California
 Department of Statistics
 367 Evans Hall, Berkeley, CA 94720-3860
 email: jmhuttun at stat.berkeley.edu
 phone: +15105025205
 office room: 449 Evans Hall