[R] multidimensional function fitting

Jim_Garrett@bd.com Jim_Garrett at bd.com
Mon Mar 3 19:57:02 CET 2003


Have you considered fitting a neural net using the nnet package?  Since
nets are determined by a functional form that is straightforward
(recursive, but straightforward....) and a set of coefficients (weights),
they're fairly easy to program from scratch.  There are no "exotic" (to
some) basis functions to deal with.  You can find descriptions sufficient
to guide coding in Venables and Ripley's _Modern Applied Statistics with
S-Plus_, Ripley's _Pattern Recognition and Neural Networks_, and also _The
Elements of Statistical Learning_ by Hastie, Tibshirani, and Friedman.
I've found each of these to be a good general reference, by the way--if
you're going to be dealing with "curve fitting" on more than this one
occasion, one or all of these would be good to have.  Ripley's pattern
recognition book naturally focuses on classification rather than general
curve-fitting, though many ideas are common to both.
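
To give a sense of how simple the functional form is, here is a minimal,
untested sketch of evaluating a single-hidden-layer regression net from its
weights.  The layout below (a matrix of input-to-hidden weights plus an
output weight vector) is my own arrangement for illustration; it is not the
order in which the nnet package stores its weights:

    ## x        : numeric vector of inputs
    ## W_hidden : one row per hidden unit, one column per input
    ## b_hidden : hidden-unit biases; w_out, b_out : output weights and bias
    evaluate_net <- function(x, W_hidden, b_hidden, w_out, b_out) {
      h <- 1 / (1 + exp(-(W_hidden %*% x + b_hidden)))  # logistic hidden units
      as.numeric(b_out + sum(w_out * h))                # linear output
    }

    ## Example: 2 inputs, 3 hidden units, random weights
    W <- matrix(rnorm(6), nrow = 3)
    evaluate_net(c(0.5, -1), W, rnorm(3), rnorm(3), 0.1)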

Fitting a net will put more of a burden on you to optimize the model than
mgcv will.  I would suggest the following guidelines (most of which I
learned from Ripley's pattern recognition book):

   - Any net "fitting" should actually consist of multiple fits from random
     starting points--take the fit that minimizes the fitting criterion
     (see the sketch after this list).

   - Scale predictors if necessary so that they are all on similar scales.

   - For curve fitting, use the options "linout = T" and "skip = T" (the
     latter is optional but recommended).

   - You must optimize some criterion that keeps you from overfitting.  Two
     techniques I have used successfully are
     (a) choosing the number of hidden units and the weight-decay parameter
         to optimize cross-validation performance, and
     (b) choosing the number of hidden units to optimize the Bayesian
         Information Criterion (BIC), with weight decay set to zero.
     The former should give you better performance, but the latter is less
     work and usually works pretty well.  BIC penalizes the number of
     parameters, and here the "number of parameters" is taken to be the
     number of weights.  I could send you R code that automates this.

   - Using weight decay > 0 generally improves predictive performance.  Once
     you have enough hidden units, you will not overfit if you use a suitable
     weight-decay parameter.  In fact, a reasonable strategy would be to use
     BIC to select the number of hidden units, then add (say) 2 hidden units
     and find the weight decay that optimizes cross-validation performance.
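
Here is a rough, untested sketch of the restart-and-select step; the data
frame 'dat' and the variables y, x1, and x2 are made-up names, and the size
and decay values are only placeholders:

    library(nnet)

    best <- NULL
    for (i in 1:10) {                     # several random starting points
      fit <- nnet(y ~ x1 + x2, data = dat, size = 5, decay = 0.01,
                  linout = TRUE, skip = TRUE, maxit = 500, trace = FALSE)
      if (is.null(best) || fit$value < best$value)
        best <- fit                       # keep the fit minimizing the criterion
    }

    ## For option (b), refit with decay = 0 over a range of sizes and compare
    ## a BIC of the usual Gaussian-regression form, counting the weights as
    ## the parameters:
    rss <- sum(best$residuals^2)
    n   <- nrow(dat)
    bic <- n * log(rss / n) + length(best$wts) * log(n)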

In short, mgcv makes model optimization convenient but has a price in
implementation, while neural nets make implementation convenient but have a
price in optimization.  I sometimes fit a model which is then implemented
by our Software Engineering group in C, and in that situation I prefer to
pay the price that falls only on me and which I can afford to pay.  Hence I
have used neural nets.  In fact early on I specified for our software group
a "neural net evaluator" C routine that takes some net architectural
parameters (number of inputs, hidden nodes, etc.) and weights in the order
that the nnet package displays them.  Now I can essentially fit a net with
package nnet, send someone the parameters, and they'll have it running in C
in short order.
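
Assuming 'best' is a net fitted with nnet as in the sketch above, the pieces
you would hand off are just the architecture and the weight vector:

    best$n         # number of input, hidden, and output units
    best$wts       # the weight vector
    summary(best)  # prints the weights labelled by connection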

Of course, GAM's also offer much in the way of model interpretation, and if
the phenomenon at hand obeys additivity, a GAM will outperform a neural
net.  I should add that I use the mgcv package frequently, and like it very
much.
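
For illustration, a minimal mgcv sketch (with made-up names z, x, y and a
data frame 'dat') comparing an additive fit with a single bivariate smooth:

    library(mgcv)
    fit_add <- gam(z ~ s(x) + s(y), data = dat)   # additive structure
    fit_int <- gam(z ~ s(x, y),     data = dat)   # joint smooth of x and y
    AIC(fit_add, fit_int)                         # rough comparison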

Regarding computational resources, with some datasets I've fitted complex
GAM's, neural nets, and projection pursuit (ppr in package modreg), and
didn't notice much difference in a very rough, qualitative sense.  With
enough data, they all take a while.  I didn't investigate memory usage.  Of
course, gam and ppr are doing more work--gam is optimizing smoothness, and
ppr is fitting a range of models.  If you have so much data that none of
these are feasible,  you could do worse than fitting to a random subset.
How much precision do you need anyway?  And how complex a model do you
anticipate?  1000 cases or fewer is probably more than enough for most
regression problems.  Set aside the rest, and you have a large validation
set!  In fact you might fit multiple models, each to a random subset,
implement all the models, and average their responses.
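
A rough sketch of the random-subset idea (again with made-up names), using
the left-over cases as a validation set:

    n_fit <- 1000
    idx   <- sample(nrow(dat), n_fit)
    train <- dat[idx, ]
    valid <- dat[-idx, ]                  # everything else is validation data

    fit  <- nnet(y ~ ., data = train, size = 5, decay = 0.01,
                 linout = TRUE, skip = TRUE, maxit = 500, trace = FALSE)
    pred <- predict(fit, valid)
    mean((valid$y - pred)^2)              # validation mean squared error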

Further along these lines, you could fit quite a few regression trees using
rpart because it's very fast.  You could even do thorough cross-validation
without great demands in time or memory.  A tree won't offer a smooth
response function, but if you average enough of them, the granularity of
response should be small.  Trees, like nets, are easy to code from scratch.
This just about describes bagging, except that the multiple subsets in
bagging are generated by bootstrap sampling, so are not mutually exclusive,
can contain cases more than once, and are of the same size as the original
data.
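
A rough sketch of the tree-averaging idea with rpart (essentially bagging;
'dat', y, and 'newdat' for the cases to predict are made-up names):

    library(rpart)

    n_trees <- 50
    trees   <- vector("list", n_trees)
    for (b in 1:n_trees) {
      boot <- dat[sample(nrow(dat), replace = TRUE), ]  # bootstrap sample
      trees[[b]] <- rpart(y ~ ., data = boot)           # one regression tree
    }

    preds <- sapply(trees, predict, newdata = newdat)   # one column per tree
    rowMeans(preds)                                     # averaged response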

Good luck,

Jim Garrett
Becton Dickinson Diagnostic Systems
Baltimore, Maryland, USA



Date: Fri, 28 Feb 2003 17:38:52 -0500
From: "Huntsinger, Rei
d" <reid_huntsinger at merck.com>
Subject: RE: [despammed] RE: [R] multidimensional function fitting
To: "'RenE J.V. Bertin'" <rjvbertin at despammed.com>,
   "Wiener, Matthew" <matthew_wiener at merck.com>
cc: r-help at stat.math.ethz.ch

You can use R objects, such as the object returned by gam, and the
predict.gam function, from C. See the R extensions manual.

Reid Huntsinger

-----Original Message-----
From: RenE J.V. Bertin [mailto:rjvbertin at despammed.com]
Sent: Thursday, February 27, 2003 3:42 PM
To: Wiener, Matthew
Cc: r-help at stat.math.ethz.ch
Subject: Re: [despammed] RE: [R] multidimensional function fitting


On Thu, 27 Feb 2003 13:52:50 -0500, "Wiener, Matthew"
<matthew_wiener at merck.com> wrote regarding
"[despammed] RE: [R] multidimensional function fitting"

8-) Take a look at package mgcv.  Hope this helps.  --Matt
8-)

Thank you, I just did. It may indeed be what I'm looking for (I haven't
quite understood everything about it...), but:

1) The best fits I obtain are with a formula like z ~ s(x,y); but I cannot
possibly transport this into the C programme where I need it! Maybe I wasn't
clear on this aspect?

2) It is very memory hungry, esp. when using the s() function: I have 192Mb
with 256Mb swap (not a lot, but reasonable I'd say), and I've never had to
kill R as often as when trying gam()...

R.B.
