[R] Neural Nets (nnet) - evaluating success rate of predictions

Mon May 7 18:38:23 CEST 2007

Folks:

If I understand correctly, the following may be pertinent.

Note that the procedure:

min.nnet = nnet[k], where k minimizes over i the training-set error rate
of the nnet fitted from the ith random start

does not guarantee a classifier with a lower error rate on **new** data than
any single one of the random starts. That is because you are using the same
training set to choose the model (= nnet parameters) as you are using to
determine the error rate. It is tempting to think that choosing the best
among many random starts always gets you a better classifier, but it need
not. The error rate on the training set for any classifier, whether a single
one or one derived in some way from many, is an optimistically biased
estimate of the true error rate, so choosing a classifier on this basis does
not assure better performance on future data. In particular, I would guess
that choosing the best among many (hundreds/thousands) random starts is
probably almost guaranteed to overfit and so produce a poor predictor (ergo
the importance of parsimony/penalization). I have limited knowledge and
experience of these things, however, so I would appreciate comments, pro or
con, from anyone who has more.
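
A minimal sketch of how one might check this, assuming simulated data (the
data-generating model, the 20 random starts, and size = 5 are all arbitrary
choices of mine, not anything from the original question). It fits nnet from
several random starts, picks the one with the lowest training error, and
compares that figure with the error on data held back from fitting and
selection:

```r
library(nnet)  # recommended package that ships with R

set.seed(1)
# Hypothetical simulated data: binary outcome driven by 2 of 5 predictors.
n <- 400
x <- matrix(rnorm(n * 5), n, 5)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))
dat <- data.frame(y = y, x)      # predictor columns are named X1..X5
train <- dat[1:200, ]
fresh <- dat[201:400, ]          # never used for fitting or selection

# Misclassification rate of a fitted nnet on a data frame.
err <- function(fit, d) mean(predict(fit, d, type = "class") != d$y)

# Each call to nnet() starts from new random weights.
res <- t(sapply(1:20, function(i) {
  fit <- nnet(y ~ ., data = train, size = 5, maxit = 200, trace = FALSE)
  c(train = err(fit, train), fresh = err(fit, fresh))
}))

best <- which.min(res[, "train"])
res[best, ]               # training vs. fresh-data error of the "winner"
summary(res[, "fresh"])   # fresh-data error across all random starts
```

Comparing the two printed results shows whether the start with the lowest
training error also does well on the held-back data; on any one simulated
data set it may or may not.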

The simple answer to the question of obtaining the error rate using
validation data is: Do whatever you like to choose/fit a classifier on the
training set. **Once you are done,** the estimate of your error rate is the
error rate you get on applying that classifier to the validation set. But
you can do this only once! If you don't like that error rate and go back to
find a better predictor in some way, then your validation data have now
been used to derive the classifier and thus have become part of the training
data, so any further assessment of the error rate of a new classifier on
them is now also a biased estimate. You need yet new validation data for that.
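
One common way to arrange this is a three-way split, sketched below under
assumptions of my own (simulated data, equal thirds, 10 random starts, and a
small weight decay). The "select" part is used to choose among fits and so
counts as training data in the sense above; the "final" part is touched
exactly once, to estimate the error rate of the chosen classifier:

```r
library(nnet)

set.seed(2)
# Hypothetical simulated data, as a stand-in for a real data set.
n <- 600
x <- matrix(rnorm(n * 5), n, 5)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))
dat <- data.frame(y = y, x)

# Randomly assign each row to one of three equally sized parts.
idx <- sample(rep(c("train", "select", "final"), each = n / 3))
parts <- split(dat, idx)

err <- function(fit, d) mean(predict(fit, d, type = "class") != d$y)

# Fit from several random starts on the training part only.
fits <- lapply(1:10, function(i)
  nnet(y ~ ., data = parts$train, size = 5, decay = 0.01,
       maxit = 200, trace = FALSE))

# Choose among them using the selection part, not the training part.
select_err <- sapply(fits, err, d = parts$select)
chosen <- fits[[which.min(select_err)]]

# Use the final part once, after all choices have been made.
final_err <- err(chosen, parts$final)
final_err
table(predicted = predict(chosen, parts$final, type = "class"),
      observed  = parts$final$y)
```

The table at the end is the cross-classification of predicted against
observed classes; its off-diagonal proportion is final_err.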

Of course, there are all sorts of cross-validation schemes one can use to
avoid, or at least mitigate, these issues; most books on statistical
classification/machine learning discuss them in detail.
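
As an illustration only, here is a sketch of plain k-fold cross-validation
with nnet, again on simulated data and with k = 5, size = 3 and decay = 0.01
chosen arbitrarily by me. Each fold is predicted by a net that never saw it:

```r
library(nnet)

set.seed(3)
# Hypothetical simulated data.
n <- 300
x <- matrix(rnorm(n * 5), n, 5)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))
dat <- data.frame(y = y, x)

k <- 5
fold <- sample(rep(1:k, length.out = n))   # random fold label for each row

# For each fold: fit on the other k - 1 folds, score on the held-out fold.
cv_err <- sapply(1:k, function(j) {
  fit <- nnet(y ~ ., data = dat[fold != j, ], size = 3, decay = 0.01,
              maxit = 200, trace = FALSE)
  mean(predict(fit, dat[fold == j, ], type = "class") != dat$y[fold == j])
})

cv_err         # per-fold misclassification rates
mean(cv_err)   # cross-validated estimate of the error rate
```

If the fitting procedure includes a selection step, such as picking the best
of many random starts, that step belongs inside the loop so that the held-out
fold plays no part in it.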

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of hadley wickham
Sent: Monday, May 07, 2007 5:26 AM
To: Wensui Liu
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

Pick the one with the lowest error rate on your training data?
Hadley

On 5/7/07, Wensui Liu <liuwensui at gmail.com> wrote:
> well, how do you know which ones are the best out of several hundred?
> I will average all results out of the several hundred.
>
> On 5/7/07, hadley wickham <h.wickham at gmail.com> wrote:
> > On 5/6/07, nathaniel Grey <nathaniel.grey at yahoo.co.uk> wrote:
> > > Hello R-Users,
> > >
> > > I have been using (nnet) by Ripley to train a neural net on a test
> > > dataset. I have obtained predictions for a validation dataset using:
> > >
> > > PP<-predict(nnetobject,validationdata)
> > >
> > > Using PP I can find the -2 log likelihood for the validation dataset.
> > >
> > > However, what I really want to know is how well my neural net is doing
> > > at classifying my binary output variable. I am new to R and I can't
> > > figure out how to assess the success rate of predictions.
> > >
> >
> > table(PP, binaryvariable)
> > should get you started.
> >
> > Also if you're using nnet with random starts, I strongly suggest
> > taking the best out of several hundred (or maybe thousand) trials - it
> > makes a big difference!
> >
> > Hadley
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> WenSui Liu
> A lousy statistician who happens to know a little programming
> (http://spaces.msn.com/statcompute/blog)
>
