[R] Neural Nets (nnet) - evaluating success rate of predictions

Thu May 10 11:51:50 CEST 2007

All,

As an addition to my earlier posting, I've now implemented the PRE
measures of prediction accuracy suggested by Menard (1995) as an R
function, which is not a lengthy one and is thus attached below.

With respect to the P-values, one can test for either
1) significantly better prediction results or 2) significantly
different (better or worse) results, so one should adjust the
interpretation of the standardized d-value in the code accordingly. In
the former case one should use the one-tailed value (which is what the
code below computes), and in the latter case the two-tailed value.
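
As a minimal sketch of the two options, assuming a hypothetical
d-value of 1.8 (in practice one would plug in d.lambda.p or d.tau.p
from the function below), the two P-values could be obtained as
follows:

```r
# Hypothetical standardized d-value, used only for illustration
d <- 1.8

# 1) One-tailed: is the model significantly BETTER than the baseline?
p.one.tailed <- pnorm(d, lower.tail = FALSE)

# 2) Two-tailed: is the model significantly DIFFERENT (better or worse)?
p.two.tailed <- 2 * pnorm(abs(d), lower.tail = FALSE)

print(c(one.tailed = p.one.tailed, two.tailed = p.two.tailed))
```

For a positive d the two-tailed value is simply twice the one-tailed
one, so a result can be significant in the first sense without being
significant in the second.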

	-Antti Arppe

# Formulas for assessing prediction efficiency
#
# (C) Antti Arppe 2007
#
# Observations by rows, predictions by columns
#
# All formulas according to the following reference:
#
# Menard, Scott. 1995. Applied Logistic Regression Analysis. Sage
# University Paper Series on Quantitative Applications in the Social
# Sciences 07-106. Sage Publications, Thousand Oaks, California.

model.prediction.efficiency <- function(dat)
{ N <- sum(dat);
  # observed as row margins, predicted as column margins
  # according to Menard (1995: 24-32)
  sum.row <- apply(dat,1,sum);
  sum.col <- apply(dat,2,sum);
  correct.with.model <- sum(diag(dat));
  errors.with.model <- N - correct.with.model;
  errors.without.model.prediction <- N - max(sum.row);
  errors.without.model.classification <- sum(sum.row*((N-sum.row)/N));
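  # proportionate reduction of prediction error, its standardized
  # d-value and the one-tailed P-value (model better than baseline)
  lambda.p <- 1-(errors.with.model/errors.without.model.prediction);
  d.lambda.p <- (errors.without.model.prediction/N-errors.with.model/N)/
    sqrt((errors.without.model.prediction/N)*(1-errors.without.model.prediction/N)/N);
  p.lambda.p <- 1-pnorm(d.lambda.p);
  # the same three values for the reduction of classification error
  tau.p <-  1-(errors.with.model/errors.without.model.classification);
  d.tau.p <- (errors.without.model.classification/N-errors.with.model/N)/
    sqrt((errors.without.model.classification/N)*(1-errors.without.model.classification/N)/N);
  p.tau.p <- 1-pnorm(d.tau.p);
  # return() accepts only a single value, so collect the results in a named list
  return(list(lambda.p=lambda.p, tau.p=tau.p, d.lambda.p=d.lambda.p,
              d.tau.p=d.tau.p, p.lambda.p=p.lambda.p, p.tau.p=p.tau.p));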
}

----- Original Message ----
From: Antti Arppe <aarppe at ling.helsinki.fi>
To: r-help at stat.math.ethz.ch
Cc: Antti Arppe <aarppe at ling.helsinki.fi>
Sent: Tuesday, 8 May, 2007 12:36:20 PM
Subject: Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

On Mon, 7 May 2007, r-help-request at stat.math.ethz.ch wrote:
> Date: Sun, 6 May 2007 12:02:31 +0000 (GMT)
> From: nathaniel Grey <nathaniel.grey at yahoo.co.uk>
>
> However what I really want to know is how well my nueral net is
> doing at classifying my binary output variable. I am new to R and I
> can't figure out how you can assess the success rates of
> predictions.

I've recently been tackling this myself, though with respect to
polytomous (>2) outcomes. The following approaches are based on
Menard (1995), Cohen et al. (2002) and Manning & Schütze (1999).

First you have to decide on the critical probability that you use
to classify the cases into class A (and consequently not(class[A])).
The simplest level is 0.5, but other levels might also be motivated,
see e.g. Cohen et al. (2002: 516-519).
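
As a rough sketch of this step, assuming a hypothetical vector p.hat
of predicted probabilities for class A (for instance as returned by
predict() on a fitted model), the cutoff could be applied like this:

```r
# Hypothetical predicted probabilities of class A (illustration only)
p.hat <- c(0.91, 0.45, 0.50, 0.12, 0.73)

# Critical probability; 0.5 is the simplest choice
cutoff <- 0.5

# Cases at or above the cutoff go to class A, the rest to "Other".
# Sending ties (exactly 0.5) to class A is an arbitrary choice here.
predicted <- factor(ifelse(p.hat >= cutoff, "A", "Other"),
                    levels = c("A", "Other"))
print(predicted)
```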

You can then treat the task as one of two distinct types, namely a
classification or a prediction model, which affects exactly how the
efficiency and accuracy of prediction are measured
(Menard 1995: 24-26). In a pure prediction model, we set no a priori
expectation or constraint on the overall frequencies of the predicted
classes. In contrast, in a classification model our expectation is
that the predicted outcome classes will in the long run end up having
the same proportions as are evident in the training data.

The starting point for evaluating prediction efficiency is to
compile a 2x2 prediction/classification table. Frequency counts on the
(descending) diagonal in the table indicate correctly predicted and
classified cases, whereas all counts off the diagonal are incorrect.
For the two alternatives overall, we can divide the predicted
classifications into the four types presented below, on which the
basic measures of prediction efficiency are based (Manning and
Schütze 1999: 267-271).

Original/Predicted     Class                  not(Class) (=Other)
Class                  TP ~ True Positive     FN ~ False Negative
not(Class) (=Other)    FP ~ False Positive    TN ~ True Negative
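
In R such a table can be compiled with table(). The following is only
a sketch with made-up data; the vectors observed and predicted are
hypothetical, and fixing the factor levels keeps the rows and columns
in the same order so that the diagonal holds the correct cases:

```r
# Made-up observed and predicted classes for eight cases
lev <- c("A", "Other")
observed  <- factor(c("A", "A", "A", "Other", "Other", "Other", "Other", "A"),
                    levels = lev)
predicted <- factor(c("A", "A", "Other", "Other", "Other", "A", "Other", "A"),
                    levels = lev)

# Observations by rows, predictions by columns
tab <- table(Observed = observed, Predicted = predicted)
print(tab)

TP <- tab["A", "A"]          # class A correctly predicted
FN <- tab["A", "Other"]      # class A missed
FP <- tab["Other", "A"]      # Other wrongly predicted as A
TN <- tab["Other", "Other"]  # Other correctly rejected
```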

You can then go on to calculate recall and precision, or specificity
and sensitivity. Recall is the proportion of original occurrences of
some particular class for which the prediction is correct (formula 1
below, see Manning and Schütze 1999: 269, formula 8.4), whereas
precision is the proportion of all the predictions of some
particular class which turn out to be correct (formula 2 below, see
Manning and Schütze 1999: 268, formula 8.3). Sensitivity is in fact
exactly equal to recall, whereas specificity is understood as the
proportion of non-cases correctly predicted or classified as
non-cases, i.e. rejected (formula 3 below). Furthermore, there is a
third pair of evaluation measures that one could also calculate,
namely accuracy and error (formula 4 below) (Manning and Schütze 1999:
268-270).

(1) Recall = TP / (TP + FN) (=Sensitivity)

(2) Precision = TP / (TP + FP)

(3) Specificity = TN / (TN + FP)

(4) Accuracy = (TP + TN) / N = SUM{diag(n)} / N; Error = 1 - Accuracy
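
The measures can be worked through on a small example. The counts
below are invented purely for illustration, and specificity is
computed as it is defined in the text above, i.e. as the proportion
of non-cases that are correctly rejected:

```r
# Invented 2x2 table: observations by rows, predictions by columns
tab <- matrix(c(40, 10,
                 5, 45),
              nrow = 2, byrow = TRUE,
              dimnames = list(Observed  = c("Class", "Other"),
                              Predicted = c("Class", "Other")))

TP <- tab[1, 1]; FN <- tab[1, 2]
FP <- tab[2, 1]; TN <- tab[2, 2]
N  <- sum(tab)

recall      <- TP / (TP + FN)   # 40/50  = 0.8 (= sensitivity)
precision   <- TP / (TP + FP)   # 40/45  = 0.889
specificity <- TN / (TN + FP)   # 45/50  = 0.9
accuracy    <- (TP + TN) / N    # 85/100 = 0.85
error       <- 1 - accuracy     # 0.15

print(round(c(recall = recall, precision = precision,
              specificity = specificity, accuracy = accuracy,
              error = error), 3))
```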

However, as has been noted in some earlier responses, these
general measures do not in any way take into
consideration whether prediction and classification according to a
model, with the help of explanatory variables, perform any better
than knowing the overall proportions of the outcome classes.
For this purpose, the asymmetric summary measures of association based
on Proportionate Reduction of Error (PRE) are good candidates for
evaluating prediction accuracy, where we expect that the prediction or
classification process on the basis of the models should exceed some
baselines or thresholds. However, one cannot use the Goodman-Kruskal
lambda and tau as such, but must make some adjustments to account for
the possibility of incorrect prediction.

With this approach one compares prediction/classification errors with
the model, error(model), to the baseline level of
prediction/classification errors without the model, error(baseline),
according to formula 8 below (Menard 1995: 28-30). The formula for
the error with the model remains the same, irrespective of whether we
are evaluating prediction or classification accuracy, and is presented
in (5), but the errors without the model vary according to the intended
objective, and are presented in (6) and (7). Subsequently, the measure
for the proportionate reduction of prediction error is presented in (9)
below, and being analogous to the Goodman-Kruskal lambda it is
designated as lambda(prediction). Similarly, the measure for
proportionate reduction of classification error is presented in (10),
and being analogous to the Goodman-Kruskal tau it is likewise
designated as tau(classification). For both measures, positive values
indicate better than baseline performance, while negative values
indicate worse performance.

(5)  error(model) = N - SUM{k=1...K}n[k,k] = N - SUM{diag(n)},
where n is the 2x2 prediction/classification matrix

(6)  error(baseline, prediction) = N - max(R[k]),
with R[k] = marginal row sums for each row k of altogether K classes
and N the sum total of cases.

(7)  error(baseline, classification) = SUM{k=1...K}(R[k]·(N-R[k])/N)

with R[k] = marginal row sums for each row k of altogether K classes
and N the sum total of cases.

(8) PRE = (error(baseline,pred.|class.) - error(model)) / error(baseline,pred.|class.)

(9) lambda(prediction) = 1 - (error(model) / error(baseline,prediction))

(10) tau(classification) = 1 - (error(model) / error(baseline,classification))
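
To make formulas (5)-(10) concrete, here is a small worked example.
The counts are invented for illustration, and the variable names are
mine rather than Menard's:

```r
# Invented 2x2 table: observations by rows, predictions by columns
n <- matrix(c(60, 10,
              10, 20),
            nrow = 2, byrow = TRUE)

N <- sum(n)          # 100 cases in total
R <- rowSums(n)      # observed class frequencies: 70 and 30

# (5) errors with the model: everything off the diagonal
error.model <- N - sum(diag(n))                            # 20

# (6) baseline for prediction: always guess the most frequent class
error.baseline.prediction <- N - max(R)                    # 30

# (7) baseline for classification: guess in proportion to the margins
error.baseline.classification <- sum(R * (N - R) / N)      # 21 + 21 = 42

# (9) and (10): proportionate reduction of error
lambda.p <- 1 - error.model / error.baseline.prediction      # 0.333
tau.p    <- 1 - error.model / error.baseline.classification  # 0.524

print(round(c(lambda.p = lambda.p, tau.p = tau.p), 3))
```

Both values are positive here, so under either baseline the model
makes fewer errors than one would expect without it.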

REFERENCES:

Cohen, Jacob, Patricia Cohen, Stephen G. West, and Leona S. Aiken.
2003. Applied Multiple Regression/Correlation Analysis for the
Behavioral Sciences (3rd edition). Lawrence Erlbaum Associates,
Mahwah, New Jersey.

Menard, Scott. 1995. Applied Logistic Regression Analysis. Sage
University Paper Series on Quantitative Applications in the Social
Sciences 07-106. Sage Publications, Thousand Oaks, California.

Manning, Christopher D., and Hinrich Schütze. 1999. Foundations of
Statistical Natural Language Processing. MIT Press, Cambridge,
Massachusetts.
-----