[R] Neural Nets (nnet) - evaluating success rate of predictions
Antti Arppe
aarppe at ling.helsinki.fi
Tue May 8 13:36:20 CEST 2007
Nathaniel,
On Mon, 7 May 2007, r-help-request at stat.math.ethz.ch wrote:
> Date: Sun, 6 May 2007 12:02:31 +0000 (GMT)
> From: nathaniel Grey <nathaniel.grey at yahoo.co.uk>
>
> However what I really want to know is how well my neural net is
> doing at classifying my binary output variable. I am new to R and I
> can't figure out how you can assess the success rates of
> predictions.
I've recently been tackling this myself, though with respect to
polytomous (>2) outcomes. The following approaches are based on
Menard (1995), Cohen et al. (2003) and Manning & Schütze (1999).
First you have to decide on the critical probability that you use as
the cutoff for classifying cases into class A (and consequently the
rest into not(A)). The simplest level is 0.5, but other levels might
also be motivated, see e.g. Cohen et al. (2003: 516-519).
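As a minimal sketch of this cutoff step in base R: the vector p below is made-up data standing in for the fitted probabilities from your network, and the names p, cutoff and predicted are my own, not anything defined by nnet. Treating a probability exactly at the cutoff as class A is likewise just my choice here.

```r
# Hypothetical predicted probabilities of class "A" for eight cases
# (made-up numbers standing in for the output of a fitted model).
p <- c(0.91, 0.35, 0.62, 0.08, 0.50, 0.77, 0.49, 0.20)

# Critical probability; 0.5 is the simplest choice.
cutoff <- 0.5

# Cases at or above the cutoff go to class "A", the rest to "Other".
# (Counting ties as "A" is an assumption of this sketch.)
predicted <- factor(ifelse(p >= cutoff, "A", "Other"),
                    levels = c("A", "Other"))

# Frequencies of the predicted classes
table(predicted)
```

Changing cutoff and re-running the last two statements shows how the predicted class frequencies shift with the critical probability.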
The task can then be treated as one of two distinct types, namely a
classification model or a prediction model; the choice affects exactly
how the efficiency and accuracy of prediction are measured
(Menard 1995: 24-26). In a pure prediction model, we set no a priori
expectation or constraint on the overall frequencies of the predicted
classes. In contrast, in a classification model our expectation is
that the predicted outcome classes will in the long run end up having
the same proportions as are evident in the training data.
The starting point for evaluating prediction efficiency is to
compile a 2x2 prediction/classification table. Frequency counts on the
(descending) diagonal of the table indicate correctly predicted or
classified cases, whereas all counts off the diagonal are incorrect.
For the two alternatives overall, we can divide the predicted
classifications into the four types presented below, on which the
basic measures of prediction efficiency are based (Manning and
Schütze 1999: 267-271).
Original/Predicted     Class                  not(Class) (=Other)
Class                  TP ~ True Positive     FN ~ False Negative
not(Class) (=Other)    FP ~ False Positive    TN ~ True Negative
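A table with this layout can be sketched in base R with table(). The two factors below are made-up data, and the names observed, predicted, lev and n are my own; with real data you would substitute your original outcome variable and your thresholded predictions.

```r
# Hypothetical observed and predicted classes for ten cases (made-up data).
lev <- c("A", "Other")
observed  <- factor(c("A", "A", "A", "A", "Other",
                      "Other", "Other", "Other", "Other", "Other"),
                    levels = lev)
predicted <- factor(c("A", "A", "A", "Other", "A",
                      "Other", "Other", "Other", "Other", "Other"),
                    levels = lev)

# Rows = original class, columns = predicted class.
n <- table(Original = observed, Predicted = predicted)
n

# Pick out the four cell types by name.
TP <- n["A", "A"]          # original A, predicted A
FN <- n["A", "Other"]      # original A, predicted Other
FP <- n["Other", "A"]      # original Other, predicted A
TN <- n["Other", "Other"]  # original Other, predicted Other
```

Giving both factors the same explicit levels keeps the table square even if one class is never predicted.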
You can then go on to calculate recall and precision, or sensitivity
and specificity. Recall is the proportion of original occurrences of
some particular class for which the prediction is correct (formula 1
below, see Manning and Schütze 1999: 269, formula 8.4), whereas
precision is the proportion of all the predictions of some
particular class which turn out to be correct (formula 2 below, see
Manning and Schütze 1999: 268, formula 8.3). Sensitivity is in fact
exactly equal to recall, whereas specificity is understood as the
proportion of non-cases correctly predicted or classified as
non-cases, i.e. rejected (formula 3 below). Furthermore, there is a
third pair of evaluation measures that one could also calculate,
namely accuracy and error (formula 4 below) (Manning and Schütze 1999:
268-270).
(1) Recall = TP / (TP + FN) (=Sensitivity)
(2) Precision = TP / (TP + FP)
(3) Specificity = TN / (TN + FP)
(4) Accuracy = (TP + TN) / N = SUM{k=1...K}n[k,k] / N;
    Error = 1 - Accuracy
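These measures are simple arithmetic on the four cell counts. The sketch below uses made-up counts, and the variable names are my own; specificity is computed as the proportion of non-cases correctly rejected, i.e. TN out of all original non-cases (TN + FP).

```r
# Hypothetical cell counts of a 2x2 table (made-up numbers).
TP <- 40; FN <- 10; FP <- 5; TN <- 45
N  <- TP + FN + FP + TN

recall      <- TP / (TP + FN)   # = sensitivity
precision   <- TP / (TP + FP)
specificity <- TN / (TN + FP)   # non-cases correctly rejected
accuracy    <- (TP + TN) / N
error       <- 1 - accuracy

round(c(recall = recall, precision = precision,
        specificity = specificity, accuracy = accuracy,
        error = error), 3)
```

With these counts recall is 0.8, specificity 0.9 and accuracy 0.85, so the error rate is 0.15.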
However, as has been noted in some earlier responses, these
general measures do not in any way take into consideration whether
prediction and classification according to a model, with the help of
explanatory variables, performs any better than simply knowing the
overall proportions of the outcome classes.
For this purpose, the asymmetric summary measures of association based
on Proportionate Reduction of Error (PRE) are good candidates for
evaluating prediction accuracy, where we expect that the prediction or
classification process on the basis of the models should exceed some
baselines or thresholds. However, one cannot use the Goodman-Kruskal
lambda and tau as such, but must make some adjustments to account for
the possibility of incorrect prediction.
With this approach one compares prediction/classification errors with
the model, error(model), to the baseline level of
prediction/classification errors without the model, error(baseline),
according to formula 8 below (Menard 1995: 28-30). The formula for
the error with the model, presented in (5), remains the same
irrespective of whether we are evaluating prediction or classification
accuracy, but the errors without the model vary according to the
intended objective, presented in (6) and (7). Subsequently, the measure
for the proportionate reduction of prediction error is presented in (9)
below, and being analogous to the Goodman-Kruskal lambda it is
designated as lambda(prediction). Similarly, the measure for the
proportionate reduction of classification error is presented in (10),
and being analogous to the Goodman-Kruskal tau it is likewise
designated as tau(classification). For both measures, positive values
indicate better than baseline performance, while negative values
indicate worse than baseline performance.
(5) error(model) = N - SUM{k=1...K}n[k,k] = N - SUM{diag(n)},
where n is the 2x2 prediction/classification matrix
(6) error(baseline, prediction) = N - max(R[k]),
with R[k] = marginal row sums for each row k of altogether K classes
and N the sum total of cases.
(7) error(baseline, classification) = SUM{k=1...K}(R[k]·(N-R[k])/N),
with R[k] = marginal row sums for each row k of altogether K classes
and N the sum total of cases.
(8) PRE = (error(baseline) - error(model)) / error(baseline),
with error(baseline) taken from (6) or (7) as appropriate
(9) lambda(prediction) = 1 - error(model) / error(baseline, prediction)
(10) tau(classification) = 1 - error(model) / error(baseline, classification)
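Formulas (5) to (10) can be sketched in a few lines of base R. The 2x2 matrix below holds made-up counts (rows = original class, columns = predicted class), and all object names are my own rather than part of any package.

```r
# Hypothetical 2x2 table: rows = original class, columns = predicted class.
n <- matrix(c(30, 10,
               5, 55),
            nrow = 2, byrow = TRUE,
            dimnames = list(Original  = c("A", "Other"),
                            Predicted = c("A", "Other")))

N <- sum(n)        # sum total of cases
R <- rowSums(n)    # marginal row sums R[k]

error_model      <- N - sum(diag(n))        # (5)
error_base_pred  <- N - max(R)              # (6)
error_base_class <- sum(R * (N - R) / N)    # (7)

lambda_prediction  <- 1 - error_model / error_base_pred    # (9)
tau_classification <- 1 - error_model / error_base_class   # (10)

c(lambda = lambda_prediction, tau = tau_classification)
```

Here 15 of 100 cases are misclassified with the model, against baselines of 40 (prediction) and 48 (classification), giving lambda(prediction) = 0.625 and tau(classification) = 0.6875. Nothing in the code is specific to two classes, so the same lines should work for a KxK table.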
REFERENCES:
Cohen, Jacob, Patricia Cohen, Stephen G. West and Leona S. Aiken.
2003. Applied Multiple Regression/Correlation Analysis for the
Behavioral Sciences (3rd edition). Lawrence Erlbaum Associates,
Mahwah, New Jersey.
Menard, Scott. 1995. Applied Logistic Regression Analysis. Sage
University Paper Series on Quantitative Applications in the Social
Sciences 07-106. Sage Publications, Thousand Oaks, California.
Manning, Christopher D. and Hinrich Schütze. 1999. Foundations of
Statistical Natural Language Processing. MIT Press, Cambridge,
Massachusetts.