[R] accuracy of a neural net

Max Kuhn mxkuhn at gmail.com
Sun May 24 16:22:47 CEST 2009

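You might want to use cross-validation or the bootstrap to get error
estimates, since predicting the same rows that were used for fitting
gives optimistic results. Also, you should include the PCA step in the
resampling, since the components are estimated from the data and that
does add noise to the model.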

Look at the pcaNNet and train functions in the caret package.
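
If it helps to see the idea spelled out, here is a rough hand-rolled
sketch of k-fold cross-validation with the PCA redone inside every fold.
It is not what caret does internally, only an illustration using base R
plus nnet. The data are simulated, and the fold count, number of
components and network settings are assumptions picked for the example.

```r
library(nnet)  # recommended package, normally installed with R

set.seed(1)
# Simulated stand-in for the real data (assumption): 200 rows, 30 predictors
n <- 200; p <- 30
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("V", 1:p)
y <- x[, 1] - 2 * x[, 2] + rnorm(n, sd = 0.5)

k <- 5       # number of folds (assumption)
ncomp <- 5   # components kept (the original post used 22)
fold <- sample(rep(1:k, length.out = n))
rmse <- numeric(k)

for (i in 1:k) {
  train <- fold != i
  # PCA is estimated from the training rows only
  pca <- prcomp(x[train, ], center = TRUE, scale. = TRUE)
  trainScores <- pca$x[, 1:ncomp]
  # held-out rows are projected with the training centering and rotation
  testScores <- predict(pca, x[!train, ])[, 1:ncomp]
  fit <- nnet(trainScores, y[train], size = 3, linout = TRUE,
              decay = 0.01, trace = FALSE, maxit = 500)
  pred <- drop(predict(fit, testScores))
  rmse[i] <- sqrt(mean((y[!train] - pred)^2))
}

rmse        # one held-out RMSE per fold
mean(rmse)  # cross-validated estimate
```

The point is that the held-out rows never influence the rotation or the
network weights, so the error estimate reflects both sources of
variation.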

Also, your nnet call uses linout=TRUE, which implies that you are
predicting a continuous outcome (i.e. a linear function on the hidden
units), so a confusion matrix wouldn't be appropriate.
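
If the outcome is really a set of classes, one option is to store it as a
factor and drop linout=TRUE, so that nnet fits a classification model and
a confusion matrix makes sense. A minimal sketch follows, using the iris
data as a stand-in (an assumption, since I don't have your file) and a
simple hold-out split; the size, decay and maxit values are arbitrary.

```r
library(nnet)

set.seed(2)
# iris stands in for the real data; Species is already a factor
idx <- sample(nrow(iris), 100)
trainSet <- iris[idx, ]
testSet  <- iris[-idx, ]

# With a factor outcome, the formula interface fits a classification network
fit <- nnet(Species ~ ., data = trainSet, size = 3, decay = 0.01,
            trace = FALSE, maxit = 500)

# Predicted class labels for rows that were not used in fitting
pred <- factor(predict(fit, testSet, type = "class"),
               levels = levels(iris$Species))

confusion <- table(predicted = pred, observed = testSet$Species)
confusion
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```

Rounding the output of a linout=TRUE model and counting matches, as in
the quoted code, treats a regression fit as if it were a classifier,
which is why the factor version is shown here.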


On Sun, May 24, 2009 at 7:28 AM, onyourmark <william108 at gmail.com> wrote:
> Hi. I started with a file which was a sparse 982x923 matrix and where the
> last column was a variable to be predicted. I did principal component
> analysis on it and arrived at a new 982x923 matrix.
> Then I ran the code below to get a neural network using nnet and then wanted
> to get a confusion matrix or at least know how accurate the neural net was.
> I used the first 22 principal components only for the inputs for the neural
> net.
> I got a perfect prediction rate which is somewhat suspect (I was using the
> same data for training and prediction but I did not expect perfect
> prediction anyway). So I tried using only a sample of records to build the
> neural net.
> Even with this sample I got 980 out of 982 correct. Can anyone spot an error
> here?
> crs$dataset <- read.csv("file:///C:/dataForR/textsTweet1/cleanForPC.csv",
> na.strings=c(".", "NA", "", "?"))
> crs$nnet <- nnet(Value ~ ., data=crs$dataset[,c(1:22,922)], size=10,
> linout=TRUE, skip=TRUE, trace=FALSE, maxit=1000)
> targets=crs$dataset[,922]
> rawpredictions =predict(crs$nnet, crs$dataset[, c(1:22)], type="raw")
> roundedpredictions=round(rawpredictions[,1],digits = 0)
> trueAndPredicted=cbind(roundedpredictions, targets)
> howManyEqual=trueAndPredicted[,1]==trueAndPredicted[,2]
> sum(howManyEqual)
> samp <- c(sample(1:50,25), sample(51:100,25), sample(101:150,25))
> samp <- c(sample(1:250,125), sample(251:500,125), sample(500:920,300))
> crs$nnet <- nnet(Value ~ ., data=crs$dataset[samp,c(1:22,922)], size=10,
> linout=TRUE, skip=TRUE, trace=FALSE, maxit=1000)


