# [R] calculate classification error in test data set if training data set is not the same length

Melanie Zoelck mzoelck at gmx.com
Sun Feb 17 16:50:56 CET 2013

```Dear R-Help Members,

I have built a classification function (linear discriminant analysis, lda) from a baseline (training) data set that contains the group variable, Stock, and have used it to classify the test data set. I am now trying to get the classification table and the classification success for both the training and the test data set using:

library(MASS)  # provides lda()

baseline.lda <- lda(Stock ~ LTT + LF + LFM + LPO + LH + LPV + LPA + LD + LA + DAC +
    HH + HP + ML + OD + TV02 + TV03 + TV04 + TV05 + TH01 + TH06 + TH07 + TH08 +
    TD02 + TD03 + TD05 + TD06 + TD08 + WGHT + WDTH + PERI + CIRC +
    A02 + A03 + A04 + A05 + A08 + A12 + A14 + A15 + A16 + A17 + A18 + A26 + A27 + A30 +
    B02 + B03 + B08 + B09 + B10 + B11 + B14 + B15 + B20 + B22 + B24 + B25 + B26 + B29 + B30 +
    C02 + C03 + C05 + C06 + C08 + C09 + C11 + C12 + C13 + C14 + C16 + C18 + C19 + C23 + C25 + C28 + C29 +
    D01 + D02 + D03 + D04 + D06 + D07 + D08 + D10 + D14 + D15 + D18 + D19 + D20 + D23,
    data = baseline.data.scaled)

ypred.train <- predict(baseline.lda, baseline.data.scaled)$class
ypred.test  <- predict(baseline.lda, mixed.data.scaled)$class

# Training set: classification table and success (proportion correctly classified)
table(ypred.train, baseline.data.scaled$Stock)
mean(ypred.train == baseline.data.scaled$Stock)
# Test set: does not work, because ypred.test (236 values) is compared
# with the Stock column of the baseline data (1161 values)
table(ypred.test, baseline.data.scaled$Stock)
mean(ypred.test == baseline.data.scaled$Stock)
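
# Sketch on R's built-in iris data (a stand-in, not the data above):
# a classification table for a test set needs predictions AND true
# groups for the same rows, so the test rows keep their own labels.
library(MASS)                                   # provides lda()
set.seed(1)
idx   <- sample(nrow(iris), 100)                # 100 rows for training
train <- iris[idx, ]
test  <- iris[-idx, ]                           # other 50 rows, labels kept
fit   <- lda(Species ~ ., data = train)
pred.train <- predict(fit, train)$class
pred.test  <- predict(fit, test)$class
table(pred.test, test$Species)                  # both vectors have 50 values
test.error <- mean(pred.test != test$Species)   # misclassification rate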

This works for the training set but not for the test set: the baseline data set is the only one that contains the grouping variable, and it is not the same length as the test data set (1161 samples in the training set, 236 in the test set). Is there a way to construct a classification table from the test-set predictions and get the classification error?
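
# Sketch with made-up labels (hypothetical, for illustration only).
# Without the true Stock of the test samples, only the predicted
# group frequencies can be reported, not an error rate:
pred  <- factor(c("A", "B", "A", "A", "B"), levels = c("A", "B"))
table(pred)                            # counts per predicted group
prop.table(table(pred))                # proportions per predicted group
# Once true labels for the same samples are available, the
# confusion table and misclassification rate follow:
truth <- factor(c("A", "B", "B", "A", "B"), levels = c("A", "B"))
tab   <- table(pred, truth)
err   <- 1 - sum(diag(tab)) / sum(tab) # equals mean(pred != truth)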

Thank you!

Melanie Zoelck
____________________________________________
Melanie Zölck (Zoelck)
PhD Candidate
Galway-Mayo Institute of Technology
Marine and Freshwater Research Centre
Commercial Fisheries Research Group
Department of Life Science