[R] statistical significance of accuracy increase in classification

Monica Pisica pisicandru at hotmail.com
Tue Feb 24 17:22:41 CET 2009


Hi everyone,

I would like to test (for what it is worth) whether increases in classification accuracy and in the kappa statistic between different land-cover classifications are statistically significant. The classifications were done in other software (such as eCognition and See5), but the results were "sampled" at locations where I know the "reference" class. Using the package "caret" I built the confusion matrices. For now I am interested in the overall results, which include the overall classification accuracy and the kappa statistic, among others. Depending on which classifications I compare, I get a small increase in accuracy and a somewhat larger increase in kappa. I wonder if there is a way to test the statistical significance of the accuracy and kappa increase between the 2 classifications.

Data example and some code:

library(caret)
 
ref <- c(15, 13, 13, 13, 13, 15, 14, 14, 14, 15, 13, 13, 13, 15, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 14, 13, 13, 13, 13, 13, 13, 13, 15, 13, 13, 15, 13, 15, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,13, 14, 13, 13, 13, 13, 13, 14, 14, 15, 15, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13, 13,13, 13, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13, 13, 14, 13, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13)

class1 <- c(14, 14, 13, 13, 13, 15, 13, 14, 15, 14, 14, 13, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 13, 14, 13, 13, 14, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 15, 21, 13, 15, 13, 21, 13, 13, 14, 13, 15, 13, 15, 13, 13, 14, 13, 13, 13, 13, 13, 13, 13,13, 14, 14, 13, 13, 13, 13, 15, 15, 15, 15, 13, 13, 13, 13, 13, 5, 13, 15, 13, 13, 13, 13, 13, 13,15, 13, 15, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)

class2 <- c(14, 15, 13, 13, 13, 15, 13, 14, 15, 15, 14, 13, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 13, 14, 13, 13, 15, 13, 13, 15, 14, 13, 13, 13, 13, 13, 13,13, 13, 15, 13, 13, 15, 13, 21, 13, 13, 13, 13, 15, 13, 15, 15, 13, 14, 13, 13, 13, 13, 13, 13, 15,13, 14, 14, 13, 13, 13, 13, 15, 14, 15, 15, 13, 14, 13, 13, 13, 15, 13, 15, 13, 13, 13, 13, 13, 13,15, 13, 15, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)

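# Use one common set of class levels so both confusion matrices have the same shape
ref1 <- factor(ref, levels = c(5, 13, 14, 15, 21, 22))
pred1 <- factor(class1, levels = c(5, 13, 14, 15, 21, 22))
pred2 <- factor(class2, levels = c(5, 13, 14, 15, 21, 22))

# Cross-tabulate predictions (rows) against the reference classes (columns)
t1 <- table(pred1, ref1)
t2 <- table(pred2, ref1)

# Overall statistics (accuracy, kappa, ...) for classification 1
cm1 <- confusionMatrix(t1)
cm1$overall

# Overall statistics for classification 2
cm2 <- confusionMatrix(t2)
cm2$overall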

As you can see, the increase in accuracy is very small, but the increase in kappa is somewhat more substantial. Is this increase statistically significant?
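
One possible way to approach this, given only as a rough sketch: because both classifications are scored at the same reference locations, the samples are paired, so McNemar's test on the sites where exactly one classification is correct is a common choice for the accuracy difference, and a paired bootstrap over sites can give an interval for the kappa difference. The helper names below (paired_accuracy_test, cohen_kappa, boot_kappa_diff) and the defaults n_boot and seed are made up for illustration and are not part of caret; only base R and stats functions are used.

```r
# Hypothetical helpers, base R only. A sketch, not a definitive method.

# McNemar's test (plus an exact binomial version) on paired correctness.
paired_accuracy_test <- function(ref, pred_a, pred_b) {
  stopifnot(length(ref) == length(pred_a), length(ref) == length(pred_b))
  ok_a <- factor(as.character(pred_a) == as.character(ref), levels = c(FALSE, TRUE))
  ok_b <- factor(as.character(pred_b) == as.character(ref), levels = c(FALSE, TRUE))
  # 2 x 2 table: was each site classified correctly by A and by B?
  tab <- table(A_correct = ok_a, B_correct = ok_b)
  # Only discordant sites (one right, the other wrong) carry information
  n_disc <- tab["TRUE", "FALSE"] + tab["FALSE", "TRUE"]
  if (n_disc == 0) return(list(table = tab, mcnemar = NULL, exact = NULL))
  list(table   = tab,
       mcnemar = mcnemar.test(tab),
       exact   = binom.test(tab["FALSE", "TRUE"], n_disc, p = 0.5))
}

# Cohen's kappa computed directly from two label vectors.
cohen_kappa <- function(ref, pred) {
  lev <- union(unique(as.character(ref)), unique(as.character(pred)))
  tab <- table(factor(as.character(pred), levels = lev),
               factor(as.character(ref),  levels = lev))
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                        # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2    # chance agreement
  (po - pe) / (1 - pe)                            # NaN if pe == 1
}

# Paired bootstrap: resample sites, keep ref and both predictions together.
boot_kappa_diff <- function(ref, pred_a, pred_b, n_boot = 2000, seed = 1) {
  set.seed(seed)
  n <- length(ref)
  diffs <- replicate(n_boot, {
    i <- sample.int(n, n, replace = TRUE)
    cohen_kappa(ref[i], pred_b[i]) - cohen_kappa(ref[i], pred_a[i])
  })
  list(observed = cohen_kappa(ref, pred_b) - cohen_kappa(ref, pred_a),
       ci95     = quantile(diffs, c(0.025, 0.975), na.rm = TRUE))
}
```

With the vectors above this would be called as paired_accuracy_test(ref, class1, class2) and boot_kappa_diff(ref, class1, class2). If the bootstrap interval for the kappa difference excludes 0, that would suggest the increase is larger than resampling noise; with so few discordant sites, the exact binomial result is probably the safer one to read for accuracy.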

Thanks for any help,
 
Monica