[R] Analyzing Poor Performance Using naiveBayes()

Kirk Fleming kirkrfleming at hotmail.com
Fri Aug 10 21:16:49 CEST 2012


Per your suggestion I ran chi.squared() against my training data and, to my
delight, found just 50 attributes that were non-zero influencers. I built the
model through several iterations and found n = 12 to be the optimum for the
training data.
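For reference, this is roughly the loop I'm running -- a sketch only, assuming
the FSelector, e1071 and ROCR packages, and data frames 'train' and 'test'
with a factor column 'Class' (those names are placeholders for my actual data):

```r
# Sketch: chi-squared feature weighting, then naiveBayes() on the top-n
# attributes, scored by AUC. 'train', 'test' and 'Class' are placeholder
# names, not the actual variables in my workspace.
library(FSelector)  # chi.squared(), cutoff.k(), as.simple.formula()
library(e1071)      # naiveBayes()
library(ROCR)       # prediction(), performance()

weights <- chi.squared(Class ~ ., train)                 # weight per attribute
nonzero <- rownames(weights)[weights$attr_importance > 0]

auc_for_n <- function(n, eval_data) {
  feats <- cutoff.k(weights[nonzero, , drop = FALSE], n) # top-n attributes
  model <- naiveBayes(as.simple.formula(feats, "Class"), data = train)
  probs <- predict(model, eval_data, type = "raw")[, 2]  # P(positive class)
  performance(prediction(probs, eval_data$Class), "auc")@y.values[[1]]
}

# AUC on training vs. test data for n = 3..50
train_auc <- sapply(3:50, auc_for_n, eval_data = train)
test_auc  <- sapply(3:50, auc_for_n, eval_data = test)
```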

However, the results are still not so good for the test data. Here are the
results for both, with AUC values for n = 3 to 50: training data in the 0.97
range, test data in the 0.55 area.

http://r.789695.n4.nabble.com/file/n4639964/Feature_Selection_02.jpg 

If the training and test data sets weren't so indistinguishable, I'd suspect
something weird about the test data--but I can't tell the two apart using any
descriptive, 'meta' statistics I've tried so far. Having double-checked for
dumb errors and still obtained the same results, I toasted everything and
started from scratch--still the same performance on the test data.
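One way I could go beyond summary statistics when comparing the two sets is a
per-feature two-sample Kolmogorov-Smirnov test (base R only). A sketch, shown
here on simulated data since the real frames aren't reproduced in this post:

```r
# Sketch: compare the marginal distribution of every numeric feature in two
# data frames with a two-sample KS test; returns one p-value per feature.
ks_compare <- function(a, b) {
  num_cols <- names(a)[sapply(a, is.numeric)]
  sapply(num_cols, function(col)
    suppressWarnings(ks.test(a[[col]], b[[col]])$p.value))
}

# Simulated example: feature x2 is shifted in the second set, x1 is not.
set.seed(1)
train_sim <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
test_sim  <- data.frame(x1 = rnorm(200), x2 = rnorm(200, mean = 1))
p <- ks_compare(train_sim, test_sim)
# Features with p < 0.05 have detectably different marginals.
```

If every feature passes a test like this, the mismatch is more likely
overfitting in the model than a quirk of the test set.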

Maybe I'll take a break and reflect for 30 minutes.



--
View this message in context: http://r.789695.n4.nabble.com/Analyzing-Poor-Performance-Using-naiveBayes-tp4639825p4639964.html
Sent from the R help mailing list archive at Nabble.com.


