[R] kknn::predict and kknn$fitted.values

Jonathan Henkelman jonathan.henkelman at usask.ca
Sat Aug 29 18:46:24 CEST 2015


In thinking about this 'problem' last night, I found the 'solution'. Any
nearest-neighbour (NN) algorithm has to store all of the data it is given,
both X and Y; otherwise, how could it find and report the nearest neighbour?
When predicting (i.e. with predict.kknn), it finds the closest match (the
nearest neighbour), and for a point from the original dataset that closest
match /is that point/, at distance zero, so the point's own Y value feeds
straight into its prediction.
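A minimal sketch of this effect, written in Python rather than R and not using kknn itself: a plain, unweighted k-NN regressor with Euclidean distance (the helper `knn_predict` is hypothetical, and kknn additionally weights neighbours with a kernel, which is omitted here). Asked to predict its own training points with k = 1, it simply returns their stored Y values, because each point is its own nearest neighbour.

```python
import math

def knn_predict(train_x, train_y, query, k=1):
    """Unweighted k-NN regression: mean Y of the k training points closest to query."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k

train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0.0, 10.0, 0.0, 10.0]

# "Predicting" the training data itself: each point matches itself at distance zero.
resub = [knn_predict(train_x, train_y, x, k=1) for x in train_x]
print(resub)  # [0.0, 10.0, 0.0, 10.0]
```

The resubstitution error is therefore zero, which says nothing about how well the model generalises.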

In contrast, the kknn$fitted.values are derived from some cross-validation
approach: likely either by finding the nearest point at non-zero distance, or
by building a model without that point and seeing where it falls (i.e.
leave-one-out cross-validation). Otherwise, it would not be possible to
report the accuracy of the model using only a single dataset.
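For contrast, here is a sketch of leave-one-out fitted values under the same assumptions (Python, unweighted k-NN, hypothetical helper names, not kknn's actual code): each point is predicted from the remaining points only, so it can no longer match itself and the fitted values reflect genuine prediction error.

```python
import math

def knn_predict(train_x, train_y, query, k=1):
    # Unweighted k-NN; ties in distance are broken by index order (sorted() is stable).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k

def loo_fitted(xs, ys, k=1):
    """Leave-one-out fitted values: predict each point from all the others."""
    fitted = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]  # drop point i from the "model"
        rest_y = ys[:i] + ys[i + 1:]
        fitted.append(knn_predict(rest_x, rest_y, xs[i], k))
    return fitted

def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [0.0, 10.0, 0.0, 10.0]

loo = loo_fitted(xs, ys, k=1)
print(loo)           # [10.0, 0.0, 10.0, 0.0]
print(mse(loo, ys))  # 100.0
```

On this deliberately alternating toy data every leave-one-out prediction is wrong, whereas predicting the same points with themselves included would give zero error.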

I will retest the algorithm using a split training/test dataset to better
understand how predict.kknn selects a model from the suite generated by
train.kknn, which was my original question. I assume it uses
kknn$best.parameters, but I want to verify this.
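The planned split-sample check could look something like the following Python sketch. It only imitates the idea being verified: choose k by leave-one-out error on the training set (a stand-in for something like best.parameters) and then use that k to predict a separate test set. The names `loo_mse` and `select_k` are hypothetical, and kknn also chooses among kernels, which this does not model.

```python
import math

def knn_predict(train_x, train_y, query, k):
    # Unweighted k-NN; ties in distance are broken by index order (sorted() is stable).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k

def loo_mse(xs, ys, k):
    """Mean squared leave-one-out error for a given k."""
    err = 0.0
    for i in range(len(xs)):
        pred = knn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], k)
        err += (pred - ys[i]) ** 2
    return err / len(xs)

def select_k(xs, ys, candidates):
    # Keep the candidate k with the smallest leave-one-out error.
    return min(candidates, key=lambda k: loo_mse(xs, ys, k))

# Training data: y = 2x with an alternating +1/-1 offset.
train_x = [(float(i),) for i in range(10)]
train_y = [2.0 * i + (1.0 if i % 2 else -1.0) for i in range(10)]
test_x = [(2.5,), (6.5,)]  # held-out points not in the training set

best_k = select_k(train_x, train_y, candidates=[1, 2, 3])
test_pred = [knn_predict(train_x, train_y, x, best_k) for x in test_x]
print(best_k, test_pred)  # 2 [5.0, 13.0]
```

Because the test points are not in the training data, their predictions cannot be inflated by self-matching, which is what makes the split useful for checking which parameters predict actually applies.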

Hopefully that clarifies the issue. I am posting this here in case future
users have a similar question.

Thanks to any who took the time to think about this!
Jonathan



--
View this message in context: http://r.789695.n4.nabble.com/kknn-predict-and-kknn-fitted-values-tp4711625p4711634.html
Sent from the R help mailing list archive at Nabble.com.
