[R] Mathematical working procedure of imputation methods (medianImpute, knnImpute, and bagImpute) in caret package R

Richard O'Keefe r@oknz @end|ng |rom gm@||@com
Wed Sep 21 03:53:50 CEST 2022


?preProcess
     k-nearest neighbor imputation is carried out by finding the k
     closest samples (Euclidean distance) in the training set.
     Imputation via bagging fits a bagged tree model for each predictor
     (as a function of all the others). This method is simple, accurate
     and accepts missing values, but it has much higher computational
     cost. Imputation via medians takes the median of each predictor in
     the training set, and uses them to fill missing values. This
     method is simple, fast, and accepts missing values, but treats
     each predictor independently, and may be inaccurate.
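In base R the median method amounts to something like the following. This is a minimal sketch of the idea the help text describes, not caret's own code, and the helper names median_impute_fit() and median_impute_apply() are hypothetical.

```r
# Median imputation sketch (hypothetical helpers, not caret's code):
# learn one median per column from the training data, then fill the NAs of
# any data set (training or new) with the matching column's median.
median_impute_fit <- function(train) {
  vapply(train, median, numeric(1), na.rm = TRUE)
}

median_impute_apply <- function(data, medians) {
  for (col in names(medians)) {
    miss <- is.na(data[[col]])
    # Each column is filled from its own median only
    data[[col]][miss] <- medians[[col]]
  }
  data
}

toy <- data.frame(a = c(1, NA, 3, 10), b = c(NA, 2, 4, 6))
meds <- median_impute_fit(toy)  # a: median of 1, 3, 10 is 3; b: median of 2, 4, 6 is 4
filled <- median_impute_apply(toy, meds)
print(filled)
```

Each column is filled using nothing but its own median, which is what "treats each predictor independently" means in practice.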
...
References:

     <http://topepo.github.io/caret/pre-processing.html>

     Kuhn and Johnson (2013), Applied Predictive Modeling, Springer,
     New York (chapter 4)

     Kuhn (2008), Building predictive models in R using the caret
     package, Journal of Statistical Software, 28(5)
     (doi:10.18637/jss.v028.i05
     <https://doi.org/10.18637/jss.v028.i05>)

There are more references, but you really should read Kuhn (2008).
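Of the three descriptions quoted above, the k-nearest-neighbour one is the most compressed. Here is a sketch of the general technique in base R, built on assumptions of my own rather than anything taken from caret's source: all predictors are numeric, columns are standardized with the training means and standard deviations (Euclidean distance is scale-dependent), distances use only the columns observed in the incomplete row, only complete training rows serve as donors, and the fill value is the mean of the k donors' values. The function name knn_impute() is hypothetical.

```r
# k-nearest-neighbour imputation sketch (general technique, not caret's code).
# Assumptions: numeric columns only; training means/sds standardize the data;
# donors are complete training rows; distances use only the columns observed
# in the row being filled; the fill value is the mean of the k donors' values.
knn_impute <- function(data, train, k = 5) {
  mu <- colMeans(train, na.rm = TRUE)
  s <- apply(train, 2, sd, na.rm = TRUE)
  standardize <- function(m) sweep(sweep(m, 2, mu), 2, s, "/")
  raw_train <- as.matrix(train)
  donors <- complete.cases(raw_train)
  z_donor <- standardize(raw_train)[donors, , drop = FALSE]
  raw_donor <- raw_train[donors, , drop = FALSE]
  out <- as.matrix(data)
  z_data <- standardize(out)
  for (i in which(!complete.cases(out))) {
    obs <- !is.na(out[i, ])
    # Euclidean distance from row i to every donor, observed columns only
    delta <- sweep(z_donor[, obs, drop = FALSE], 2, z_data[i, obs])
    d <- sqrt(rowSums(delta^2))
    nn <- order(d)[seq_len(min(k, length(d)))]
    # Fill each missing column with the average of the neighbours' raw values
    out[i, !obs] <- colMeans(raw_donor[nn, !obs, drop = FALSE])
  }
  as.data.frame(out)
}

train_knn <- data.frame(x = c(1, 2, 3, 10, 11, 12),
                        y = c(10, 20, 30, 100, 110, NA))
res_knn <- knn_impute(train_knn, train_knn, k = 2)
res_knn$y[6]  # 105: the nearest donors by x are 11 and 10, so (110 + 100) / 2
```

Unlike the median method, the fill value here depends on the other predictors, at the cost of a distance computation against every donor for each incomplete row.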

It's not clear what kind of understanding you need.
How the methods work?  The description above TELLS you what they do.
How WELL the methods work?  Again the description above is pretty
clear.  It says median imputation is fast and that bagging "has much
higher computational cost", which is surely what you want to know for
large amounts of data.  How fast the methods will be on your machine
with your data can only be determined by benchmarking, and you do not
need the internals for that.
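A timing harness can be as simple as wrapping preProcess() and predict() in system.time(). The sketch below assumes caret is installed along with RANN and ipred (my understanding, which you should check, is that knnImpute relies on RANN and bagImpute on ipred), and it uses simulated data with roughly 5% of cells missing as a stand-in for train_data. Swap in your own data to get numbers that mean anything.

```r
# Timing sketch for the three caret imputation methods.
# Assumptions: caret, RANN and ipred are installed (knnImpute is assumed to
# need RANN and bagImpute to need ipred); simulated data stands in for your
# own train_data. If any package is missing, nothing is run.
needed <- c("caret", "RANN", "ipred")
if (all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))) {
  set.seed(1)
  n <- 2000
  p <- 10
  train_data <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
  # Knock out roughly 5% of the cells at random
  train_data[matrix(runif(n * p) < 0.05, nrow = n, ncol = p)] <- NA
  imp_methods <- c("medianImpute", "knnImpute", "bagImpute")
  timings <- vapply(imp_methods, function(m) {
    system.time({
      pp <- caret::preProcess(train_data, method = m)
      imputed <- predict(pp, newdata = train_data)
    })[["elapsed"]]
  }, numeric(1))
  print(timings)  # elapsed seconds, named by method
}
```

Timing both the fitting (preProcess) and the application (predict) matters, because for kNN most of the work happens when new rows are filled.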

All of this is open source, so you can easily find the internals for
yourself if you really want to.  If nothing else, it's at
https://github.com/topepo/caret
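Short of reading that source, the bagging description quoted above ("a bagged tree model for each predictor (as a function of all the others)") can be illustrated for a single predictor. This is a sketch of bagged regression trees under my own assumptions (rpart trees as base learners, B bootstrap resamples of the rows where the target is observed, predictions averaged over the trees), not caret's implementation; bag_impute_column() is a hypothetical name. rpart is a recommended package shipped with standard R installations, and its surrogate splits are what let the trees cope with missing values in the other predictors.

```r
library(rpart)

# Bagged-tree imputation sketch for ONE target column (general technique,
# not caret's code). Assumptions: rpart regression trees as base learners,
# B bootstrap resamples of the rows where the target is observed, and the
# imputed value is the average prediction over the B trees.
bag_impute_column <- function(data, target, B = 25) {
  missing_rows <- which(is.na(data[[target]]))
  if (length(missing_rows) == 0) return(data)
  observed <- data[-missing_rows, , drop = FALSE]
  # Model the target as a function of all the other columns
  form <- reformulate(setdiff(names(data), target), response = target)
  preds <- matrix(NA_real_, nrow = length(missing_rows), ncol = B)
  for (b in seq_len(B)) {
    boot <- observed[sample(nrow(observed), replace = TRUE), , drop = FALSE]
    # rpart's surrogate splits let trees handle NAs in the other predictors
    fit <- rpart(form, data = boot, method = "anova")
    preds[, b] <- predict(fit, newdata = data[missing_rows, , drop = FALSE])
  }
  data[[target]][missing_rows] <- rowMeans(preds)
  data
}

set.seed(42)
n <- 200
toy_bag <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy_bag$y <- 3 * toy_bag$x1 + rnorm(n, sd = 0.1)
toy_bag$y[1:10] <- NA     # values to impute
toy_bag$x2[11:15] <- NA   # missing predictor values, handled by surrogates
res_bag <- bag_impute_column(toy_bag, "y", B = 10)
```

Imputing every predictor means one such bagged model per column, so the work grows roughly with the number of predictors times B, which is consistent with the help page's warning about computational cost.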



On Wed, 21 Sept 2022 at 09:20, K Purna Prakash <prakash.nani using gmail.com>
wrote:

> Dear Sir/Madam,
> Greetings!!!
>
> Kindly provide the detailed internal mathematical workings of the
> following median, KNN, and bagging imputation methods available in the R
> package caret.
>
>  preProcess(train_data, method = "medianImpute")
>  preProcess(train_data, method = "knnImpute")
>  preProcess(train_data, method = "bagImpute")
>
> The details you provide will help me a lot in better understanding
> these imputation methods, especially when dealing with large data sets.
>
> I look forward to hearing from you.
>
> Thanks and regards,
> K. Purna Prakash.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
