[R] R-help Digest, Vol 157, Issue 25

Boris Steipe boris.steipe at utoronto.ca
Thu Mar 24 19:12:31 CET 2016

If the number of values is always the same, the proposed strategies will work for you. If it is not, you need a completely different approach. Most importantly, you will need to figure out which columns correspond to the missing values. Is it always the last ones that are dropped? If not, you have a problem: the values will be misaligned, and you can't fix that unless you know which values to expect.
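
For the simple case only, where it is always the trailing values that are dropped, here is a minimal sketch of padding short rows with NA. The input lines, the country labels and the space-separated layout are made up for illustration; your real data will look different.

```r
# Hypothetical ragged input: one line per country, values separated by spaces.
lines <- c("A 1 2 3 4", "B 5 6", "C 7 8 9")

fields  <- strsplit(lines, " +")
country <- vapply(fields, `[`, character(1), 1)          # first field is the label
values  <- lapply(fields, function(f) as.numeric(f[-1])) # the rest are the values

n_max <- max(lengths(values))

# Indexing past the end of a vector yields NA, so short rows are padded
# on the right. This is only correct if the LAST values are the missing ones.
padded <- t(vapply(values, function(v) v[seq_len(n_max)], numeric(n_max)))
rownames(padded) <- country
padded
```

If values can be missing anywhere in a row, this pads in the wrong places, and you need to know the expected layout before you can align anything.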

Proper imputation depends on the semantics of the data; there are no (sensible) general rules. You need to consider whether the values are missing at random, or whether, for example, smaller values have a higher probability of being missing. That will determine whether you should impute from row averages, column averages, or averages over a defined subset, or perhaps better than averages, replace with random observed values. This _really_ depends on the data and the objectives of your analysis.
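
To make two of these options concrete, here is a rough sketch on a made-up matrix: filling each NA with the mean of the observed values in its row, and filling it with a randomly drawn observed value from the same row. Neither is a recommendation; which one (if any) is appropriate depends on why the values are missing.

```r
set.seed(1)  # only so that the random draw is repeatable

# Made-up example data: one row per country, NA marks a missing value.
x <- matrix(c(1, 2, 3,  4,
              5, 6, NA, NA,
              7, 8, 9,  NA), nrow = 3, byrow = TRUE)

# Option 1: replace each NA with the mean of the observed values in its row.
row_means <- rowMeans(x, na.rm = TRUE)
idx <- which(is.na(x), arr.ind = TRUE)   # (row, col) positions of the NAs
x_mean <- x
x_mean[idx] <- row_means[idx[, "row"]]

# Option 2: replace each NA with a value drawn at random (with replacement)
# from the observed values in the same row.
x_rand <- x
for (i in seq_len(nrow(x))) {
  miss <- is.na(x[i, ])
  obs  <- x[i, !miss]
  if (any(miss) && length(obs) > 0) {
    # sample.int() on positions avoids the sample(5, ...) surprise
    # when only a single value is observed.
    x_rand[i, miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  }
}
```

For column averages, the same pattern works with colMeans() and idx[, "col"]. Note that mean imputation shrinks the variance of the data, which is one reason drawing from observed values can be preferable.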


On Mar 24, 2016, at 7:16 AM, Burhan ul haq <ulhaqz at gmail.com> wrote:

> Thanks to Boris Steipe, Jim Lemon and Ivan Calandra for replying.
> I messed up while copying; there is an equal number of values for each
> country.
> @ Ivan,
> If there were a different number of values, and we wanted to fill in the
> missing values with
> 1) NA, or
> 2) "average of the rest of values",
> how would we "impute" such data?
> Thanks again /
