[R] missing and replace

Fraser D. Neiman fneiman at monticello.org
Thu Apr 27 13:20:33 CEST 2017


Dear All,

Replacing missing values with means is generally not a good idea:

"Perhaps the easiest way to impute is to replace each missing
value with the mean of the observed values for that variable. Unfortunately, this
strategy can severely distort the distribution for this variable, leading to complications
with summary measures including, notably, underestimates of the standard
deviation. Moreover, mean imputation distorts relationships between variables by
“pulling” estimates of the correlation toward zero."

That's from Gelman and Hill's chapter on missing-data imputation; more here: http://www.stat.columbia.edu/~gelman/arm/missing.pdf
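To make the quoted point concrete, here is a small sketch (an illustration, not code from Gelman and Hill). It uses the y column from the question below, plus a toy x2/y2 pair invented purely for illustration. Imputed values sit exactly at the mean, so they add nothing to the sum of squared deviations but still increase n - 1 in the denominator, and the standard deviation shrinks. In the toy pair, a perfect observed correlation is pulled below 1.

```r
# y from the question below; the NA is ID 3
y <- c(122, 135, NA, 157, 195)

sd_obs <- sd(y, na.rm = TRUE)                         # SD of observed values only
y_imp  <- ifelse(is.na(y), mean(y, na.rm = TRUE), y)  # mean imputation
sd_imp <- sd(y_imp)                                   # SD after imputation
print(c(observed = sd_obs, imputed = sd_imp))         # imputed SD is smaller

# Toy data (hypothetical): x2 and y2 perfectly correlated, last y2 missing
x2 <- 1:6
y2 <- c(1, 2, 3, 4, 5, NA)
r_cc   <- cor(x2, y2, use = "complete.obs")           # complete cases: 1
y2_imp <- ifelse(is.na(y2), mean(y2, na.rm = TRUE), y2)
r_imp  <- cor(x2, y2_imp)                             # about 0.756
print(c(complete_cases = r_cc, mean_imputed = r_imp))
```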


best, Fraser

________________________________________
From: Val [valkremk at gmail.com]
Sent: Wednesday, April 26, 2017 8:45 PM
To: r-help at R-project.org (r-help at r-project.org)
Subject: [R] missing and replace

Hi all,

I have a data frame with three variables. Some of the variables have
missing values (represented by NA), and I want to replace those missing
values with the mean value of that variable. In this sample data,
variables y and z have missing values. The means of y and z are 152.25
and 359.5, respectively. I want to replace the missing values with the
respective mean value, rounded to the nearest whole number.

DF1 <- read.table(header=TRUE, text='ID1 x y z
1  25  122    352
2  30  135    376
3  40   NA    350
4  26  157    NA
5  60  195    360')
# mean of x = 36.2
# mean of y = 152.25
# mean of z = 359.5
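The means quoted above can be checked directly. This is a minimal sketch that rebuilds DF1 exactly as in the question and averages the three variables, dropping NAs and excluding the identifier column ID1:

```r
DF1 <- read.table(header = TRUE, text = 'ID1 x y z
1  25  122  352
2  30  135  376
3  40   NA  350
4  26  157   NA
5  60  195  360')

# Column means of x, y and z, ignoring NA; ID1 is only an identifier
means <- colMeans(DF1[, c("x", "y", "z")], na.rm = TRUE)
print(means)   # x = 36.20, y = 152.25, z = 359.50
```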

Desired output:
ID1  x    y    z
1   25  122  352
2   30  135  376
3   40  152  350
4   26  157  360
5   60  195  360
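For reference, the replacement being asked for (despite the caveats in the reply above) can be sketched in base R as follows. This is one possible approach, assuming every column except ID1 should be treated this way; it loops over the columns and fills each NA with the rounded column mean:

```r
DF1 <- read.table(header = TRUE, text = 'ID1 x y z
1  25  122  352
2  30  135  376
3  40   NA  350
4  26  157   NA
5  60  195  360')

# Every column except the identifier ID1
vars <- setdiff(names(DF1), "ID1")
for (v in vars) {
  miss <- is.na(DF1[[v]])
  if (any(miss)) {
    # Fill the NAs with the mean of the observed values, rounded
    DF1[[v]][miss] <- round(mean(DF1[[v]], na.rm = TRUE))
  }
}
print(DF1)
```

Note that R's round() follows the IEC 60559 "round half to even" convention. Here 359.5 becomes 360 either way, but a mean of, say, 358.5 would round to 358.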


Thank you in advance
