[R] replace NA values with the mean of the column which contains them

arun smartpink111 at yahoo.com
Mon Jul 29 19:57:49 CEST 2013


Hi,

de<- structure(c(NA, NA, NA, NA, NA, NA, NA, NA, 0.27500571, -3.07568579, 
-0.42240954, -0.26901731, 0.01766284, -0.8099958, 0.20805934, 
0.03036708, -0.26928087, 1.20925752, 0.38012008, -0.41778861, 
-0.49677462, -0.13248754, -0.54179054, 0.35788624, -0.41467591, 
-0.59234248, 0.73642396, -0.06768044, -0.40321968, -1.52283305, 
0.25974308, -0.0401373, -0.1192078, 0.9325334, -1.8927164, 1.4330507, 
0.2892706, 1.3976522, 0.2295291, -0.5009389, -0.342656, -0.8439027, 
-0.4971999, -1.6127122, -0.6508823, 1.4729576, -1.6093478, 0.1686006
), .Dim = c(16L, 3L))


Your code should fill the NAs one column at a time (assign the result back to de if you want to keep it):
sapply(seq_len(ncol(de)), function(i) {
  de[, i][is.na(de[, i])] <- mean(de[, i], na.rm = TRUE)
  de[, i]
})
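
As a possible alternative (a sketch, not part of the original reply), the same fill can be written as one vectorized assignment. The helper name fill_na_colmeans and the small matrix x are made up for illustration; colMeans() and col() are base R.

```r
# Hypothetical helper: replace each NA in a numeric matrix with the mean
# of the non-missing values in its own column.
fill_na_colmeans <- function(m) {
  na_pos <- is.na(m)                       # logical matrix marking the NAs
  col_means <- colMeans(m, na.rm = TRUE)   # one mean per column
  # col(m)[na_pos] gives the column index of each NA, in the same
  # column-major order that m[na_pos] uses for assignment.
  m[na_pos] <- col_means[col(m)[na_pos]]
  m
}

# Small made-up matrix, for illustration only
x <- matrix(c(NA, 1, 3, NA, 2, 4, 6, NA), nrow = 4)
fill_na_colmeans(x)
```

Indexing the vector of column means by the column number of each NA means every missing cell receives the mean of its own column, so nothing is recycled across columns.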
A.K.




Hi everyone 

I have a problem replacing NA values with the mean of the column that
contains them. If I replace each NA with the mean of the remaining values
in its column, the mean of the whole column stays the same as if I had
omitted the NA values. I have the following data:

de 
             [,1]        [,2]       [,3]
 [1,]          NA -0.26928087 -0.1192078 
 [2,]          NA  1.20925752  0.9325334 
 [3,]          NA  0.38012008 -1.8927164 
 [4,]          NA -0.41778861  1.4330507 
 [5,]          NA -0.49677462  0.2892706 
 [6,]          NA -0.13248754  1.3976522 
 [7,]          NA -0.54179054  0.2295291 
 [8,]          NA  0.35788624 -0.5009389 
 [9,]  0.27500571 -0.41467591 -0.3426560 
[10,] -3.07568579 -0.59234248 -0.8439027 
[11,] -0.42240954  0.73642396 -0.4971999 
[12,] -0.26901731 -0.06768044 -1.6127122 
[13,]  0.01766284 -0.40321968 -0.6508823 
[14,] -0.80999580 -1.52283305  1.4729576 
[15,]  0.20805934  0.25974308 -1.6093478 
[16,]  0.03036708 -0.04013730  0.1686006 

and I wrote this code:
de[which(is.na(de))]<-sapply(seq_len(ncol(de)),function(i) {mean(de[,i],na.rm=TRUE)}) 

I get this result:
             [,1]        [,2]       [,3]
 [1,] -0.50575168 -0.26928087 -0.1192078 
 [2,] -0.12222376  1.20925752  0.9325334 
 [3,] -0.13412312  0.38012008 -1.8927164 
 [4,] -0.50575168 -0.41778861  1.4330507 
 [5,] -0.12222376 -0.49677462  0.2892706 
 [6,] -0.13412312 -0.13248754  1.3976522 
 [7,] -0.50575168 -0.54179054  0.2295291 
 [8,] -0.12222376  0.35788624 -0.5009389 
 [9,]  0.27500571 -0.41467591 -0.3426560 
[10,] -3.07568579 -0.59234248 -0.8439027 
[11,] -0.42240954  0.73642396 -0.4971999 
[12,] -0.26901731 -0.06768044 -1.6127122 
[13,]  0.01766284 -0.40321968 -0.6508823 
[14,] -0.80999580 -1.52283305  1.4729576 
[15,]  0.20805934  0.25974308 -1.6093478 
[16,]  0.03036708 -0.04013730  0.1686006 
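
The cycling pattern in the first column above is consistent with R's recycling rule: the right-hand side holds only three values (one mean per column), so they are reused in turn across the eight NA positions. A minimal sketch on a made-up matrix (the values and the names m and means are illustrative, not the data above):

```r
# Made-up 4 x 2 matrix; all four NAs sit in the first column
m <- matrix(c(NA, NA, NA, NA, 1, 2, 3, 6), nrow = 4)
means <- c(10, 20)            # stand-ins for two column means

# Two replacement values are recycled over the four NA positions
m[which(is.na(m))] <- means
m[, 1]                        # 10 20 10 20
```

With the data above, eight NA positions is not a multiple of three replacement values, so R should also warn that the number of items to replace is not a multiple of the replacement length.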

It has replaced the first NA in the first column with the mean of the first
column (-0.505...), the second NA with the mean of the second column, and so on.
I want a result like this:
             [,1]        [,2]       [,3]
 [1,] -0.50575168 -0.26928087 -0.1192078 
 [2,] -0.50575168  1.20925752  0.9325334 
 [3,] -0.50575168  0.38012008 -1.8927164 
 [4,] -0.50575168 -0.41778861  1.4330507 
 [5,] -0.50575168 -0.49677462  0.2892706 
 [6,] -0.50575168 -0.13248754  1.3976522 
 [7,] -0.50575168 -0.54179054  0.2295291 
 [8,] -0.50575168  0.35788624 -0.5009389 
 [9,]  0.27500571 -0.41467591 -0.3426560 
[10,] -3.07568579 -0.59234248 -0.8439027 
[11,] -0.42240954  0.73642396 -0.4971999 
[12,] -0.26901731 -0.06768044 -1.6127122 
[13,]  0.01766284 -0.40321968 -0.6508823 
[14,] -0.80999580 -1.52283305  1.4729576 
[15,]  0.20805934  0.25974308 -1.6093478 
[16,]  0.03036708 -0.04013730  0.1686006 
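
A result like the one above can be sanity-checked with the property mentioned earlier: filling the NA entries with the mean of the observed values should leave the column mean unchanged. A small sketch on a made-up vector (v, m_obs and v_filled are illustrative names):

```r
v <- c(NA, NA, 2, 4, 9)             # illustrative values
m_obs <- mean(v, na.rm = TRUE)      # mean of the observed values: 5
v_filled <- v
v_filled[is.na(v_filled)] <- m_obs  # fill the NAs with that mean
all.equal(mean(v_filled), m_obs)    # TRUE
```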

Thanks in advance


