[R] replacing missing values with row average

Joshua Wiley jwiley.psych at gmail.com
Mon Feb 28 01:14:21 CET 2011


Hi Daniel,

If your data is stored in a matrix, the following should work (and be
fairly efficient):

#############
dat <- matrix(rnorm(100), nrow = 10)
dat[sample(1:10, 3), sample(1:10, 3)] <- NA
## create an index of missing values
index <- which(is.na(dat), arr.ind = TRUE)
## compute the row means, then pick out the mean of each missing cell's row
dat[index] <- rowMeans(dat, na.rm = TRUE)[index[, "row"]]

## for documentation see
?which # particularly the arr.ind argument
?"[" # for extraction or selecting a subset to overwrite
#############
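Since the question also mentions column averages, here is a minimal sketch that wraps the code above in a small function. The name fill_na_means and its by argument are hypothetical, not part of base R or any package. It relies on which(..., arr.ind = TRUE) returning a matrix whose columns are named "row" and "col", so the same lookup works for either margin. This per-cell lookup is exactly what dat[is.na(dat)] <- rowMeans(dat, na.rm = TRUE) lacks: that version just recycles the vector of row means over the missing cells in column-major order, so most cells end up with some other row's mean.

```r
## Hypothetical helper (not from any package): fill the NAs in a numeric
## matrix with the mean of their row (by = "row") or column (by = "col").
fill_na_means <- function(m, by = c("row", "col")) {
  by <- match.arg(by)
  ## one row per missing cell; the columns are named "row" and "col"
  index <- which(is.na(m), arr.ind = TRUE)
  ## one mean per row or per column, ignoring the NAs themselves
  means <- if (by == "row") rowMeans(m, na.rm = TRUE) else colMeans(m, na.rm = TRUE)
  ## look up the mean belonging to each missing cell's row or column
  m[index] <- means[index[, by]]
  m
}

dat <- matrix(rnorm(100), nrow = 10)
dat[sample(1:10, 3), sample(1:10, 3)] <- NA
by_row <- fill_na_means(dat, by = "row")
by_col <- fill_na_means(dat, by = "col")
```

One caveat: a row (or column) that is entirely NA has a mean of NaN, so its cells stay missing (as NaN) rather than being filled.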

The only reason this does not work as-is with data frames is the way they
are indexed/subset: dat[index] does not work for them.  The required
modification is probably fairly minimal, but if you are happy to use a
matrix, then it's a moot issue.
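If you would rather keep the data frame returned by read.csv(), one way around the indexing issue is to work column by column. The sketch below uses a hypothetical helper, fill_na_row_means, and assumes the row mean should be taken over the numeric columns only. Skipping non-numeric columns matters because rowMeans() stops with "'x' must be numeric" when any column is, say, a factor or character ID, which may be what produced the first error in the question.

```r
## Hypothetical helper: replace NAs in the numeric columns of a data frame
## with the mean of the same row, taken over the numeric columns only.
fill_na_row_means <- function(df) {
  num <- vapply(df, is.numeric, logical(1))    # TRUE for numeric columns
  row_means <- rowMeans(df[num], na.rm = TRUE) # one mean per row
  for (j in which(num)) {
    miss <- is.na(df[[j]])                     # missing cells in column j
    df[[j]][miss] <- row_means[miss]           # use the matching row's mean
  }
  df
}

data <- as.data.frame(matrix(rnorm(100), nrow = 10))
data[sample(1:10, 3), sample(1:10, 3)] <- NA
data <- fill_na_row_means(data)
```

When every column is numeric, converting with as.matrix() alone is enough to use the matrix code instead. Following it with as.numeric() strips the dim attribute and leaves a plain vector, which is why rowMeans() then complains that 'x' must be an array of at least two dimensions.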

HTH,

Josh

On Sun, Feb 27, 2011 at 3:25 PM, Daniel M. <danielmessay at yahoo.com> wrote:
> Hello,
>
> I have a dataset that I read from an external file using
> data <- read.csv("my file location"), so it is a data frame:
>
>> is(data)
> [1] "data.frame" "list"       "oldClass"   "vector"
> I have also converted it into a matrix and tried my code on that, but it
> didn't work.
>
> Anyway, suppose I have the following data.
>
>
>    data <- as.data.frame(matrix(rnorm(100), nrow = 10))
>
> And let's put some missing values
>
>    data[sample(1:10, 3), sample(1:10, 3)] <- NA
>
> I want to replace all NAs with the row averages or column averages of my matrix.
>
> I tried (with my original data matrix)
>
>    data[is.na(data)] <- rowMeans(data, na.rm = TRUE)
> but got the error message
>
>       Error in rowMeans(data, na.rm = TRUE) : 'x' must be numeric
> Then I converted it with
>
>    data <- as.matrix(data)
>    data <- as.numeric(data)
>
> and applied my code again:
>
>     data[is.na(data)] <- rowMeans(data, na.rm = TRUE)
>
> Error message
>
>
>      Error in rowMeans(data, na.rm = TRUE) :
>  'x' must be an array of at least two dimensions
>
> Then I tried converting it into an array, but the errors continued.
>
> I also tried the code
>
>    data[is.na(data)] <- apply(data,1,mean)
>
> but it still didn't work.
>
> Can anyone please help me fix this?
>
> Thank you very much
>
> Daniel
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


