[R] How to replace missing values by mean of subgroup of a group

Boris Steipe boris.steipe at utoronto.ca
Tue May 9 22:20:13 CEST 2017


Pedestrian code, so you can analyze it easily. It is entirely untested, however, since I have no ambition to recreate your input data as a data frame. This code assumes:
 - your data _is_ a data frame
 - the desired column is called food.price, not "food price" (cf. ?make.names )
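A quick check of the naming assumption, as a minimal sketch: make.names() shows what R turns a name such as "food price" into when a data frame is built with the default check.names = TRUE. The variable name fixedName is only illustrative.

```r
# make.names() replaces characters that are not valid in a syntactic
# name (here the space) with a dot
fixedName <- make.names("food price")
print(fixedName)   # "food.price"
```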

# define a function that returns the mean price of the same food in the
# same city as row i (NA if no price was observed for that combination)
imputeFoodPrice <- function(DF, i) {
  sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
  imputed <- mean(DF$food.price[sel], na.rm = TRUE)
  if (is.nan(imputed)) { # careful, there might be no other match
    imputed <- NA
  }
  return(imputed)
}


# apply the function to replace NA values
for (iMissing in which(is.na(myDF$food.price))) {
  myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
}
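To try this out, the posted table can be rebuilt as a data frame. The sketch below is my reading of that table, not the poster's actual object, so the name myDF and the column types are assumptions. It also shows a vectorized alternative to the loop using base R's ave(), a different technique that should give the same result, since adding a group's mean back into the group does not change that mean.

```r
# Reconstruction of the posted table (an assumption about its structure)
myDF <- data.frame(
  hhid = rep(101:106, each = 4),
  city = rep(c("Paxton", "Hull"), each = 12),
  food = rep(c("rice", "beans", "flour", "eggs"), times = 6),
  food.price = c(10, 30, NA, 20,
                 NA, NA, 34, 21,
                 15, 28, 32, NA,
                 NA, 34, NA, 24,
                 18, 38, 36, 26,
                 NA, NA, 40, NA),
  stringsAsFactors = FALSE
)

# Mean of the observed prices within each city/food group,
# returned as a vector aligned with the rows of myDF
groupMean <- ave(myDF$food.price, myDF$city, myDF$food,
                 FUN = function(x) mean(x, na.rm = TRUE))
groupMean[is.nan(groupMean)] <- NA   # group with no observed price at all

# Fill in only the rows that were missing
isMissing <- is.na(myDF$food.price)
myDF$food.price[isMissing] <- groupMean[isMissing]
```

With this data, the missing bean price for household 102 becomes 29, the mean of 30 and 28 for households 101 and 103 in Paxton.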


B.



> On May 9, 2017, at 3:14 PM, Olu Ola via R-help <r-help at r-project.org> wrote:
> 
> Hello, I have the following food data with some NA values in the food prices. I would like to replace the NA values in the food price column for each food item with the mean price of that food item in each city. For example, the price of beans for the household with hhid 102 is missing. I would like to replace the missing value with the mean price of beans for the other households living in Paxton (that is, households 101 and 103). The data set is presented below. Any help will be greatly appreciated.
> 
> | hhid | city | food | food price |
> | 101 | Paxton | rice | 10 |
> | 101 | Paxton | beans | 30 |
> | 101 | Paxton | flour | NA |
> | 101 | Paxton | eggs | 20 |
> | 102 | Paxton | rice | NA |
> | 102 | Paxton | beans | NA |
> | 102 | Paxton | flour | 34 |
> | 102 | Paxton | eggs | 21 |
> | 103 | Paxton | rice | 15 |
> | 103 | Paxton | beans | 28 |
> | 103 | Paxton | flour | 32 |
> | 103 | Paxton | eggs | NA |
> | 104 | Hull | rice | NA |
> | 104 | Hull | beans | 34 |
> | 104 | Hull | flour | NA |
> | 104 | Hull | eggs | 24 |
> | 105 | Hull | rice | 18 |
> | 105 | Hull | beans | 38 |
> | 105 | Hull | flour | 36 |
> | 105 | Hull | eggs | 26 |
> | 106 | Hull | rice | NA |
> | 106 | Hull | beans | NA |
> | 106 | Hull | flour | 40 |
> | 106 | Hull | eggs | NA |
> 
> 


