[R] Efficient way to determine if a data frame has missing observations

Erik Iverson eriki at ccbr.umn.edu
Wed Feb 2 19:07:19 CET 2011



H Roark wrote:
> I have a data set covering a large number of cities with values for characteristics such as land area, population, and employment. The problem I have is that some cities lack observations for some of the characteristics and I'd like a quick way to determine which cities have missing data.  For example:
> 
> city<-c("A","A","A","B","B","C") 
> var<-c("sqmi","pop","emp","pop","emp","pop")
> value<-c(10,100,40,30,10,20)
> df<-data.frame(city,var,value)
> 
> In this data frame, city A has complete data for the three variables, while city B is missing land area, and city C only has population data. In the full data frame, my approach to finding the missing observations has been to create a data frame with all combinations of 'city' and 'var', merge this onto the original data frame, and then extract the observations with missing data for 'value':
> 
> city_unq<-c("A","B","C")
> var_unq<-c("sqmi","pop","emp")
> comb<-expand.grid(city=city_unq,var=var_unq)
> 
> mrg<-merge(comb,df,by=c("city","var"),all=TRUE)
> missing<-mrg[is.na(mrg$value),]

Perhaps the following, or a variation thereof?

subset(as.data.frame(table(city = df$city, var = df$var)), Freq == 0)
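This works because table() cross-tabulates every observed city against every observed var, so a combination that never occurs gets a count of 0. as.data.frame() then turns the table into one row per combination with a Freq column. Here is a rough sketch on the sample data from the original post. The names `counts` and `missing` are only illustrative, and the select= argument is my addition to drop the Freq column:

```r
# Rebuild the example data from the original post.
city  <- c("A", "A", "A", "B", "B", "C")
var   <- c("sqmi", "pop", "emp", "pop", "emp", "pop")
value <- c(10, 100, 40, 30, 10, 20)
df    <- data.frame(city, var, value)

# Cross-tabulate city by var; combinations absent from df get Freq == 0.
counts <- as.data.frame(table(city = df$city, var = df$var))

# Keep only the city/var combinations that never occur in df.
missing <- subset(counts, Freq == 0, select = c(city, var))
print(missing)
```

On the sample data this should flag three combinations: city C with emp, city B with sqmi, and city C with sqmi. One caveat: table() only knows about values that appear at least once in the data (or that are factor levels), so a variable that is missing for every city would not be reported unless df$var is a factor that already carries that level.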


