[R] summing NAs in aggregate

Spitzer, Matthew Matthew.Spitzer at bmc.org
Thu Jan 12 05:18:20 CET 2012


Hello,
I would like to ask for assistance with summing by group using aggregate. I have a data set consisting of two grouping variables (id, visit) and several other variables. I would like to sum the variables for each combination of id and visit, but am having problems with na.rm. With na.rm=TRUE, a group consisting entirely of NAs sums to zero, as if the NAs had been replaced with zeros. I would like NAs to be ignored when only some values in a group are NA (1 + NA + NA = 1, or NA + 1 + 1 = 2), but the result to stay NA when the entire group consists of NAs (NA + NA + NA = NA).
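A minimal sketch of the rule described above, assuming a small custom summary function is acceptable. sum_or_na is a hypothetical helper name, not a base R function: it returns NA only when every value is missing and otherwise behaves like sum(x, na.rm = TRUE).

```r
# Hypothetical helper (not part of base R): sum the non-missing values,
# but return NA when every value is missing (also when x has length 0).
sum_or_na <- function(x) {
  if (all(is.na(x))) {
    # Entire group is NA: keep NA instead of the 0 that na.rm = TRUE gives
    return(NA_real_)
  }
  sum(x, na.rm = TRUE)
}

sum_or_na(c(1, NA, NA))   # [1] 1
sum_or_na(c(NA, 1, 1))    # [1] 2
sum_or_na(c(NA, NA, NA))  # [1] NA
```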

I have created a truncated example (my data set has many more rows):

example <-
structure(list(id = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 
43L, 43L), visit = c(3L, 3L, 3L, 3L, 5L, 5L, 5L, 9L, 9L, 9L, 
9L, 12L, 12L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 9L, 9L, 12L, 12L, 
12L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 12L, 12L, 12L, 12L), var1 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), var2 = c(1L, NA, NA, NA, NA, 1L, NA, 0L, 
1L, 0L, NA, 0L, NA, NA, NA, 1L, 0L, NA, 0L, 1L, 1L, 1L, NA, 1L, 
0L, NA, NA, 1L, 1L, 0L, NA, 1L, NA, NA, 0L, 1L, 1L, 1L), var3 = c(NA, 
NA, NA, NA, 1L, 0L, 1L, NA, 1L, 1L, NA, 0L, 1L, NA, 0L, 1L, NA, 
1L, 0L, NA, 1L, 0L, 1L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 0L, 1L, 
NA, NA, NA, NA, 1L), var4 = c(0L, 1L, NA, NA, NA, 1L, 1L, 0L, 
NA, NA, 0L, 1L, 1L, NA, 1L, 1L, 0L, 1L, 1L, NA, 1L, NA, 0L, 0L, 
0L, NA, NA, NA, NA, NA, NA, 1L, 1L, NA, 0L, 0L, 0L, NA)), .Names = c("id", 
"visit", "var1", "var2", "var3", "var4"), class = "data.frame", row.names = c(NA, 
-38L))
example<-as.data.frame(example)

# Returns 0 for groups where every value is NA (e.g. id 43, visit 3, var4); I would like NA there
agex1 <- aggregate(example[, c("var1", "var2", "var3", "var4")],
                   by = list(id = example$id, visit = example$visit),
                   FUN = sum, na.rm = TRUE)

# Returns NA for every group containing at least one NA, which loses much of the data I want to analyze
agex2 <- aggregate(example[, c("var1", "var2", "var3", "var4")],
                   by = list(id = example$id, visit = example$visit),
                   FUN = sum)
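Passing a helper like the one sketched earlier as FUN should give the requested result. The sketch below re-types the example data compactly (same values as the dput output above, grouped by id) and restricts the summed columns to var1 to var4 so that id and visit are not summed themselves. sum_or_na and agex3 are hypothetical names chosen for illustration.

```r
# Hypothetical helper: NA if the whole group is NA, else sum of non-missing values
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# The example data from the post, re-typed (rows for id 4, then 12, then 43)
example <- data.frame(
  id    = rep(c(4L, 12L, 43L), times = c(13, 13, 12)),
  visit = c(3L, 3L, 3L, 3L, 5L, 5L, 5L, 9L, 9L, 9L, 9L, 12L, 12L,
            3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 9L, 9L, 12L, 12L, 12L,
            3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 12L, 12L, 12L, 12L),
  var1  = rep(1L, 38),
  var2  = c(1L, NA, NA, NA, NA, 1L, NA, 0L, 1L, 0L, NA, 0L, NA,
            NA, NA, 1L, 0L, NA, 0L, 1L, 1L, 1L, NA, 1L, 0L, NA,
            NA, 1L, 1L, 0L, NA, 1L, NA, NA, 0L, 1L, 1L, 1L),
  var3  = c(NA, NA, NA, NA, 1L, 0L, 1L, NA, 1L, 1L, NA, 0L, 1L,
            NA, 0L, 1L, NA, 1L, 0L, NA, 1L, 0L, 1L, NA, NA, NA,
            NA, 1L, 1L, 1L, 1L, 0L, 1L, NA, NA, NA, NA, 1L),
  var4  = c(0L, 1L, NA, NA, NA, 1L, 1L, 0L, NA, NA, 0L, 1L, 1L,
            NA, 1L, 1L, 0L, 1L, 1L, NA, 1L, NA, 0L, 0L, 0L, NA,
            NA, NA, NA, NA, NA, 1L, 1L, NA, 0L, 0L, 0L, NA)
)

# Sum only the measurement columns, one row per observed (id, visit) pair
vars <- c("var1", "var2", "var3", "var4")
agex3 <- aggregate(example[vars],
                   by = list(id = example$id, visit = example$visit),
                   FUN = sum_or_na)
agex3
```

With this data, the all-NA groups (var3 for id 4, visit 3 and for id 12, visit 12, and var4 for id 43, visit 3) come out as NA, while groups with only some NAs are summed over their non-missing values.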

na.action does not seem to apply to aggregate.data.frame in this case. I have tried to write a function to handle this, but have had great difficulty. I have also thought about ddply, but cannot figure out how to apply it here. Could anyone suggest another way of summing by group that ignores NAs within a group but returns NA when the entire group consists of NAs? I would be very grateful for any suggestion on how to process these data.
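On the na.action point: aggregate.data.frame has no na.action argument, but the formula method of aggregate does, and its default (na.omit) drops every row that has an NA in any of the listed variables, which would discard even more data here. A minimal sketch, assuming na.action = na.pass combined with the same kind of helper; the toy data frame and the names toy, agg_omit, agg_pass and sum_or_na are made up for illustration only.

```r
# Hypothetical helper: NA if the whole group is NA, else sum of non-missing values
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# Small made-up data frame (illustration only, not the poster's data)
toy <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L),
  varA = c(1L, NA, 1L, NA, NA),
  varB = c(NA, 1L, 1L, 1L, 0L)
)

# Default na.action = na.omit: only row 3 has no NA, so id 2 vanishes entirely
agg_omit <- aggregate(cbind(varA, varB) ~ id, data = toy, FUN = sum_or_na)

# na.action = na.pass keeps every row and leaves NA handling to FUN
agg_pass <- aggregate(cbind(varA, varB) ~ id, data = toy, FUN = sum_or_na,
                      na.action = na.pass)
agg_pass
```

The same pattern should carry over to the real data as cbind(var1, var2, var3, var4) ~ id + visit.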

Thanks, Matt




More information about the R-help mailing list