[R] summing NAs in aggregate

R. Michael Weylandt michael.weylandt at gmail.com
Thu Jan 12 05:55:59 CET 2012


Perhaps this would work:

spitzSum <- function(x) if(all(is.na(x))) NA else sum(x, na.rm = TRUE)
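
A minimal sketch of how this helper might be passed to aggregate(), assuming a small made-up data frame (`toy` and its values are illustrative only, not the data from the original post). It also shows why na.rm = TRUE alone gives 0: once every NA is dropped, sum() is left with an empty vector, whose sum is 0.

```r
# Helper from the reply: NA only when every value in the group is NA.
spitzSum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

# With na.rm = TRUE an all-NA vector becomes empty, and the empty sum is 0.
all_na_sum <- sum(c(NA, NA), na.rm = TRUE)

# Hypothetical toy data: two ids, one visit each.
toy <- data.frame(
  id    = c(1L, 1L, 1L, 2L, 2L),
  visit = c(3L, 3L, 3L, 3L, 3L),
  var2  = c(1L, NA, NA, NA, NA),  # id 2 is entirely NA
  var4  = c(NA, 1L, 1L, NA, 0L)   # id 2 has one observed value
)

# Aggregate only the measurement columns so id and visit are not summed.
res <- aggregate(toy[c("var2", "var4")],
                 by  = list(id = toy$id, visit = toy$visit),
                 FUN = spitzSum)
print(res)
```

For id 1 this should give var2 = 1 and var4 = 2; for id 2, var2 stays NA while var4 is 0, because one real 0 was observed in that group.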

Michael

On Wed, Jan 11, 2012 at 11:18 PM, Spitzer, Matthew
<Matthew.Spitzer at bmc.org> wrote:
> Hello,
> I would like to ask for assistance with an aggregate sum.  I have a data set consisting of two grouping variables (id, visit) and several other variables.  I would like to sum the variables for each id and visit, but am having problems with na.rm.  na.rm=TRUE seems to replace all NAs with zeros, or, better stated, results in a zero when summing a group that consists entirely of NAs.  I would like to remove NAs when only some values in a group are NA (1 + NA + NA = 1 or NA + 1 + 1 = 2), but keep the NA if the entire group consists of NAs (NA + NA + NA = NA).
>
> I have created a truncated example (my data set has many more rows):
>
> example <-
> structure(list(id = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
> 12L, 12L, 12L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L,
> 43L, 43L), visit = c(3L, 3L, 3L, 3L, 5L, 5L, 5L, 9L, 9L, 9L,
> 9L, 12L, 12L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 9L, 9L, 12L, 12L,
> 12L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 12L, 12L, 12L, 12L), var1 = c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L), var2 = c(1L, NA, NA, NA, NA, 1L, NA, 0L,
> 1L, 0L, NA, 0L, NA, NA, NA, 1L, 0L, NA, 0L, 1L, 1L, 1L, NA, 1L,
> 0L, NA, NA, 1L, 1L, 0L, NA, 1L, NA, NA, 0L, 1L, 1L, 1L), var3 = c(NA,
> NA, NA, NA, 1L, 0L, 1L, NA, 1L, 1L, NA, 0L, 1L, NA, 0L, 1L, NA,
> 1L, 0L, NA, 1L, 0L, 1L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 0L, 1L,
> NA, NA, NA, NA, 1L), var4 = c(0L, 1L, NA, NA, NA, 1L, 1L, 0L,
> NA, NA, 0L, 1L, 1L, NA, 1L, 1L, 0L, 1L, 1L, NA, 1L, NA, 0L, 0L,
> 0L, NA, NA, NA, NA, NA, NA, 1L, 1L, NA, 0L, 0L, 0L, NA)), .Names = c("id",
> "visit", "var1", "var2", "var3", "var4"), class = "data.frame", row.names = c(NA,
> -38L))
> example<-as.data.frame(example)
>
> #generates 0s for groups with all NAs such as id 43, visit 3, var4 that I would like to be NA
> agex1 <-aggregate.data.frame(example, by=list(example$id,example$visit),FUN=sum,na.rm=TRUE)
>
> #returns NA for every group that contains any NA, which discards many sums that I would like to analyze
> agex2<-aggregate.data.frame(example, by=list(example$id,example$visit),FUN=sum)
>
> na.action does not seem to work with data frames in this instance.  I have tried to write a function to fix this, but have had great difficulty.  I have thought about ddply but cannot figure out how to apply it.  Would anyone be able to suggest an alternate means of summing by group that drops NAs when a group has some non-missing values, but returns NA when the entire group consists of NAs?  I would be very grateful for a suggestion for an alternate way to process these data.
>
> Thanks, Matt


