[R] aggregate function - na.action

Phil Spector spector at stat.berkeley.edu
Fri Feb 4 22:52:53 CET 2011


Gene -
    Let me try to address your concerns one at a time:

The formula interface to aggregate was introduced pretty recently
(I think in R-2.11.1, but I might be wrong), so when you try to use it
in R-2.10.1 it won't work.
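If you are not sure which version added the formula method, you can ask R directly instead. The sketch below assumes utils::getS3method() with optional = TRUE, which returns NULL when no such method exists. It falls back to the data frame method, which older versions also have. The toy data frame is made up for illustration.

```r
# Sketch: probe for aggregate()'s formula method at run time rather than
# relying on a remembered version number.
has_formula_method <- !is.null(getS3method("aggregate", "formula", optional = TRUE))

# Hypothetical toy data, for illustration only
toy <- data.frame(g = c("a", "a", "b"), y = c(1, 2, 3))

if (has_formula_method) {
  res <- aggregate(y ~ g, data = toy, FUN = sum)        # formula interface
} else {
  res <- aggregate(toy["y"], by = toy["g"], FUN = sum)  # data frame method
}
print(res)   # group sums: a = 3, b = 3
```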

Now let's take a close look at the help page for aggregate.

The default method (which is called if you pass a vector to aggregate)
and the data frame method are described like this:

      aggregate(x, ...)

      ## S3 method for class 'data.frame'
      aggregate(x, by, FUN, ..., simplify = TRUE)

So if you pass an na.action= argument to aggregate when the first argument
is a vector or data frame, it gets picked up by the ... argument and gets
passed to your function, so you might see messages like this:

> sum(1:10,na.action=na.omit)
Error in sum(1:10, na.action = na.omit) :
   invalid 'type' (closure) of argument
> sum(1:10,na.action='na.omit')
Error in sum(1:10, na.action = "na.omit") :
   invalid 'type' (character) of argument

(It's sum complaining, not aggregate.)
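You can watch this forwarding happen by giving aggregate a FUN that simply reports the names of the extra arguments it receives. In this minimal sketch, report_extras and the toy data are hypothetical, for illustration only.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(g = c("a", "a", "b"), y = c(1, 2, 3))

# Hypothetical FUN that returns the names of whatever extra arguments
# aggregate() forwarded to it through ...
report_extras <- function(v, ...) paste(names(list(...)), collapse = ",")

# Vector/data frame method: na.action is not one of its formal arguments,
# so it lands in ... and is handed on to FUN for every group
res <- aggregate(toy$y, by = toy["g"], FUN = report_extras, na.action = na.pass)
print(res)   # the x column holds "na.action" for every group

# sum() has no na.action argument either; it treats the function na.omit
# as one more thing to add up and signals the "invalid 'type'" error
err <- try(sum(1:10, na.action = na.omit), silent = TRUE)
print(inherits(err, "try-error"))
```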

As far as na.action goes: when you're using the aggregate formula method, its
default (na.action=na.omit) removes every row of the specified data frame that
has a missing value in any of the variables used in the formula.  If you also
pass na.rm=TRUE through to a function that understands it, that function will
remove the missing values as it should.  So the only time you'll see the effect
of na.action=na.pass is when you call a function that won't remove the missing
values.  (The subtle distinction between na.action=na.omit and na.rm=TRUE in
the function you're calling is that na.omit removes the entire row of data when
it encounters a missing value, while na.rm=TRUE removes missing values
separately from each variable.)
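Here is a small sketch of both points, on made-up data. Per the aggregate help page, one more rule applies whatever na.action says: rows with a missing value in any grouping (by) variable are omitted from the result. That is why the row whose group is NA never shows up below, and it is likely why switching to na.pass made no visible difference in the example quoted below.

```r
# Made-up data: y has one NA (row 2), the grouping variable g has one NA (row 4)
dat <- data.frame(g = c("a", "a", "b", NA),
                  y = c(1, NA, 3, 10))

# Formula method, default na.action = na.omit: rows 2 and 4 are dropped
# before sum() ever sees the data, giving a = 1, b = 3
omit_res <- aggregate(y ~ g, data = dat, FUN = sum)

# na.pass keeps the NA in y, and plain sum() propagates it: a = NA, b = 3
pass_res <- aggregate(y ~ g, data = dat, FUN = sum, na.action = na.pass)

# With na.rm = TRUE forwarded to sum(), the NA is removed again: a = 1, b = 3
pass_rm_res <- aggregate(y ~ g, data = dat, FUN = sum,
                         na.rm = TRUE, na.action = na.pass)

# In all three results row 4 (g is NA) is absent: aggregate() omits rows
# whose grouping values are missing, regardless of na.action
print(omit_res); print(pass_res); print(pass_rm_res)

# Row-wise removal (na.omit) versus per-variable removal (na.rm = TRUE)
two <- data.frame(a = c(1, NA, 3), b = c(10, 20, NA))
colSums(na.omit(two))        # only row 1 is complete: a = 1, b = 10
colSums(two, na.rm = TRUE)   # NAs dropped column by column: a = 4, b = 30
```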

Hope this helps.
 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu



On Fri, 4 Feb 2011, Gene Leynes wrote:

> Can someone please tell me what is up with na.action in aggregate?
>
> My (somewhat) reproducible example:
> (I say somewhat because some lines wouldn't run in a separate session, more
> below)
>
> set.seed(100)
> dat=data.frame(
>        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>        x2=sample(c(NA, 1:10), 100, replace=TRUE),
>        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>        x4=sample(c(NA,T,F), 100, replace=TRUE),
>        y=sample(c(rep(NA,5), rnorm(95))))
> dat
> ## The total from dat:
> sum(dat$y, na.rm=T)
> ## The total from aggregate:
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line gave an error in a separate R instance
> ## The aggregate formula is excluding NA
>
> ## So, let's try to include NAs
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> ## The aggregate formula is STILL excluding NA
> ## In fact, the formula doesn't seem to notice the na.action
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
> ## Hmmmm... that error surprised me (since the previous two things ran)
>
> ## So, let's try to change the global options
> ## (not mentioned in the help, but after reading the help
> ##  100 times, I thought I would go above and beyond to avoid
> ##  any r list flames from people complaining
> ##  that I didn't read the help... but that's a separate topic)
> options(na.action ="na.pass")
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> ## (NAs are still omitted)
>
> ## Even more frustrating...
> ## Why don't any of these work???
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>
>
> ## This does work, but in my real data set, I want NA to really be NA
> for(j in 1:4)
>    dat[is.na(dat[,j]),j] = 'NA'
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>
>
> ## My first session info
> #
> #> sessionInfo()
> #R version 2.12.0 (2010-10-15)
> #Platform: i386-pc-mingw32/i386 (32-bit)
> #
> #locale:
> #[1] LC_COLLATE=English_United States.1252
> #[2] LC_CTYPE=English_United States.1252
> #[3] LC_MONETARY=English_United States.1252
> #[4] LC_NUMERIC=C
> #[5] LC_TIME=English_United States.1252
> #
> #attached base packages:
> #[1] stats     graphics  grDevices utils     datasets  methods   base
> #
> #other attached packages:
> #[1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
> #
> #loaded via a namespace (and not attached):
> #[1] grid_2.12.0     gtools_2.6.2    lattice_0.19-13 rJava_0.8-8
> #[5] tools_2.12.0
>
>
>
> I tried running that example in a different version of R, and I got
> completely different results.
>
> The other version of R wouldn't recognize the formula at all.
>
> My other version of R:
>
> #  My second session info
> #> sessionInfo()
> #R version 2.10.1 (2009-12-14)
> #i386-pc-mingw32
> #
> #locale:
> #[1] LC_COLLATE=English_United States.1252
> #[2] LC_CTYPE=English_United States.1252
> #[3] LC_MONETARY=English_United States.1252
> #[4] LC_NUMERIC=C
> #[5] LC_TIME=English_United States.1252
> #
> #attached base packages:
> #[1] stats     graphics  grDevices utils     datasets  methods   base
> #>
> #
>
> PS: Also, I have read the help on aggregate, factor, as.factor, and several
> other topics.  If I missed something, please let me know.
> Some people like to reply to questions by telling the sender that R has
> documentation.  Please don't.  The R help archives are littered with
> reminders, friendly and otherwise, of R's documentation.
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


