[Rd] Improve aggregate.default ...?

Gabor Grothendieck ggrothendieck at gmail.com
Sat May 9 14:23:49 CEST 2009


Try this:

> aggregate(dat["A"], dat["Group"], mean)
  Group         A
1     1 0.4944810
2     2 0.4765412
3     3 0.4521068
4     4 0.4989000
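
This works because dat["A"] is a one-column data frame rather than a bare vector, so the call goes to aggregate.data.frame and the column name is kept. A minimal sketch of the difference, where the seed and the names by_vec and by_df are only illustrative and not from the original post:

```r
set.seed(1)  # illustrative seed, not from the original post
dat <- data.frame(A = runif(100), B = rnorm(100),
                  Group = gl(4, 25))

# Bare vector: handled by aggregate.default, so the name is replaced by "x"
by_vec <- with(dat, aggregate(A, by = list(Group = Group), FUN = mean))

# One-column data frames: aggregate.data.frame keeps the column names
by_df <- aggregate(dat["A"], dat["Group"], mean)

names(by_vec)  # "Group" "x"
names(by_df)   # "Group" "A"
```

The aggregated values are the same either way; only the name of the result column differs.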

On Sat, May 9, 2009 at 8:14 AM, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
> Hi,
>
> I find it a bit annoying that aggregate.default forces the returned
> object to lose the name of the variable aggregated, replacing it with
> 'x'.
>
> A brief example:
>
>> dat <- data.frame(A = runif(100), B = rnorm(100),
> +                   Group = gl(4, 25))
>> with(dat, aggregate(A, by = list(Group = Group), FUN = mean))
>  Group         x
> 1     1 0.6523228
> 2     2 0.4544317
> 3     3 0.4619624
> 4     4 0.4703156
>
> This arises because aggregate.default is defined as:
>
> function (x, ...)
> {
>    if (is.ts(x))
>        aggregate.ts(as.ts(x), ...)
>    else aggregate.data.frame(as.data.frame(x), ...)
> }
>
> which recasts x as a data frame, but doesn't make any effort to supply a
> name. Can we do a better job of supplying a useful name?
>
> My first attempt is:
>
> aggregate.default <- function(x, ...) {
>    if (is.ts(x))
>        aggregate.ts(as.ts(x), ...)
>    else {
>        nam <- deparse(substitute(x))
>        x <- as.data.frame(x)
>        names(x) <- nam
>        aggregate.data.frame(x, ...)
>    }
> }
>
> Which works for the brief example above:
>
>> with(dat, aggregate(A, by = list(Group = Group), FUN = mean))
>  Group         A
> 1     1 0.4269715
> 2     2 0.5479352
> 3     3 0.5091543
> 4     4 0.4926412
>
> However, it fails make check-all because existing examples rely on the
> returned object having a column named 'x'. I also note that it has the
> annoying side effect of producing odd names if we use the following
> incantation:
>
>> res <- aggregate(dat$A, by = list(Group = dat$Group), FUN = mean)
>> str(res)
> 'data.frame':   4 obs. of  2 variables:
>  $ Group: Factor w/ 4 levels "1","2","3","4": 1 2 3 4
>  $ dat$A: num  0.427 0.548 0.509 0.493
>> res$dat$A
> Error in res$dat$A : $ operator is invalid for atomic vectors
>> res$`dat$A`
> [1] 0.4269715 0.5479352 0.5091543 0.4926412
>
> Is there a better way to name the aggregated variable? Would R Core
> consider making a change of this kind to aggregate.default if a good
> solution is found?
>
> Thanks in advance,
>
> G
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
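
On the naming question, one possible direction is to keep the deparsed name only when the argument is a plain symbol and to fall back to 'x' otherwise, which avoids column names such as `dat$A`. This is only a sketch under that assumption: aggregate_named is a hypothetical wrapper, not part of R, and it leaves out the is.ts branch of aggregate.default.

```r
# Hypothetical wrapper, not part of R: keep the variable name only when
# the argument is a plain symbol; otherwise fall back to "x".
aggregate_named <- function(x, ...) {
    expr <- substitute(x)
    nam <- if (is.name(expr)) as.character(expr) else "x"
    x <- as.data.frame(x)
    # Rename only in the single-column case (e.g. a bare vector)
    if (ncol(x) == 1L) names(x) <- nam
    aggregate(x, ...)
}

dat <- data.frame(A = runif(100), Group = gl(4, 25))

res_sym  <- with(dat, aggregate_named(A, by = list(Group = Group), FUN = mean))
res_expr <- aggregate_named(dat$A, by = list(Group = dat$Group), FUN = mean)

names(res_sym)   # "Group" "A"
names(res_expr)  # "Group" "x"
```

A wrapper like this leaves aggregate.default itself untouched, so code that relies on the 'x' column keeps working.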


