[Rd] (PR#9666) 'aggregate' should preserve level ordering of

ripley at stats.ox.ac.uk
Mon May 14 11:04:48 CEST 2007


On Tue, 8 May 2007, prechelt at inf.fu-berlin.de wrote:

> Full_Name: Lutz Prechelt
> Version: 2.4.1
> OS: Windows XP
> Submission from: (NULL) (160.45.111.67)
>
>
> aggregate (from package stats) should preserve the
> ordering of levels of factors it works on and also their
> 'ordered' attribute if present.
> But it does not.

In fact it treats all grouping variables consistently: it reduces each one to
its levels, and data.frame() then calls as.factor() on the resulting column.

It is not at all clear this is desirable.  Take the example on the help
page: 'Cold' is reported as a factor even though it is logical.  It seems
better not to coerce any of the grouping variables when putting them into
the data frame, but rather to index the original variable, and I have
implemented that for R-devel: as a side effect, your example works as you
would like.  This does mean that grouping variables that are not factors
and cannot be inserted into a data frame will no longer work.

> Here is an example:
>
> ff = factor(c("a","b","a","b"), levels=c("b","a"), ordered=TRUE)
> agg = aggregate(1:4, list(groups=ff), sum)
> print(levels(agg$groups))  # should be: "b" "a"
> [1] "a" "b"
> print(is.ordered(agg$groups))  # should be: TRUE
> [1] FALSE
>
> -----
>
> ?aggregate ignores the issue completely:
> - the terms 'order' or 'level' do not occur in the
>  text at all
> - the term 'factor' is mentioned only once:
>  "The elements of the list will be coerced to
>   factors (if they are not already factors)."
>
> -----
>
> This issue made me write the following code used
> for preparing the data for a barchart:
>
>  df.a = aggregate(df[,value.var],
>                   list(grouping=dfgrouping, other=dfsubbar.var),
>                   FUN=FUN)
>  if (is.factor(dfsubbar.var)) {  # R 2.4: this should be done by 'aggregate'
>    df.a$other = factor(df.a$other,
>                        levels=levels(dfsubbar.var),
>                        ordered=is.ordered(dfsubbar.var))
>  }
>
> Cumbersome.
>
> R is great anyway. Thanks for your service building it!
>
>  Lutz Prechelt
>
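For code that must also run on versions without this change, the workaround in the report can be generalised to every factor in the grouping list. restore_levels() is a hypothetical helper (not part of stats): for each named factor in `by` it re-creates the matching column of the aggregate() result with the original levels and 'ordered' flag. It assumes `by` is a named list, so that the result columns carry the same names.

```r
# Hypothetical helper (not part of stats): re-impose the level order and
# 'ordered' attribute of each factor in the named grouping list `by` on the
# matching column of an aggregate() result. Non-factor grouping variables,
# and names that do not appear in the result, are left untouched.
restore_levels <- function(agg, by) {
  for (nm in names(by)) {
    g <- by[[nm]]
    if (is.factor(g) && nm %in% names(agg)) {
      # as.character() works whether the column came back as factor or character.
      agg[[nm]] <- factor(as.character(agg[[nm]]),
                          levels  = levels(g),
                          ordered = is.ordered(g))
    }
  }
  agg
}

ff  <- factor(c("a", "b", "a", "b"), levels = c("b", "a"), ordered = TRUE)
by  <- list(groups = ff)
agg <- restore_levels(aggregate(1:4, by, sum), by)
levels(agg$groups)      # "b" "a"
is.ordered(agg$groups)  # TRUE
```

On a version that already indexes the original variable, the helper simply rebuilds an identical column, so it is harmless to keep.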

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
