[R] aggregate(), tapply(): Why is the order of the grouping variables not kept?

Marius Hofert marius.hofert at math.ethz.ch
Tue Mar 12 14:25:22 CET 2013


>
> I'm no expeRt, but suppose that we change the setup slightly:
>
>   xx <- x[sample(nrow(x)), ]
>
> Now what would you like
>
>  aggregate(value ~ group + year, data=xx, FUN=function(z) z[1])
>
> to return?
>
> Personally, I prefer to have R return the same thing regardless
> of how the input dataframe is sorted,

Personally, I prefer that R change my input as little as possible... but
I totally agree with you that there are other instances where it's preferable
that the output does not depend on how the input is sorted.
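
For what it's worth, here is a small sketch of the shuffled-input case. The
data frame is made up for illustration (the original x from the thread is not
reproduced here), and sum() is used alongside the quoted z[1] only because its
result does not depend on row order.

```r
# Hypothetical data: 2 groups x 2 years, two observations per combination.
x <- data.frame(group = rep(c("a", "b"), each = 4),
                year  = rep(c(2001, 2002), times = 4),
                value = 1:8)

set.seed(1)
xx <- x[sample(nrow(x)), ]   # same rows, shuffled order

# An order-independent summary: the result should not care about row order.
r1 <- aggregate(value ~ group + year, data = x,  FUN = sum)
r2 <- aggregate(value ~ group + year, data = xx, FUN = sum)

# An order-dependent summary: the values may differ between x and xx,
# but the rows of the result are laid out by group/year in the same way.
f1 <- aggregate(value ~ group + year, data = x,  FUN = function(z) z[1])
f2 <- aggregate(value ~ group + year, data = xx, FUN = function(z) z[1])

print(r1)
print(r2)
print(f1)
print(f2)
```

With FUN = sum the two results should agree; with z[1] only the group/year
columns are guaranteed to match, since "the first value" depends on how the
input happens to be sorted.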

> i.e. the result should depend only on the formula. You just have to know that
> the order is to have the first factor vary most rapidly,

... which I still find very confusing/unnatural, but okay.

> then the next, etc.  I think that's documented somewhere, but I don't know
> where.

It's also the default behavior of expand.grid() for example.
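
A minimal sketch of the "first factor varies most rapidly" ordering, again
with made-up data (the names group, year and value are only assumptions
carried over from the formula quoted above). It compares the row order of
aggregate() with that of expand.grid() and with the column-major layout of the
tapply() result.

```r
# Hypothetical data: 2 groups x 2 years, two observations per combination.
x <- data.frame(group = rep(c("a", "b"), each = 4),
                year  = rep(c(2001, 2002), times = 4),
                value = 1:8)

# aggregate(): rows are ordered with 'group' (first in the formula) varying
# fastest, then 'year'.
res <- aggregate(value ~ group + year, data = x, FUN = sum)

# expand.grid(): the first argument also varies fastest.
grid <- expand.grid(group = c("a", "b"), year = c(2001, 2002))

# tapply(): returns a group-by-year matrix; read column by column
# (R's storage order), the first factor again varies fastest.
tab <- tapply(x$value, x[c("group", "year")], sum)

print(res)
print(grid)
print(tab)
print(as.vector(tab))   # same order as res$value
```

So the three functions are at least consistent with each other, even if the
ordering feels unnatural at first.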

Cheers,

Marius

>
>
> Peter Ehlers
>


