[Rd] aggregate: with 2 by variables in the result the 2nd by-variable is wrong (PR#14213)

Peter Ehlers ehlers at ucalgary.ca
Fri Feb 12 22:01:52 CET 2010


franz.quehenberger at medunigraz.at wrote:
> Full_Name: Franz Quehenberger
> Version: 2.10.1
> OS: Windows XP
> Submission from: (NULL) (145.244.10.3)
> 
> 
> aggregate is supposed to produce a data.frame that contains a line for each
> combination of levels of the variables in the by list. The first columns of the
> result contain these combinations of levels. With two by-variables, the second
> by-variable always takes only one value. However, it works fine with one or
> three by-variables.
> 
> The problem seems to be caused by this line of code in aggregate():
> 
>     w <- as.data.frame(w, stringsAsFactors = FALSE)[which(!unlist(lapply(z, is.null))), , drop = FALSE]
> 
> or more specifically by: 
> 
>     [which(!unlist(lapply(z, is.null))), , drop = FALSE]
> 
> Kind regards
> FQ
> 
> 
> 
> # demonstration of the aggregate bug in R 2.10.1
> factor.a=rep(letters[1:3],4)
> factor.b=rep(letters[4:5],each=3,times=2)
> factor.c=rep(letters[4:5+2],each=6)
> data=data.frame(factor.a,factor.b,factor.c,x)
> x=1:12
> #one by-variable works:
> aggregate(x,list(a=factor.a),FUN=mean)
> #three by-variables work fine:
> aggregate(x,list(a=factor.a,b=factor.b,c=factor.b),FUN=mean)
> #two by-variables do not produce the levels of the second by-variable correctly:
> aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
> # data
> print(data)
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
> Result of the R code:
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
>> # demonstration of the aggregate bug in R 2.10.1
>> factor.a=rep(letters[1:3],4)
>> factor.b=rep(letters[4:5],each=3,times=2)
>> factor.c=rep(letters[4:5+2],each=6)
>> data=data.frame(factor.a,factor.b,factor.c,x)
>> x=1:12
>> #one by-variable works:
>> aggregate(x,list(a=factor.a),FUN=mean)
>   a   x
> 1 a 5.5
> 2 b 6.5
> 3 c 7.5
>> #three by-variables work fine:
>> aggregate(x,list(a=factor.a,b=factor.b,c=factor.b),FUN=mean)
>   a b c x
> 1 a d d 4
> 2 b d d 5
> 3 c d d 6
> 4 a e e 7
> 5 b e e 8
> 6 c e e 9
>> #two by-variables do not produce the levels of the second by-variable correctly:
>> aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
>   a b x
> 1 a d 4
> 2 b d 5
> 3 c d 6
> 4 a d 7
> 5 b d 8
> 6 c d 9
> Warning message:
> In data.frame(w, lapply(y, unlist, use.names = FALSE), stringsAsFactors = FALSE) :
>   row names were found from a short variable and have been discarded
>> # data
>> print(data)
>    factor.a factor.b factor.c  x
> 1         a        d        f  1
> 2         b        d        f  2
> 3         c        d        f  3
> 4         a        e        f  4
> 5         b        e        f  5
> 6         c        e        f  6
> 7         a        d        g  7
> 8         b        d        g  8
> 9         c        d        g  9
> 10        a        e        g 10
> 11        b        e        g 11
> 12        c        e        g 12
> 

I don't see this in 2.10.1 or in 2.11.0 (Windows Vista).
I can't think of how you might have got your result.
Is there something you haven't mentioned?
What's your sessionInfo()?
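
For reference, a check along the following lines, run in a fresh session,
might help narrow this down. It is only a sketch under some assumptions:
the session is started without a saved workspace (e.g. with `R --vanilla`),
x is defined before it is used (unlike in the posted script), and the object
name res is introduced here purely for illustration.

```r
# Run in a fresh session so that leftover workspace objects cannot interfere.
x <- 1:12
factor.a <- rep(letters[1:3], 4)
factor.b <- rep(letters[4:5], each = 3, times = 2)

# The two by-variable call from the report.
res <- aggregate(x, list(a = factor.a, b = factor.b), FUN = mean)
print(res)

# List objects that mask others on the search path, e.g. a user-defined
# function hiding one from base or stats.
print(conflicts(detail = TRUE))

# Report R version, platform, locale and attached packages.
print(sessionInfo())
```

If column b of res shows both d and e here but not in the original session,
something in that session's workspace or attached packages is the more
likely culprit.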

-- 
Peter Ehlers
University of Calgary


