[R] Aggregate behaviour inconsistent (?) when FUN=table

William Dunlap wdunlap at tibco.com
Tue Feb 6 18:07:26 CET 2018


Don't use aggregate's simplify=TRUE when FUN() returns values whose
dimensions vary from group to group.  In your case, the shape of
table(subset)'s return value depends on the number of levels in the factor
'subset'.  If you make B a factor before splitting it by C, each split will
have the same number of levels (2).  If you split it and then let table()
convert each split to a factor, one split will have 1 level and the other 2,
so the results cannot be simplified to a matrix and B stays a list.  To see
the details of the output, use str() instead of print().
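
As a minimal sketch of the difference, using the data from the original
post (the names df_fac, res_num and res_fac are made up here purely for
illustration):

```r
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# B left numeric: table() builds a factor separately for each split,
# so the per-group results have different lengths and B stays a list.
res_num <- aggregate(df[1:2], list(df$C), table)
str(res_num)

# B converted to a factor first: every split carries both levels,
# so the per-group results have equal length and B becomes a matrix.
df_fac <- transform(df, B = factor(B))
res_fac <- aggregate(df_fac[1:2], list(df_fac$C), table)
str(res_fac)
```

Comparing the two str() outputs should make the list-versus-matrix
difference in the B column visible, which print() hides.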


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Feb 6, 2018 at 12:20 AM, Alain Guillet <alain.guillet at uclouvain.be>
wrote:

> Dear R users,
>
> When I use aggregate with table as FUN, I get what I would call strange
> behaviour when it involves numeric vectors and one of their values is not
> present in every level of the "by" variable:
>
> ---------------------------
>
> > df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),
> +                  C=c(1,0,1,0,0,1,1,1))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1    B
> 1       0   1   2    3
> 2       1   3   2 2, 3
>
> > table(df$C,df$B)
>
>     0 1
>   0 3 0
>   1 2 3
>
> ---------------
>
> As you can see, a comma appears in the B column of the aggregate output,
> whereas when I call table directly I obtain the same result as if B were
> defined as a factor (I suppose this comes from the fact that "non-factor
> arguments a are coerced via factor", according to the details of the table
> help). I find this completely normal if I remember that aggregate first
> splits the data into subsets and then computes the table. But then I don't
> understand why it works differently with character vectors. Indeed, if I
> use character vectors, I get the same result as with factors:
>
> ------------------------
>
> > df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),
> +                  B=factor(c("1","0","1","0","0","0","1","0")),
> +                  C=factor(c("1","0","1","0","0","1","1","1")))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1 B.0 B.1
> 1       0   1   2   3   0
> 2       1   3   2   2   3
>
> > df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),
> +                  B=factor(c(1,0,1,0,0,0,1,0)),
> +                  C=factor(c(1,0,1,0,0,1,1,1)))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1 B.0 B.1
> 1       0   1   2   3   0
> 2       1   3   2   2   3
>
> ---------------------
>
> Would it be possible to clarify this behaviour in the aggregate help,
> since the result does not quite match what the table help leads one to
> expect? Or would it be possible to get the same results regardless of the
> vector type? This post was rejected on the R-devel mailing list, so I am
> asking my question here as suggested.
>
>
> Best regards,
> Alain Guillet
>
> --
> Alain Guillet
> Statistician and Computer Scientist
>
> SMCS - IMMAQ - Université catholique de Louvain
> http://www.uclouvain.be/smcs
>
> Bureau c.316
> Voie du Roman Pays, 20 (bte L1.04.01)
> B-1348 Louvain-la-Neuve
> Belgium
>
> Tel: +32 10 47 30 50
>
> Directions: http://www.uclouvain.be/323631.html
>
