[R] Applying by() when groups have different lengths

William Dunlap wdunlap at tibco.com
Mon Sep 17 21:25:50 CEST 2018


> by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) {
+ mean.rain <- mean(rainfall_by_site[, 'prcp'])
+ })

Note that you define a function of x that does not use x.
Hence, even if the function returned a value, it would return the same
value for each group.  To see what the 'x' in that function will
be, use the identity function:

> d <- data.frame(X=2^(0:5), Y=2^(6:11), Group=c("A","B","C","A","B","A"))
> by(d[,1:2], d$Group, function(x)x)
d$Group: A
   X    Y
1  1   64
4  8  512
6 32 2048
------------------------------------------------------------
d$Group: B
   X    Y
2  2  128
5 16 1024
------------------------------------------------------------
d$Group: C
  X   Y
3 4 256
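
Applied to the original question, the function given to by() should compute its result from its argument x, which holds only the rows for one group. A minimal sketch using the toy data frame d from above (group_means is an illustrative name, not from the thread):

```r
# Toy data frame from the example above
d <- data.frame(X = 2^(0:5), Y = 2^(6:11),
                Group = c("A", "B", "C", "A", "B", "A"))

# The function now summarises x, the rows belonging to one group,
# instead of a column of the whole data frame
res <- by(d, d$Group, function(x) mean(x$Y))
res

# as.vector() and names() turn the "by" object into a plain named vector
group_means <- setNames(as.vector(res), names(res))
group_means
```

With d as above the group means of Y are about 874.67 (A), 576 (B) and 256 (C). With Rich's column names the call would presumably read by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) mean(x[, 'prcp'])), assuming rainfall_by_site is the single 113K-row data frame rather than a list of per-site pieces.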

I suspect you want to use the aggregate function.

> aggregate(d[,1:2], list(Group=d$Group), sum)
  Group  X    Y
1     A 41 2624
2     B 18 1152
3     C  4  256
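
The same summary can also be written with aggregate()'s formula interface, where cbind() on the left-hand side selects the columns to summarise. A short sketch on the same toy d:

```r
d <- data.frame(X = 2^(0:5), Y = 2^(6:11),
                Group = c("A", "B", "C", "A", "B", "A"))

# Sum X and Y within each level of Group
agg <- aggregate(cbind(X, Y) ~ Group, data = d, FUN = sum)
agg
```

For the rainfall data this would be something like aggregate(prcp ~ name, data = rainfall_by_site, FUN = mean), again assuming a single data frame with those columns. The formula method drops rows containing NA by default (its na.action is na.omit), so missing rainfall values would be excluded rather than turning a site's mean into NA.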

or the functions in the dplyr package:

> library(dplyr)
> d %>% group_by(Group) %>% summarize(sumX=sum(X), meanY=mean(Y))
# A tibble: 3 x 3
  Group  sumX meanY
  <fct> <dbl> <dbl>
1 A        41  875.
2 B        18  576
3 C         4  256
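
Rich's description ("split by a factor into 58 separate data.frames") and the data.frame() call at the top of his error message suggest, though the thread does not confirm it, that rainfall_by_site may be a list of data frames produced by split() rather than one data frame. The sketch below reproduces that situation with the toy d: by() on such a list fails with a "differing number of rows" error, apparently because it first coerces the whole list to a single data frame, while sapply() over the list summarises each piece. The names pieces, err and means are illustrative only.

```r
d <- data.frame(X = 2^(0:5), Y = 2^(6:11),
                Group = c("A", "B", "C", "A", "B", "A"))

# A list of per-group data frames with 3, 2 and 1 rows,
# like the result of splitting a data frame by a factor
pieces <- split(d, d$Group)

# by() on the list tries to turn the list into one data frame first,
# which fails because the pieces have unequal numbers of rows
err <- tryCatch(by(pieces, d$Group, nrow), error = function(e) e)
conditionMessage(err)

# One alternative: summarise each piece of the list directly
means <- sapply(pieces, function(x) mean(x$Y))
means
```

If the per-site list is all that is available, sapply() (or lapply()) over it avoids by() entirely; if the original 113K-row data frame is still around, passing that to by() or aggregate() as shown above is simpler.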






Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Sep 17, 2018 at 11:54 AM, Rich Shepard <rshepard using appl-ecosys.com> wrote:

>   My dataframe has 113K rows split by a factor into 58 separate
> data.frames, with different numbers of rows (see error output below).
>
>   I cannot think of a way of providing a sample of the data; if a sample
> for an MWE is desired, I need advice on producing one using dput().
>
>   To summarize each group within this dataframe I'm using by() and getting
> an error because of the different numbers of rows:
>
> by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) {
> + mean.rain <- mean(rainfall_by_site[, 'prcp'])
> + })
> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
>   arguments imply differing number of rows: 4900, 1085, 1894, 2844, 3520,
>  647, 239, 3652, 3701, 3063, 176, 4713, 4887, 119, 165, 1221, 3358, 1457,
>  4896, 166, 690, 1110, 212, 1727, 227, 236, 1175, 1485, 186, 769, 139, 203,
>  2727, 4357, 1035, 1329, 1454, 973, 4536, 208, 350, 125, 3437, 731, 4894,
>  2598, 2419, 752, 427, 136, 685, 4849, 914, 171
>
>   My web searches have not found anything relevant; perhaps my search terms
> (such as 'R: apply by() with different factor row numbers') can be
> improved.
>
>   The help pages for the functions found using apropos('by') (?by,
> ?by.data.frame, ?by.default) appear to be the same page and provide no hint
> on how to work with unequal numbers of rows per factor level.
>
>   How can I apply by() to these data.frames?
>
> Rich
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
