[R] Applying by() when groups have different lengths

MacQueen, Don
Mon Sep 17 21:26:38 CEST 2018

Try changing it to 

     by(rainfall_by_site, rainfall_by_site[, 'name'],
        function(x) {mean.rain <- mean(x[, 'prcp'])
        })

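Inside the function, the data it is given is an object named "x", because that's how the function is defined:  function(x).
by() passes the rows of each subgroup to the function as x, so you have to operate on x inside the function. Your version referred to rainfall_by_site, the whole object, instead.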

For sure, the fact that the subgroups have different numbers of rows is not the problem.
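
As an illustration, here is a minimal sketch with made-up data (the data frame "rain", its values, and the site labels are hypothetical, not taken from your data set). It suggests that by() copes with groups of unequal size as long as the function works on x:

```r
# Hypothetical toy data: three sites with 2, 3, and 1 rows respectively
rain <- data.frame(
  name = c("A", "A", "B", "B", "B", "C"),
  prcp = c(1, 3, 2, 4, 6, 5)
)

# by() hands the rows of each subgroup to the function as 'x',
# so the mean is computed per site rather than over the whole data frame
site_means <- by(rain, rain[, "name"], function(x) mean(x[, "prcp"]))

print(site_means)  # site means: A = 2, B = 4, C = 5
```

The groups here have 2, 3, and 1 rows, and each site still gets its own mean.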


Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
Lab cell 925-724-7509

On 9/17/18, 11:54 AM, "R-help on behalf of Rich Shepard" wrote:

       My dataframe has 113K rows split by a factor into 58 separate data.frames,
    with different numbers of rows (see error output below).
       I cannot think of a way of providing a sample of data; if a sample for a MWE
    is desired, advice on producing one using dput() is needed.
       To summarize each group within this dataframe I'm using by() and getting
    an error because of the different number of rows:
    > by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) {
    + mean.rain <- mean(rainfall_by_site[, 'prcp'])
    + })
    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
       arguments imply differing number of rows: 4900, 1085, 1894, 2844, 3520,
      647, 239, 3652, 3701, 3063, 176, 4713, 4887, 119, 165, 1221, 3358, 1457,
      4896, 166, 690, 1110, 212, 1727, 227, 236, 1175, 1485, 186, 769, 139, 203,
      2727, 4357, 1035, 1329, 1454, 973, 4536, 208, 350, 125, 3437, 731, 4894,
      2598, 2419, 752, 427, 136, 685, 4849, 914, 171
       My web searches have not found anything relevant; perhaps my search terms
    (such as 'R: apply by() with different factor row numbers') can be improved.
       The help pages found using apropos('by') appear the same: ?by,
    ?by.data.frame, ?by.default and provide no hint on how to work with unequal
    rows per factor.
       How can I apply by() on these data.frames?