[R] Applying by() when groups have different lengths

MacQueen, Don
Mon Sep 17 21:35:36 CEST 2018


I'm also going to guess that maybe your object
   rainfall_by_site
has already been split into separate data frames (because of its name).

But by() does the splitting internally, so you should be passing it the original unsplit data frame.
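
As a minimal sketch of what that could look like: the data frame 'rainfall' below is a made-up stand-in (I don't have the real data), and only the column names 'name' and 'prcp' come from the original post. Note that the function works on its argument x, which is the subset of rows for one group, so groups of different sizes are not a problem.

```r
# Hypothetical unsplit data frame; groups deliberately have different sizes
rainfall <- data.frame(
  name = rep(c("site_a", "site_b", "site_c"), times = c(5, 2, 4)),
  prcp = c(0.1, 0.0, 0.4, 0.2, 0.3,   # site_a
           1.0, 2.0,                  # site_b
           0.0, 0.0, 0.5, 0.1)        # site_c
)

# by() splits 'rainfall' on the grouping vector itself and calls the
# function once per group; x is the data frame of rows for that group
mean_rain <- by(rainfall, rainfall[, "name"], function(x) {
  mean(x[, "prcp"])
})

print(mean_rain)
```

If the object really is a list of data frames already (say, from split()), then something like sapply() over that list would be the analogous approach, but that is a guess about how it was built.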

You could supply example data by providing the first few rows of each of the first few groups. That would be enough to test with.
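
One possible way to build such a sample is sketched below. Again 'rainfall' is a made-up stand-in for the unsplit data, and the choices of two groups and three rows per group are arbitrary.

```r
# Hypothetical unsplit data frame standing in for the real one
rainfall <- data.frame(
  name = rep(c("site_a", "site_b", "site_c"), times = c(5, 2, 4)),
  prcp = c(0.1, 0.0, 0.4, 0.2, 0.3, 1.0, 2.0, 0.0, 0.0, 0.5, 0.1)
)

# Keep only the first two groups
first_sites <- head(unique(rainfall$name), 2)
few_groups  <- rainfall[rainfall$name %in% first_sites, ]

# Take up to three rows from each remaining group and stack them again
pieces      <- split(few_groups, few_groups$name, drop = TRUE)
sample_rows <- do.call(rbind, lapply(pieces, head, n = 3))
rownames(sample_rows) <- NULL

# dput() prints R code that recreates the sample; paste that into the email
dput(sample_rows)
```

The dput() output can be copied into a message as plain text, and anyone can assign it to a name and run by() on it.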

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 

On 9/17/18, 11:54 AM, "R-help on behalf of Rich Shepard" <r-help-bounces using r-project.org on behalf of rshepard using appl-ecosys.com> wrote:

       My dataframe has 113K rows split by a factor into 58 separate data.frames,
    with different numbers of rows (see error output below).
    
       I cannot think of a way of providing a sample of data; if a sample for a MWE
    is desired, advice on producing one using dput() is needed.
    
       To summarize each group within this dataframe I'm using by() and getting
    an error because of the different number of rows:
    
    > by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) {
    + mean.rain <- mean(rainfall_by_site[, 'prcp'])
    + })
    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
       arguments imply differing number of rows: 4900, 1085, 1894, 2844, 3520,
      647, 239, 3652, 3701, 3063, 176, 4713, 4887, 119, 165, 1221, 3358, 1457,
      4896, 166, 690, 1110, 212, 1727, 227, 236, 1175, 1485, 186, 769, 139, 203,
      2727, 4357, 1035, 1329, 1454, 973, 4536, 208, 350, 125, 3437, 731, 4894,
      2598, 2419, 752, 427, 136, 685, 4849, 914, 171
    
       My web searches have not found anything relevant; perhaps my search terms
    (such as 'R: apply by() with different factor row numbers') can be improved.
    
       The help pages found using apropos('by') appear the same: ?by,
    ?by.data.frame, ?by.default and provide no hint on how to work with unequal
    rows per factor.
    
       How can I apply by() on these data.frames?
    
    Rich
    