[R] Collapsing data frame; aggregate() or better function?

jim holtman jholtman at gmail.com
Thu Sep 13 23:18:39 CEST 2007


The second argument to aggregate() is supposed to be a list, so try the
following (note that there is no comma before "1:8"):

test <- aggregate(lf1.turbot[, c(11, 12, 17:217)], lf1.turbot[1:8], sum)
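
For illustration only, here is the same call pattern on the small 'tester'
data frame from the quoted message. The data frame is rebuilt by hand from the
printed rows, and the name by_cols is made up for this sketch. Note that for a
multi-column selection both tester[1:2] and tester[, 1:2] return a data frame,
which is itself a list.

```r
# Toy data reconstructed from the quoted message
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# List-style indexing (no comma) selects the grouping columns
by_cols <- tester[1:2]
stopifnot(is.list(by_cols))

# Sum num, lfs1 and lfs2 within each trip/set combination
test <- aggregate(tester[, c(3, 5:6)], by_cols, sum)
test
```

This should reproduce the two-row result shown in the quoted message (num 8
and 4, lfs1 3 and 2, lfs2 5 and 2).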


On 9/13/07, Tobin, Jared <TobinJR at dfo-mpo.gc.ca> wrote:
> Hello r-help,
>
> I am trying to collapse or aggregate 'some' of a data frame.  A very
> simplified version of my data frame looks like:
>
> > tester
>  trip set num sex lfs1 lfs2
> 1  313  15   5   M    2    3
> 2  313  15   3   F    1    2
> 3  313  17   1   M    0    1
> 4  313  17   2   F    1    1
> 5  313  17   1   U    1    0
>
> And I want to omit sex from the picture and just get the sum of num,
> lfs1, and lfs2 for each unique trip/set combination.  Using aggregate()
> works fine here,
>
> > test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum)
> > test
>  trip set num lfs1 lfs2
> 1  313  15   8    3    5
> 2  313  17   4    2    2
>
> But I'm having trouble getting the same function to work on my actual
> data frame which is considerably larger.
>
> > dim(lf1.turbot)
> [1] 16468   217
> > test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8],
> sum)
> Error in vector("list", prod(extent)) : vector size specified is too
> large
> In addition: Warning messages:
> 1: NAs produced by integer overflow in: ngroup * (as.integer(index) -
> one)
> 2: NAs produced by integer overflow in: group + ngroup *
> (as.integer(index) - one)
> 3: NAs produced by integer overflow in: ngroup * nlevels(index)
>
> I'm guessing that either aggregate() can't handle a data frame of this
> size OR that there is an issue with 'omitting' more than one variable
> (in the same way I've omitted sex in the above example).  Can anyone
> clarify and/or recommend any relatively simple alternative procedure to
> accomplish this?
>
> I plan on trying variants of by() and tapply() tomorrow morning, but I'm
> about to head home for the day.
>
> Thanks,
>
> --
>
> jared tobin, student research assistant
> fisheries and oceans canada
> tobinjr at dfo-mpo.gc.ca
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
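
On the request for an alternative: the warnings quoted above mention
ngroup * nlevels(index), which suggests that the number of possible level
combinations across the eight grouping columns overflowed. One possible
workaround, sketched here on the toy 'tester' data and not tested on the real
lf1.turbot, is to build a single key per row and sum with rowsum(), so that
only combinations that actually occur are created. The names group_cols,
value_cols, key and collapsed are made up for this sketch.

```r
# Toy data reconstructed from the quoted message
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

group_cols <- c("trip", "set")          # assumed stand-in for columns 1:8
value_cols <- c("num", "lfs1", "lfs2")  # assumed stand-in for the summed columns

# One string key per row; only observed combinations ever exist
key <- do.call(paste, c(tester[group_cols], sep = "\r"))

# Sum within each key, keeping groups in order of first appearance
sums <- rowsum(tester[value_cols], key, reorder = FALSE)

# The first row of each group supplies the grouping values, in the same order
groups <- tester[!duplicated(key), group_cols]

collapsed <- cbind(groups, sums)
rownames(collapsed) <- NULL
collapsed
```

On the toy data this gives the same totals as the aggregate() call in the
quoted message. Rows come out in order of first appearance, not sorted.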


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


