[R] aggregate() runs out of memory

Mon Nov 26 22:08:59 CET 2012

Hi Sam,

On Mon, Nov 26, 2012 at 3:13 PM, Sam Steingold <sds at gnu.org> wrote:
> Hi,
>
>> * Steve Lianoglou <znvyvatyvfg.ubarlcbg at tznvy.pbz> [2012-11-19 13:30:03 -0800]:
>>
>> For instance, if you want the min and max of `delay` within each group
>> defined by `share.id`, and let's assume `infl` is a data.frame, you
>> can do something like so:
>>
>> R> library(data.table)
>> R> infl <- as.data.table(infl)  # assign the result; as.data.table() returns a new object
>> R> setkey(infl, share.id)
>> R> result <- infl[, list(min=min(delay), max=max(delay)), by="share.id"]
>
> perfect, thanks.
> alas, the resulting table does not contain the share.id column.
> do I need to add something like "id=unique(share.id)" to the list?
> also, if there is a field in the original table infl which only depends
> on share.id, how do I add this unique value to the summary?
> it appears that "count=unique(country)" in list() does what I need, but
> it slows down the process.

Hmm ... I think the share.id column should be there, but I'm having a
hard time remembering what you want.
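
For what it's worth, here is a minimal sketch on made-up data (the
values below are hypothetical, not your actual `infl`). It assumes the
data.table package is installed. It shows that the `by` column is
carried into the result, and that a field which depends only on
share.id, such as `country`, can simply be added to `by` instead of
calling unique() inside list():

```r
library(data.table)

# Made-up stand-in for `infl`; the values are hypothetical.
infl <- data.frame(
  share.id = c("a", "a", "b", "b", "b"),
  country  = c("US", "US", "DE", "DE", "DE"),
  delay    = c(5, 2, 9, 4, 7),
  stringsAsFactors = FALSE
)

# as.data.table() returns a new object, so the result has to be assigned.
infl <- as.data.table(infl)
setkey(infl, share.id)

# The grouping column appears in the result automatically.
result <- infl[, list(min = min(delay), max = max(delay)), by = "share.id"]

# A column that depends only on share.id can go into `by` as well,
# so unique() does not have to be called within each group.
result2 <- infl[, list(min = min(delay), max = max(delay)),
                by = list(share.id, country)]

print(result)
print(result2)
```

Whether the second form is actually faster on your data is something
you would have to time yourself.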

Could you please copy and paste the output of `dput(head(infl, 20))`, as
well as an approximation of the result you want?

It will make it easier for us to talk more concretely about how to get
what you want.
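
In case dput() is unfamiliar: it prints an R expression that rebuilds
an object, so others can recreate your data by pasting its output. A
small sketch with a made-up data.frame (the name `toy` and its values
are hypothetical):

```r
# Hypothetical stand-in for the first rows of `infl`.
toy <- data.frame(share.id = c(1L, 1L, 2L), delay = c(5, 2, 9))

# dput() writes an R expression that reconstructs the object;
# capture.output() collects that text so it can be pasted into an email.
txt <- capture.output(dput(head(toy, 20)))
cat(txt, sep = "\n")

# Evaluating the text gives back an equivalent data.frame.
rebuilt <- eval(parse(text = txt))
print(all.equal(rebuilt, toy))
```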

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact