[R] Adding same items together in data.frame

Steve Lianoglou mailinglist.honeypot at gmail.com
Sat May 14 05:44:39 CEST 2011


Hi,

On Fri, May 13, 2011 at 7:06 PM, wong, honkit (Stephen)
<honkit at stanford.edu> wrote:
> Dear All,
> I am new to R. I have a 2 column data frame with more than ten thousand
> rows. Something like below. I want to add up all duplicated items, e.g. the
> three "aa" add up together to get a single value gene=aa, value=74. How can I
> do that?? Thanks for help !
> gene value
> aa       20
> bb      10
> cc       9
> aa      30
> aa      24
> dd       100
> ee      55

In addition to Dennis' suggestion to use the aggregate function, you
could look at the plyr or data.table packages.

For instance, as Dennis suggested, let's assume your data is in a
data.frame object named `d`.

R> d <- data.frame(gene=c('aa', 'bb', 'cc', 'aa', 'aa', 'dd', 'ee'),
                   value=c(20, 10, 9, 30, 24, 100, 55))
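For reference, the aggregate approach mentioned above might look roughly
like this in base R. Dennis' exact call isn't quoted in this message, so
treat it as a sketch; the names `totals` and `totals2` are just for
illustration.

```r
# Same example data as above, repeated so this snippet runs on its own.
d <- data.frame(gene=c('aa', 'bb', 'cc', 'aa', 'aa', 'dd', 'ee'),
                value=c(20, 10, 9, 30, 24, 100, 55))

# Sum `value` within each level of `gene` (formula interface of aggregate).
totals <- aggregate(value ~ gene, data=d, FUN=sum)
print(totals)

# tapply gives the same sums as a named vector instead of a data.frame.
totals2 <- tapply(d$value, d$gene, sum)
print(totals2)
```

Both calls return one total per gene, with the groups sorted by gene name
rather than in their original row order.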

Using data.table:

R> library(data.table)
R> dd <- data.table(d, key='gene') # note this will reorder the data in dd
R> dd[, list(total=sum(value)), by=gene]
     gene total
[1,]   aa    74
[2,]   bb    10
[3,]   cc     9
[4,]   dd   100
[5,]   ee    55

Or using plyr:

R> library(plyr)
R> ddply(idata.frame(d), .(gene), summarize, total=sum(value))
  gene total
1   aa    74
2   bb    10
3   cc     9
4   dd   100
5   ee    55

Note that you don't have to use idata.frame(d) -- you can just do:

R> ddply(d, .(gene), summarize, total=sum(value))

but using idata.frame(d) can help calculate the result faster, which is
especially noticeable for larger data.frames.

Using data.table will likely be faster still (again, more noticeable
with larger data.frames), but be aware of two things. First, the order
of the rows in dd will differ from the order in d: they will be
sorted by the key column(s). Second, data.table objects are
somewhat similar to "normal" data.frame objects, but they differ in
important ways (e.g. how you index columns using the [] syntax, for
starters).
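To make those two differences concrete, here is a small sketch, assuming
the data.table package is installed. The variable names `vals_df` and
`vals_dt` are made up for illustration.

```r
library(data.table)

# Same example data as above, repeated so this snippet runs on its own.
d <- data.frame(gene=c('aa', 'bb', 'cc', 'aa', 'aa', 'dd', 'ee'),
                value=c(20, 10, 9, 30, 24, 100, 55),
                stringsAsFactors=FALSE)
dd <- data.table(d, key='gene')

# Row order: d keeps the original order, dd is sorted by its key.
print(d$gene)
print(dd$gene)

# Column indexing: a data.frame takes the column name as a string ...
vals_df <- d[, "value"]
# ... while in a data.table the second argument of [] is an expression
# evaluated inside the table, so the bare column name is used.
vals_dt <- dd[, value]

# The same mechanism lets you compute on columns directly.
print(dd[, sum(value)])
```

Because the second argument is an expression, code written for a
data.frame will not always behave the same way when handed a data.table.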

You should go through the plyr tutorial(s) (at:
http://had.co.nz/plyr/), or the vignette(s) that come with data.table,
for more info/help/use-cases if you plan to go that route.

Hope that helps,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


