[R] How to sum one column in a data frame keyed on other columns

Alex Brown alex at transitive.com
Wed Dec 13 12:56:12 CET 2006


Hi George

   I'm running ubuntu dapper badger with 2.4.0.

add  the line

   deb http://cran.R-project.org/bin/linux/ubuntu dapper/

   to your  /etc/apt/sources.list

-Alex Brown

On 13 Dec 2006, at 02:20, George Nachman wrote:

> Thanks, everyone, for the help!
>
> Bill:
> This looks like a really great general solution, but unfortunately I
> get an error when I call sumUp:
>
> Error in "[.data.frame"(dat, , key_list, drop = FALSE) :
>         invalid subscript type
>
> I'm running 2.2.1 because that's the latest I can get on
> ubuntu...could that be the problem?
>
>
> On 12/12/06, Bill.Venables at csiro.au <Bill.Venables at csiro.au> wrote:
>> Here is an elementary way of doing it:
>>
>>> dat
>>           url time somethingirrelevant visits
>> 1 www.foo.com 1:00                 xxx    100
>> 2 www.foo.com 1:00                 yyy     50
>> 3 www.foo.com 2:00                 xyz     25
>> 4 www.bar.com 1:00                 xxx    200
>> 5 www.bar.com 1:00                 zzz    200
>> 6 www.foo.com 2:00                 xxx    500
>>> dat <- transform(dat, key = paste(url, time))
>>> total_visits <- with(dat, tapply(visits, key, sum))
>>> m <- match(names(total_visits), dat$key)
>>> tdat <- cbind(dat[m, c("url", "time")], total_visits)
>>> tdat
>>           url time total_visits
>> 4 www.bar.com 1:00          400
>> 1 www.foo.com 1:00          150
>> 3 www.foo.com 2:00          525
>>>
>>
>> This should not be too difficult to morph into a fairly general
>> function.  Here's what I might do [warning: somewhat obscure code
>> follows]
>>
>> sumUp <- function(dat, key_list, sum_list) {
>>   key <- with(dat, do.call("paste", dat[, key_list, drop = FALSE]))
>>   totals <- as.matrix(sapply(dat[, sum_list, drop = FALSE],  
>> tapply, key,
>> sum))
>>   dimnames(totals)[[2]] <- paste("total", sum_list, sep = "_")
>>   m <- match(dimnames(totals)[[1]], key)
>>   cbind(dat[m, key_list, drop = FALSE], totals)
>> }
>>
>> check:
>>
>>> sumUp(dat, c("url", "time"), "visits")
>>           url time total_visits
>> 4 www.bar.com 1:00          400
>> 1 www.foo.com 1:00          150
>> 3 www.foo.com 2:00          525
>>
>>> sumUp(dat, "url", "visits")
>>           url total_visits
>> 4 www.bar.com          400
>> 1 www.foo.com          675
>>
>> Question for the reader: why to you need 'drop = FALSE' (in three
>> places)?
>>
>> Bill Venables.
>>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of George Nachman
>> Sent: Wednesday, 13 December 2006 9:35 AM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] How to sum one column in a data frame keyed on other
>> columns
>>
>> I have a data frame that looks like this:
>>
>> url         time somethingirrelevant visits
>> www.foo.com 1:00 xxx                 100
>> www.foo.com 1:00 yyy                 50
>> www.foo.com 2:00 xyz                 25
>> www.bar.com 1:00 xxx                 200
>> www.bar.com 1:00 zzz                 200
>> www.foo.com 2:00 xxx                 500
>>
>> I'd like to write some code that takes this as input and outputs
>> something like this:
>>
>> url         time total_vists
>> www.foo.com 1:00 150
>> www.foo.com 2:00 525
>> www.bar.com 1:00 400
>>
>> In other words, I need to calculate the sum of visits for each unique
>> tuple of (url,time).
>>
>> I can do it with this code, but it's very slow, and doesn't seem like
>> the right approach:
>>
>> keys = list()
>> getkey = function(m,cols,index) { paste(m 
>> [index,cols],collapse=",")  }
>> for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] = 0 }
>> for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] =
>> keys[[getkey(data,1:2,i)]] + data[i,4] }
>>
>> I'm sure there's a more functional-programming approach to this
>> problem! Any ideas?
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list