[R] Counting number of rows with two criteria in dataframe

Matthew Dowle mdowle at mdowle.plus.com
Wed Jan 26 12:13:02 CET 2011


Note that a key is not actually required, so the syntax is even simpler:

dX = as.data.table(X)
dX[,length(unique(z)),by="x,y"]
     x y V1
[1,] 1 1  2
[2,] 1 2  2
[3,] 2 3  2
[4,] 2 4  2
[5,] 3 5  2
[6,] 3 6  2
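
For anyone who wants to run this end to end, here is a minimal
self-contained sketch. It rebuilds the sample data X from the original
question and leaves the table unkeyed. The names res and cnt and the
column name nz are only illustrative, and the .N line assumes a reasonably
current data.table release:

```r
library(data.table)

# Sample data from the original question
X <- data.frame(x = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                y = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
                z = 1:12)

dX <- as.data.table(X)   # no key is set

# Number of distinct z values within each (x, y) group
res <- dX[, list(nz = length(unique(z))), by = list(x, y)]
print(res)

# Plain row count per group; .N is data.table's built-in row counter
cnt <- dX[, .N, by = list(x, y)]
print(cnt)
```

With this data both results should have six groups with a count of 2 each,
because every z value is distinct within its group.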

Passing the list() form to 'by' gives exactly the same result:

dX[,length(unique(z)),by=list(x,y)]

The advantage of the list() form is that you can group by expressions
of columns, for example if x were a date column:

dX[,length(unique(z)),by=list(month(x),y)]
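
As a rough sketch of how that might look for the year/month case in the
original question: the table dD, its date column and the result names
below are made up for illustration, and year() and month() are the date
helpers exported by data.table.

```r
library(data.table)

# Hypothetical data: one row per observation, with a Date column
dD <- data.table(
  date = as.Date(c("2011-01-03", "2011-01-03", "2011-01-17",
                   "2011-02-01", "2011-02-01", "2012-01-05"))
)

# Group by expressions of a column: year and month extracted from 'date'.
# Naming the list elements names the grouping columns in the result.
days <- dD[, list(ndays = length(unique(date))),
           by = list(yr = year(date), mon = month(date))]
print(days)
```

Each row of days is one year/month combination, and ndays is the number of
distinct dates seen in it.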

Matthew


"Dennis Murphy" <djmuser at gmail.com> wrote in message 
news:AANLkTi=8TYSrRfzfm01m7fpzydh-cLS-J-cMbkAkjXxf at mail.gmail.com...
> Hi:
>
> Here are two more candidates, using the plyr and data.table packages:
>
> library(plyr)
> ddply(X, .(x, y), function(d) length(unique(d$z)))
>   x y V1
> 1 1 1  2
> 2 1 2  2
> 3 2 3  2
> 4 2 4  2
> 5 3 5  2
> 6 3 6  2
>
> The function counts the number of unique z values in each sub-data frame
> with the same x and y values. The argument d in the anonymous function is
> a data frame object.
>
> # data.table version:
>
> library(data.table)
> dX <- data.table(X, key = 'x, y')
> dX[, list(nz = length(unique(z))), by = 'x, y']
>      x y nz
> [1,] 1 1  2
> [2,] 1 2  2
> [3,] 2 3  2
> [4,] 2 4  2
> [5,] 3 5  2
> [6,] 3 6  2
>
> Setting the key sorts the data by x, y combinations; the by argument
> then computes nz within each data subset.
>
> If you intend to do a lot of summarization/data manipulation in R, these
> packages are worth learning.
>
> HTH,
> Dennis
>
> On Tue, Jan 25, 2011 at 11:25 AM, Ryan Utz <utz.ryan at gmail.com> wrote:
>
>> Hi R-users,
>>
>> I'm trying to find an elegant way to count the number of rows in a
>> dataframe with a unique combination of 2 values in the dataframe. My data
>> is specifically one column with a year, one with a month, and one with a
>> day. I'm trying to count the number of days in each year/month
>> combination. But for simplicity's sake, the following dataset will do:
>>
>> x<-c(1,1,1,1,2,2,2,2,3,3,3,3)
>> y<-c(1,1,2,2,3,3,4,4,5,5,6,6)
>> z<-c(1,2,3,4,5,6,7,8,9,10,11,12)
>> X<-data.frame(x, y, z)
>>
>> So with dataset X, how would I count the number of z values (3rd column
>> in X) with unique combinations of the first two columns (x and y)? (For
>> instance, in the above example, there are 2 instances per unique
>> combination of the first two columns.) I can do this in Matlab and it's
>> easy, but since I'm new to R this is royally stumping me.
>>
>> Thanks,
>> Ryan
>>
>> --
>> Ryan Utz
>> Postdoctoral research scholar
>> University of California, Santa Barbara
>> (724) 272 7769
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>