[R] dplyr - counting a number of specific values in each column - for all columns at once

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Tue Jun 16 19:58:40 CEST 2015


Except, of course, Bert, that you forgot that it had to be done by
device. Your solution ignores the device.

md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
      device = c(1,1,2,2,3,3))
myvars = c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA
md
vapply(md[myvars], function(x) sum(x==5,na.rm=TRUE),1L)

But the result should be by device.
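
For reference, a minimal sketch of two ways to get the per-device counts for all of 'myvars' at once. The helper name count5 is made up for illustration. The dplyr part assumes dplyr >= 1.0.0, where across() and all_of() are available; older versions would need a different approach. It is skipped if dplyr is not installed.

```r
# Same example data as above.
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
      device = c(1,1,2,2,3,3))
myvars <- c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA

# Hypothetical helper: number of 5s in a vector, ignoring NAs.
count5 <- function(x) sum(x == 5, na.rm = TRUE)

# Base R: apply count5 to every column in myvars within each device.
res_base <- aggregate(md[myvars], by = md["device"], FUN = count5)
print(res_base)
# Counts of 5s per device:
#   device 1: a = 1, b = 2, c = 0
#   device 2: a = 0, b = 1, c = 0
#   device 3: a = 1, b = 0, c = 2

# dplyr (assumes dplyr >= 1.0.0): the same counts via across().
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  res_dplyr <- md %>%
    group_by(device) %>%
    summarise(across(all_of(myvars), count5))
  print(res_dplyr)
}
```

Either way, only 'myvars' has to change when the number of columns grows.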

On Tue, Jun 16, 2015 at 1:56 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Thank you, Bert.
> I'll be honest - I am just learning dplyr and was wondering if one
> could do it in dplyr.
> But of course your solution is perfect...
>
> On Tue, Jun 16, 2015 at 1:50 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> Well, dplyr seems a bit of overkill as it's so simple with plain old
>> vapply() in base R :
>>
>>
>>> dat <- data.frame (a=sample(1:5,10,replace=TRUE),
>> +                    b=sample(3:7,10,replace=TRUE),
>> +                    g = sample(7:9,10,replace=TRUE))
>>
>>> vapply(dat,function(x)sum(x==5,na.rm=TRUE),1L)
>>
>> a b g
>> 5 4 0
>>
>>
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge is
>> certainly not wisdom."
>>    -- Clifford Stoll
>>
>> On Tue, Jun 16, 2015 at 10:24 AM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>>
>>> Hello!
>>>
>>> I have a data frame:
>>>
>>> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
>>>       device = c(1,1,2,2,3,3))
>>> myvars = c("a", "b", "c")
>>> md[2,3] <- NA
>>> md[4,1] <- NA
>>> md
>>>
>>> I want to count the number of 5s in each column - by device. I can do it
>>> like this:
>>>
>>> library(dplyr)
>>> group_by(md, device) %>%
>>> summarise(counts.a = sum(a==5, na.rm = TRUE),
>>>           counts.b = sum(b==5, na.rm = TRUE),
>>>           counts.c = sum(c==5, na.rm = TRUE))
>>>
>>> However, in real life I'll have tons of variables (the length of
>>> 'myvars' can be very large) - so that I can't specify those counts.a,
>>> counts.b, etc. manually - dozens of times.
>>>
>>> Does dplyr allow counting the 5s in all 'myvars' columns at once?
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> --
> Dimitri Liakhovitski



-- 
Dimitri Liakhovitski


