[R] dplyr - counting a number of specific values in each column - for all columns at once

Clint Bowman clint at ecy.wa.gov
Tue Jun 16 20:06:17 CEST 2015


May want to add headers, but the following prints the device number after
each set of sums:

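for (dev in unique(md$device)) {
  # count the 5s in columns a, b, c for this device, then print the device id;
  # indexing by myvars keeps the device column itself out of the counts
  cat(colSums(md[md$device == dev, myvars] == 5, na.rm = TRUE), dev, "\n")
}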
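
If headers are wanted, one possible base R variant is aggregate(), which
returns a data frame with named columns instead of printing bare numbers.
This is only a sketch using the md and myvars from the quoted message below;
the name 'counts' is made up for illustration.

```r
# Example data, copied from the quoted message
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
                 device = c(1,1,2,2,3,3))
myvars <- c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA

# md[myvars] == 5 is a logical matrix; summing it within each device
# counts the 5s, and na.rm = TRUE skips the missing cells
counts <- aggregate(md[myvars] == 5, by = list(device = md$device),
                    FUN = sum, na.rm = TRUE)
print(counts)
```

The result has one row per device and one column per variable in myvars,
so it scales to any number of columns without naming them by hand.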

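As for the dplyr side of the question quoted below, a sketch along these
lines may work. It assumes dplyr 1.0.0 or later, where across() is
available (older versions would need a different approach), and the name
'counts_by_device' is made up for illustration.

```r
library(dplyr)

# Example data, copied from the quoted message
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
                 device = c(1,1,2,2,3,3))
myvars <- c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA

# Apply the same count to every column named in myvars, within each device.
# .names reproduces the counts.a, counts.b, ... naming from the question.
counts_by_device <- md %>%
  group_by(device) %>%
  summarise(across(all_of(myvars),
                   ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))
print(counts_by_device)
```

all_of() takes the character vector as is, so growing myvars needs no
change to the summarise() call.
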
Clint Bowman			INTERNET:	clint at ecy.wa.gov
Air Quality Modeler		INTERNET:	clint at math.utah.edu
Department of Ecology		VOICE:		(360) 407-6815
PO Box 47600			FAX:		(360) 407-7534
Olympia, WA 98504-7600

         USPS:           PO Box 47600, Olympia, WA 98504-7600
         Parcels:        300 Desmond Drive, Lacey, WA 98503-1274

On Tue, 16 Jun 2015, Dimitri Liakhovitski wrote:

> Except, of course, Bert, that you forgot that it had to be done by
> device. Your solution ignores the device.
>
> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
>      device = c(1,1,2,2,3,3))
> myvars = c("a", "b", "c")
> md[2,3] <- NA
> md[4,1] <- NA
> md
> vapply(md[myvars], function(x) sum(x==5,na.rm=TRUE),1L)
>
> But the result should be by device.
>
> On Tue, Jun 16, 2015 at 1:56 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Thank you, Bert.
>> I'll be honest - I am just learning dplyr and was wondering if one
>> could do it in dplyr.
>> But of course your solution is perfect...
>>
>> On Tue, Jun 16, 2015 at 1:50 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> Well, dplyr seems a bit of overkill as it's so simple with plain old
>>> vapply() in base R :
>>>
>>>
>>>> dat <- data.frame (a=sample(1:5,10,rep=TRUE),
>>> +                    b=sample(3:7,10,rep=TRUE),
>>> +                    g = sample(7:9,10,rep=TRUE))
>>>
>>>> vapply(dat,function(x)sum(x==5,na.rm=TRUE),1L)
>>>
>>> a b g
>>> 5 4 0
>>>
>>>
>>>
>>> Cheers,
>>> Bert
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge is
>>> certainly not wisdom."
>>>    -- Clifford Stoll
>>>
>>> On Tue, Jun 16, 2015 at 10:24 AM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>>
>>>> Hello!
>>>>
>>>> I have a data frame:
>>>>
>>>> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
>>>>       device = c(1,1,2,2,3,3))
>>>> myvars = c("a", "b", "c")
>>>> md[2,3] <- NA
>>>> md[4,1] <- NA
>>>> md
>>>>
>>>> I want to count number of 5s in each column - by device. I can do it like
>>>> this:
>>>>
>>>> library(dplyr)
>>>> group_by(md, device) %>%
>>>> summarise(counts.a = sum(a==5, na.rm = T),
>>>>           counts.b = sum(b==5, na.rm = T),
>>>>           counts.c = sum(c==5, na.rm = T))
>>>>
>>>> However, in real life I'll have tons of variables (the length of
>>>> 'myvars' can be very large) - so that I can't specify those counts.a,
>>>> counts.b, etc. manually - dozens of times.
>>>>
>>>> Does dplyr allow running the count of 5s on all 'myvars' columns at once?
>>>>
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>
>
>
> -- 
> Dimitri Liakhovitski


