[R] Counting non-empty levels of a factor

Sun Nov 8 16:41:13 CET 2009

Thanks a lot for those solutions.
Both work great, and they do slightly different (but both very
interesting) things.
Moreover, I learned about the length() function... one more to add to
my personal cheat sheet.
Kind regards

2009/11/8 David Winsemius <dwinsemius at comcast.net>:
>
> On Nov 8, 2009, at 9:11 AM, David Winsemius wrote:
>
>>
>> On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:
>>
>>> Hi everyone,
>>>
>>> I've been struggling with a little problem for a while, and I'm wondering if
>>> anyone could help...
>>>
>>> I have a dataset (from the retail industry) that indicates which brands
>>> are present in a panel of 500 stores:
>>>
>>> store , brand
>>> 1 , B1
>>> 1 , B2
>>> 1 , B3
>>> 2 , B1
>>> 2 , B3
>>> 3 , B2
>>> 3 , B3
>>> 3 , B4
>>>
>>> I would like to know how many brands are present in each store.
>>>
>>> I tried:
>>> result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)
>>>
>>> but I got:
>>> Group.1 x
>>> 1 , 4
>>> 2 , 4
>>> 3 , 4
>>>
>>> which is not the result I expected.
>>> I would like to get something like:
>>> Group.1 x
>>> 1 , 3
>>> 2 , 2
>>> 3 , 3
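A minimal reproducible sketch of the problem, assuming `MyData` is a data frame whose `brand` column is a factor. The data frame below is a hypothetical reconstruction of the eight rows listed above. Subsetting a factor keeps its full set of levels, which is why `nlevels()` reports 4 for every store.

```r
# Hypothetical reconstruction of the example data listed above.
# brand is made a factor explicitly, since data.frame() no longer
# converts strings to factors by default (R >= 4.0.0).
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Store 2 only stocks B1 and B3, but the subset still carries all 4 levels
store2 <- MyData$brand[MyData$store == 2]
print(levels(store2))   # "B1" "B2" "B3" "B4"

# Hence nlevels() returns 4 for every group
res_nlevels <- aggregate(MyData$brand, by = list(MyData$store), nlevels)
print(res_nlevels)
```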
>>
>> Try:
>>
>> result <- aggregate(MyData$brand , by=list(MyData$store) , length)
>>
>> Quick, easy, and it generalizes to other situations. The unused factor
>> levels get carried along into every subset, but length() counts the number
>> of elements in each group that tapply passes to it.
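As a sketch (again using the hypothetical `MyData` reconstruction), `length()` counts the rows in each store's group, and `table()` gives the same per-store row counts. Both count duplicated rows too, which is the caveat raised in the next reply.

```r
# Hypothetical reconstruction of the example data
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# length() returns the number of elements in each store's group;
# naming the 'by' list labels the grouping column "store"
n_rows <- aggregate(MyData$brand, by = list(store = MyData$store), length)
print(n_rows)

# table() tabulates the same per-store row counts
print(table(MyData$store))
```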
>
> Which may not have been what you asked for, as this demonstrates. You
> probably want the second solution:
> > mydata2 <- rbind(MyData, MyData)
> > result <- aggregate(mydata2$brand , by=list(mydata2$store) , length)
> > result
>   Group.1 x
> 1       1 6
> 2       2 4
> 3       3 6
>
> > result <- aggregate(mydata2$brand , by=list(mydata2$store) , function(x) nlevels(factor(x)))
> > result
>   Group.1 x
> 1       1 3
> 2       2 2
> 3       3 3
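An equivalent sketch with `tapply()` counts distinct brands per store via `length(unique(x))`, which, like `nlevels(factor(x))`, is unaffected by duplicated rows. `mydata2` doubles every row as in the example above, and `MyData` is the same hypothetical reconstruction.

```r
# Hypothetical reconstruction of the example data
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Duplicate every row, e.g. the same brand observed in two different weeks
mydata2 <- rbind(MyData, MyData)

# Row counts double, because length() counts duplicates
rows_per_store <- tapply(mydata2$brand, mydata2$store, length)

# Distinct brands per store: unique() removes the repeated values
brands_per_store <- tapply(mydata2$brand, mydata2$store,
                           function(x) length(unique(x)))

# Same answer via re-factoring, as in the reply above
brands_via_factor <- tapply(mydata2$brand, mydata2$store,
                            function(x) nlevels(factor(x)))

print(rows_per_store)
print(brands_per_store)
```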
>
>>>
>>> Looking around, I found I can drop the empty levels of a factor using:
>>> problem.factor <- problem.factor[,drop=TRUE]
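For a single store, a short sketch comparing two ways to drop unused levels: the `[, drop = TRUE]` idiom quoted above and re-applying `factor()`. `MyData` is the same hypothetical reconstruction, and `store2` is an illustrative name.

```r
# Hypothetical reconstruction of the example data
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Subset for store 2 (brands B1 and B3 only)
store2 <- MyData$brand[MyData$store == 2]

n_kept    <- nlevels(store2)                 # 4: all levels retained
n_dropped <- nlevels(store2[, drop = TRUE])  # 2: unused levels dropped
n_refac   <- nlevels(factor(store2))         # 2: factor() also drops them

print(c(n_kept, n_dropped, n_refac))
```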
>>
>> If you reapply the function factor() to each group, the unused levels are
>> dropped and you get the same result. So you could have done this:
>>
>> > result <- aggregate(MyData$brand , by=list(MyData$store) , function(x) nlevels(factor(x)))
>> > result
>>  Group.1 x
>> 1       1 3
>> 2       2 2
>> 3       3 3
>>
>>
>>
>>> But this solution isn't handy for me, as I have many stores and would
>>> have to make a subset of my data for each store before dropping the
>>> empty levels.
>>>
>>> Nor can I simply count the rows for each store (N), because the same
>>> brand can appear several times in each store (several products for the
>>> same brand, and/or several weeks of observation).
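One way around the duplicate-row issue, sketched under the same assumptions (hypothetical `MyData`, with repeated rows simulated by `rbind()`): keep only distinct store/brand pairs with `unique()`, then count the remaining rows per store. This is the usual "count distinct" pattern rather than a method from the thread itself.

```r
# Hypothetical reconstruction of the example data
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Simulate repeated observations (several products or weeks per brand)
observed <- rbind(MyData, MyData, MyData[1:2, ])

# Keep one row per distinct (store, brand) combination
pairs <- unique(observed[, c("store", "brand")])

# Each remaining row is one brand present in a store
brands_per_store <- table(pairs$store)
print(brands_per_store)
```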
>>>
>>> I used to do this calculation using SAS with:
>>> proc freq data = MyData noprint ; by store ;
>>> tables  brand / out = result ;
>>> run ;
>>> (the nice thing was that I got a data set I could merge with MyData)
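A rough R analogue of that SAS step, as a sketch (hypothetical `MyData` again; `freq`, `per_store` and `merged` are illustrative names). `table()` plus `as.data.frame()` gives one row per store/brand combination with its count (the percentage column that `proc freq` also writes is not reproduced), and a per-store brand count can then be attached to the original rows with `merge()`.

```r
# Hypothetical reconstruction of the example data
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Count of each brand within each store (columns: store, brand, Freq)
freq <- as.data.frame(table(store = MyData$store, brand = MyData$brand))
# table() lists every store x brand combination; keep only those that occur
freq <- freq[freq$Freq > 0, ]

# Number of brands per store = number of remaining rows per store
n_brands <- table(freq$store)
per_store <- data.frame(store    = as.numeric(names(n_brands)),
                        n_brands = as.vector(n_brands))

# Attach the per-store count to every row of the original data
merged <- merge(MyData, per_store, by = "store")
print(merged)
```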
>>>
>>> Any idea how to do that in R?
>>>
>>> Thanks in advance,
>>>
>>> Kind regards,
>>>
>>> Sylvain Willart,
>>> PhD Marketing,
>>> IAE Lille, France
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>