[R] How to group by then count?

Tue Jan 6 22:54:24 CET 2015

> On Jan 6, 2015, at 3:29 PM, Monnand <monnand at gmail.com> wrote:
> 
> Thank you, all! Your replies are very useful, especially Don's explanation!
> 
> One complaint I have is: the function name (table) is really not very
> informative.

Why not? You used the word 'table' in your original post; as Don noted, you were simply overthinking the problem.

The basic concept is a tabulation of discrete values in a vector, which is a basic analytic method.

Using commands like:

  ??table
  ??frequency

would have led you to the table() function, as well as others.
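
As a rough sketch of the same kind of search done programmatically: `??table` is shorthand for help.search("table"), and apropos() matches object names on the search path rather than help text. The variable name `hits` is just for illustration:

```r
# ??table is equivalent to calling help.search("table") at the prompt;
# it searches the installed help pages and is best run interactively.

# apropos() returns the names of visible objects matching a pattern
hits <- apropos("table")

# The base function table() is among the matches
print("table" %in% hits)
```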

Believe it or not, taking a few minutes to read or search "An Introduction to R", the basic R manual, would have led you to the same solution:

  http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Frequency-tables-from-factors
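
For the vector in your original post, a minimal sketch of that approach might look like the following. The names `counts`, `majority` and `ties` are only illustrative, and using which.max() is one choice among several:

```r
# Vector from the original post
x <- c("1", "1", "2", "1", "5", "2")

# Tabulate how often each distinct string occurs
counts <- table(x)

# Name of the most frequent value; which.max() returns only the first maximum
majority <- names(counts)[which.max(counts)]

# All values tied for the maximum count, in case there is more than one
ties <- names(counts)[counts == max(counts)]
```

Note that which.max() silently picks the first of several tied values, so the `ties` line is the safer choice when ties matter.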

Regards,

Marc Schwartz

> 
> On Sun Jan 04 2015 at 5:03:47 PM MacQueen, Don <macqueen1 at llnl.gov> wrote:
> 
>> This seems to me to be a case where thinking in terms of computer
>> programming concepts is getting in the way a bit. Approach it as a data
>> analysis task; the S language (upon which R is based) is designed in part
>> for data analysis so there is a function that does most of the job for you.
>> 
>> (I changed your vector of strings to make the result more easily
>> interpreted)
>> 
>>> x = c("1", "1", "2", "1", "5", "2",'3','5','5','2','2')
>>> tmp <- table(x)      ## counts the number of appearances of each element
>>> tmp[tmp==max(tmp)]   ## finds which one occurs most often
>> 2
>> 4
>> 
>> Meaning that the element '2' appears 4 times.  The table() function should
>> be fast even with long vectors. Here's an example with a vector of length
>> 1 million:
>> 
>> foo <- table( sample(letters, 1e6, replace=TRUE) )
>> 
>> 
>> One of the seminal books on the S language is John M Chambers' Programming
>> with Data -- and I would emphasize the "with Data" part of that title.
>> 
>> --
>> 
>> Don MacQueen
>> 
>> Lawrence Livermore National Laboratory
>> 7000 East Ave., L-627
>> Livermore, CA 94550
>> 925-423-1062
>> 
>> On 1/4/15, 1:02 AM, "Monnand" <monnand at gmail.com> wrote:
>> 
>>> Hi all,
>>> 
>>> I thought this was a very naive problem but I have not found any solution
>>> which is idiomatic to R.
>>> 
>>> The problem is like this:
>>> 
>>> Assuming we have a vector of strings:
>>> x = c("1", "1", "2", "1", "5", "2")
>>> 
>>> We want to count the number of appearances of each string, i.e. in vector x,
>>> string "1" appears 3 times; "2" appears twice and "5" appears once. Then I
>>> want to know which string is the majority. In this case, it is "1".
>>> 
>>> For imperative languages like C, C++, Java and Python, I would use a hash
>>> table to count the strings, where the keys are the strings and the values
>>> are the numbers of appearances. For functional languages like Clojure,
>>> there are higher-order functions like group-by.
>>> 
>>> However, for R, I can hardly find a good solution to this simple problem.
>>> I found a hash package, which implements a hash table. However, installing
>>> a package simply for a hash table is really annoying for me. I did find
>>> aggregate and other functions which operate on data frames. But in my
>>> case, it is a simple vector. Converting it to a data frame may not be
>>> desirable. (Or is it?)
>>> 
>>> Could anyone suggest an idiomatic way of doing such a job in R? I would
>>> appreciate your help!
>>> 
>>> -Monnand