[R] How to group by then count?
marc_schwartz at me.com
Tue Jan 6 22:54:24 CET 2015
> On Jan 6, 2015, at 3:29 PM, Monnand <monnand at gmail.com> wrote:
> Thank you, all! Your replies are very useful, especially Don's explanation!
> One complaint I have is: the function name (table) is really not very
Why not? You used the word 'table' in your original post. As Don noted, you were overthinking the problem.
The basic concept is a tabulation of discrete values in a vector, which is a basic analytic method.
Using commands like:
would have led you to the table() function, as well as others.
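For illustration only (these are not necessarily the commands meant above, and the search terms are my assumption), standard help searches of this kind look like:

```r
# Illustrative help searches; the search terms are assumed, not quoted from the message above.
hits <- apropos("table")   # names of visible objects whose names contain "table"
print(head(hits))

# Interactive equivalents, left commented out so the block runs unattended:
# help.search("tabulate")  # keyword search across installed help pages
# ??tabulate               # shorthand for help.search("tabulate")
# ?table                   # help page for table() once you know its name
```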
Believe it or not, taking a few minutes to read or search "An Introduction to R", which is the basic R manual, would have led you to the same solution:
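A minimal sketch of that tabulation, applied to the vector from the original post (my own illustration rather than text from the manual; the variable names are assumptions):

```r
x <- c("1", "1", "2", "1", "5", "2")

counts <- table(x)                      # frequency of each distinct string
print(counts)

majority <- names(which.max(counts))    # first most frequent value if there are ties
print(majority)

top <- counts[counts == max(counts)]    # keeps every value tied for the maximum
print(top)
```

which.max() reports only the first maximum, so the last form is the safer one when ties are possible.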
> On Sun Jan 04 2015 at 5:03:47 PM MacQueen, Don <macqueen1 at llnl.gov> wrote:
>> This seems to me to be a case where thinking in terms of computer
>> programming concepts is getting in the way a bit. Approach it as a data
>> analysis task; the S language (upon which R is based) is designed in part
>> for data analysis so there is a function that does most of the job for you.
>> (I changed your vector of strings to make the result more easily
>> > x = c("1", "1", "2", "1", "5", "2",'3','5','5','2','2')
>> > tmp <- table(x) ## counts the number of appearances of each element
>> > tmp[tmp==max(tmp)] ## finds which one occurs most often
>> Meaning that the element '2' appears 4 times. The table() function should
>> be fast even with long vectors. Here's an example with a vector of length
>> 1 million:
>> foo <- table( sample(letters, 1e6, replace=TRUE) )
>> One of the seminal books on the S language is John M Chambers' Programming
>> with Data -- and I would emphasize the "with Data" part of that title.
>> Don MacQueen
>> Lawrence Livermore National Laboratory
>> 7000 East Ave., L-627
>> Livermore, CA 94550
>> On 1/4/15, 1:02 AM, "Monnand" <monnand at gmail.com> wrote:
>>> Hi all,
>>> I thought this was a very naive problem but I have not found any solution
>>> which is idiomatic to R.
>>> The problem is like this:
>>> Assuming we have a vector of strings:
>>> x = c("1", "1", "2", "1", "5", "2")
>>> We want to count the number of appearances of each string, i.e. in vector x,
>>> string "1" appears 3 times, "2" appears twice and "5" appears once. Then I
>>> want to know which string is the majority. In this case, it is "1".
>>> For imperative languages like C, C++, Java and Python, I would use a hash
>>> table to count each string, where keys are the strings and values are the
>>> number of appearances. For functional languages like Clojure, there are
>>> higher-order functions like group-by.
>>> However, for R, I can hardly find a good solution to this simple problem. I
>>> found a hash package, which implements a hash table. However, installing a
>>> package simply for a hash table is really annoying for me. I did find
>>> aggregate and other functions which operate on data frames. But in my
>>> case, it is a simple vector. Converting it to a data frame may not be
>>> desirable. (Or is it?)
>>> Could anyone suggest an idiomatic way of doing such a job in R? I would
>>> appreciate your help!
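On the data frame question in the original post: a hedged sketch, assuming a data frame of counts is wanted at all, is to tabulate first and convert afterwards. The name counts_df is hypothetical.

```r
x <- c("1", "1", "2", "1", "5", "2")

# Tabulate the vector, then convert the table to a data frame.
# stringsAsFactors = FALSE keeps the values as character strings.
counts_df <- as.data.frame(table(x), stringsAsFactors = FALSE)

# Sort so the most frequent value comes first.
counts_df <- counts_df[order(-counts_df$Freq), ]
print(counts_df)
```

The vector itself never has to be turned into a data frame; only the (much smaller) table of counts is converted.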