[R] grouping

David Winsemius dwinsemius at comcast.net
Tue Apr 3 15:51:15 CEST 2012


On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:

> Use cut2 as I suggested and David demonstrated.

I agree that Hmisc::cut2 is extremely handy. I also like the fact that
the closed ends of its intervals are on the left side (which is not
the same behavior as cut()). That has the other effect of setting
include.lowest = TRUE, which is not the default for cut() either (to my
continued amazement).
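Here is a small sketch of the include.lowest point, using Val's ten values from the original post quoted below. When the breaks come from quantile(), the smallest observation sits exactly on the first break. With cut()'s default right-closed intervals and include.lowest = FALSE, that observation falls outside every interval and is coded NA. Only base R is used, so Hmisc is not needed to reproduce this.

```r
# Val's ten observations from the original post
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# Tercile break points; the first break equals min(x) = 36
brks <- quantile(x, probs = seq(0, 1, length.out = 4))

# Default cut(): intervals are (a, b] and include.lowest = FALSE,
# so the minimum (36) is not inside (36, 66] and is coded NA
g_default <- cut(x, brks)
sum(is.na(g_default))   # 1

# include.lowest = TRUE closes the first interval as [36, 66]
g_lowest <- cut(x, brks, include.lowest = TRUE)
sum(is.na(g_lowest))    # 0
```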

But let me add the method I use when doing it "by hand":

cut(x, quantile(x, probs = seq(0, 1, length.out = ngrps + 1)), include.lowest = TRUE)
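A sketch of this one-liner applied to Val's data with ngrps = 3, assuming the default quantile type. It is worth checking how values equal to a break point are assigned. For these ten values the break points are 36, 66, 193 and 297, all of which are data values, so the default right-closed intervals put 66 into the first group and give group sizes 4, 3, 3. Adding right = FALSE switches to left-closed intervals (the cut2 convention mentioned above) and reproduces the 3, 3, 4 split asked for in the original post.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
ngrps <- 3

brks <- quantile(x, probs = seq(0, 1, length.out = ngrps + 1))

# Right-closed intervals: [36,66], (66,193], (193,297]
g_right <- cut(x, brks, include.lowest = TRUE)
table(g_right)   # 4, 3, 3 observations

# Left-closed intervals: [36,66), [66,193), [193,297];
# with right = FALSE, include.lowest = TRUE closes the *last* interval
g_left <- cut(x, brks, include.lowest = TRUE, right = FALSE)
table(g_left)    # 3, 3, 4 observations

# The groups themselves and their means
split(x, g_left)
tapply(x, g_left, mean)   # about 42.33, 89.67, 235.25
```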

-- 
David.


>
> Michael
>
> On Tue, Apr 3, 2012 at 9:31 AM, Val <valkremk at gmail.com> wrote:
>> Thank you all (David, Michael, Giovanni) for your prompt responses.
>>
>> First, there was a typo in the group mean: it was 89.6, not 87.
>>
>> For a small data set and a few groups I can use prob = c(0, .333, .66, 1)
>> to split the data into three groups, as in this case. However, if I want
>> to extend the number of groups to, say, 10 or 15, do I have to work out
>> the probabilities by hand in
>>   split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1))))
>>
>> Is there a shortcut for that?
>>
>> Thanks
>>
>> On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>>>
>>> Ignoring the fact that your desired answers are wrong, I'd split the
>>> separation and the group-mean parts into three steps:
>>>
>>> i) quantile() can help you get the split points,
>>> ii) findInterval() can assign each y to a group,
>>> iii) then ave() or tapply() will compute group-wise means.
>>>
>>> Something like:
>>>
>>> y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.
>>> ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
>>> tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
>>>
>>> You could also use cut2 from the Hmisc package to combine findInterval
>>> and quantile into a single step.
>>>
>>> Which one to use depends on your desired output.
>>>
>>> Hope that helps,
>>> Michael
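The findInterval() route generalizes to any number of groups by generating the interior probabilities with seq(). That bears on the later question about 10 or 15 groups. findInterval() treats each break as the start of a left-closed interval, so for these data it yields the 3, 3, 4 split. The helper name quantile_groups is hypothetical and made up for this sketch. It assumes the default quantile type and numeric input without NAs.

```r
# Hypothetical helper: assign each value to one of k quantile groups (1..k)
quantile_groups <- function(y, k) {
  # interior probabilities 1/k, 2/k, ..., (k-1)/k
  probs <- seq(0, 1, length.out = k + 1)[-c(1, k + 1)]
  # findInterval() counts how many break points are <= each value (0..k-1)
  findInterval(y, quantile(y, probs)) + 1
}

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
g <- quantile_groups(y, 3)
table(g)            # 3, 3, 4 observations
tapply(y, g, mean)  # about 42.33, 89.67, 235.25
ave(y, g)           # each value replaced by its group mean
```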
>>>
>>> On Tue, Apr 3, 2012 at 8:47 AM, Val <valkremk at gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> Assume that I have the following 10 data points.
>>>>  x = c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
>>>>
>>>> Sorting x gives the following:
>>>>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
>>>>
>>>> I want to group the sorted data points (y) into groups with an equal
>>>> number of observations. In this case there will be three groups: the
>>>> first two groups will have three observations each and the third will
>>>> have four.
>>>>
>>>> group 1  = 36, 45, 46
>>>> group 2  = 66, 78, 125
>>>> group 3  = 193, 209, 242,297
>>>>
>>>> Finally, I want to calculate the group means:
>>>>
>>>> group 1  =  42
>>>> group 2  =  87
>>>> group 3  =  234
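As an arithmetic check on the requested means, the 3, 3, 4 split described above can be formed directly by position on the sorted values. This assumes the smallest value is 36, as in x. The means come out at about 42.33, 89.67 and 235.25, which is why the replies above call the desired answers wrong.

```r
# Val's data, sorted
y <- sort(c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45))

# Positional groups: first three, next three, last four
grp <- rep(1:3, times = c(3, 3, 4))

split(y, grp)        # 36 45 46 | 66 78 125 | 193 209 242 297
tapply(y, grp, mean) # about 42.33, 89.67, 235.25
```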
>>>>
>>>> Can anyone help me out?
>>>>
>>>> In SAS I used to do it using proc rank.
>>>>
>>>> thanks in advance
>>>>
>>>> Val
>>>>
>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>

David Winsemius, MD
West Hartford, CT


