[R] grouping

David Winsemius dwinsemius at comcast.net
Tue Apr 3 15:10:53 CEST 2012


On Apr 3, 2012, at 8:47 AM, Val wrote:

> Hi all,
>
> Assume that I have the following 10 data points.
> x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>
> sort x  and get the following
>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

The methods below do not require a sorting step.

>
> I want to group the sorted data points (y) into groups with an equal
> number of observations. In this case there will be three groups. The
> first two groups will have three observations each and the third will
> have four observations.
>
> group 1  = 36, 45, 46
> group 2  = 66, 78, 125
> group 3  = 193, 209, 242,297
>
> Finally I want to calculate the group mean
>
> group 1  =  42
> group 2  =  87
> group 3  =  234

I hope those weren't answers from SAS.

>
> Can anyone help me out?
>

I usually do this with Hmisc::cut2, since its `g = <n>` parameter
auto-magically applies the quantile splitting criterion, but here it
is done in base R:

 > split(x, cut(x, quantile(x, probs=c(0, .333, .66, 1)),
 +              include.lowest=TRUE))
$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297


 > lapply(split(x, cut(x, quantile(x, probs=c(0, .333, .66, 1)),
 +                     include.lowest=TRUE)), mean)
$`[36,65.9]`
[1] 42.33333

$`(65.9,189]`
[1] 89.66667

$`(189,297]`
[1] 235.25

Or to get a table instead of a list:
 > tapply(x, cut(x, quantile(x, probs=c(0, .333, .66, 1)),
 +               include.lowest=TRUE), mean)
  [36,65.9] (65.9,189]  (189,297]
   42.33333   89.66667  235.25000
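
The cut() call is repeated in each of the calls above. A minimal sketch that builds the grouping factor once and reuses it might look like the following. The names `brks`, `grp`, `sizes` and `means` are chosen here for illustration only, and table() is there just to confirm the group sizes:

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# Break points at (roughly) the terciles; same probabilities as above
brks <- quantile(x, probs = c(0, .333, .66, 1))

# Grouping factor, one level per interval; include.lowest keeps min(x)
grp <- cut(x, breaks = brks, include.lowest = TRUE)

sizes <- table(grp)             # observations per group
means <- tapply(x, grp, mean)   # group means as a named vector

print(sizes)
print(means)
```

The group sizes depend on where the break points fall relative to the data, so it is worth looking at `sizes` whenever the probabilities or the data change.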

> In SAS I used to do it using proc rank.

?quantile isn't equivalent to  Proc Rank but it will provide a useful  
basis for splitting or tabling functions.
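
If the aim is groups of (nearly) equal size rather than cuts at particular quantiles, ranks can be turned into group numbers directly. This is only a sketch of that idea, not a reproduction of what Proc Rank computes. The names `k`, `r` and `grp` are illustrative, and breaking ties by order of appearance is an assumption made here:

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

k <- 3                                # number of groups wanted
r <- rank(x, ties.method = "first")   # ranks 1..n; ties broken by position

# Map ranks onto group numbers 1..k; no sorting of x is needed
grp <- ceiling(r * k / length(x))

sizes <- table(grp)                   # observations per group
means <- tapply(x, grp, mean)         # group means

print(split(sort(x), sort(grp)))      # sorted members of each group
print(means)
```

For these ten values the sizes come out as 3, 3 and 4. With other values of n and k the extra observations may land in different groups, so check `sizes` before relying on it.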

>
> thanks in advance
>
> Val

David Winsemius, MD
West Hartford, CT


