[R] cut into groups of equal nr of elements...

Wet Bell Diver wetbelldiver at gmail.com
Thu Jul 18 18:35:08 CEST 2013


Here's one way:

Vec <- rnorm(30)
Vec.cut <- cut(Vec, breaks=c(quantile(Vec, probs = seq(0, 1, by = 0.20))),
     labels=c("0-20","20-40","40-60","60-80","80-100"), include.lowest=TRUE)
table(Vec.cut)
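
To see why quantile breaks are needed here, it may help to compare them with the equal-width breaks that cut() produces when 'breaks' is a single number. The sketch below is only illustrative; the seed and the names width.cut and quant.cut are my own choices, not part of the thread.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
Vec <- rnorm(30)

# Equal-width intervals: cut() splits range(Vec) into 5 bins of the same width
width.cut <- cut(Vec, breaks = 5)

# Equal-count intervals: breaks at the 0%, 20%, ..., 100% sample quantiles
quant.cut <- cut(Vec, breaks = quantile(Vec, probs = seq(0, 1, by = 0.2)),
                 include.lowest = TRUE)

table(width.cut)   # counts generally differ between bins for normal data
table(quant.cut)   # 6 observations per bin when there are no ties
```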


Or derive the breaks automatically from the desired group size:

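cut.size <- function(x, size) {
   # Fraction of the data that one group of 'size' elements represents
   cut.prob <- size/length(x)
   if (length(x) %% size != 0)
      warning("Equal sized groups only possible by dropping some elements from x")
   # Quantile breaks spaced cut.prob apart; values above the last break become NA
   cut(x, breaks = quantile(x, probs = seq(0, 1, by = cut.prob)),
       include.lowest = TRUE)
}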
CUT <- cut.size(Vec, 6)
table(CUT)

When asking for
cut.size(Vec, 7)
this will yield 4 equal-sized groups of 7 (along with a warning), because 
30 observations cannot be split evenly into groups of 7. The 2 largest 
values fall above the last break and are returned as NA.
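
If dropping elements is not acceptable, one alternative is to group by rank instead of by quantile breaks, so that the leftover elements end up in a smaller last group. This is a minimal sketch and not something proposed elsewhere in this thread; the helper name group.by.size is hypothetical.

```r
# Hypothetical helper (not from the thread): assign each element to a group
# of 'size' consecutive ranks; the last group holds any remainder
group.by.size <- function(x, size) {
   # ties.method = "first" gives every element a distinct rank, even with ties
   ceiling(rank(x, ties.method = "first") / size)
}

set.seed(1)                       # arbitrary seed, only for reproducibility
Vec <- rnorm(30)
grp <- group.by.size(Vec, 7)
table(grp)          # four groups of 7 and one group of 2
split(Vec, grp)     # the elements belonging to each group
```

The result is a plain vector of group numbers rather than a factor with interval labels, so it suits split() or tapply() better than it suits labelled tables.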

HTH,
Peter


On 18-7-2013 18:09, Marc Schwartz wrote:
> Greg,
>
> Good catch. My recollection was that the vector would be broken up into 'breaks' groups of equal size; however, it is range(x) that is split into 'breaks' intervals, each of equal width.
>
> Thanks,
>
> Marc
>
>
> On Jul 18, 2013, at 10:55 AM, Greg Snow <538280 at gmail.com> wrote:
>
>> Marc,
>>
>> Your method works fine when the data is perfectly uniform, but try it with "Vec <- rnorm(30)" and you will see that there are more observations in the middle groups and fewer in the tail groups.  Something like quantile needs to be used to find the unequally spaced breaks that will give equal counts within groups.
>>
>>
>> On Wed, Jul 17, 2013 at 5:04 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>> On Jul 17, 2013, at 4:43 PM, Witold E Wolski <wewolski at gmail.com> wrote:
>>
>>> I would like to "cut" a vector into groups of equal nr of elements.
>>> looking for a function on the lines of cut but where I can specify
>>> the size of the groups instead of the nr of groups.
>>
>>
>> In addition to the other options, if the 'breaks' argument to cut() is a single number, rather than a vector of cut points, it defines the number of intervals to break the 'x' vector into, which of course you can derive from length(x) / size.
>>
>> Thus:
>>
>> set.seed(1)
>> Vec <- sample(30)
>>
>>> Vec
>>   [1]  8 11 17 25  6 23 27 16 14  2  5  4 13  7 18 30 29 24 20  9 10 21
>> [23] 26  1 22 15 28 12  3 19
>>
>>
>> # Split into 5 groups of 6 each
>>
>>> split(Vec, cut(Vec, 5))
>> $`(0.971,6.78]`
>> [1] 6 2 5 4 1 3
>>
>> $`(6.78,12.6]`
>> [1]  8 11  7  9 10 12
>>
>> $`(12.6,18.4]`
>> [1] 17 16 14 13 18 15
>>
>> $`(18.4,24.2]`
>> [1] 23 24 20 21 22 19
>>
>> $`(24.2,30]`
>> [1] 25 27 30 29 26 28
>>
>>
>> Regards,
>>
>> Marc Schwartz
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> -- 
>> Gregory (Greg) L. Snow Ph.D.
>> 538280 at gmail.com


