[R] Binning question (binning rows of a data.frame according to a variable)

Mon Mar 20 11:55:54 CET 2006

Gabor Grothendieck wrote:
> On 3/19/06, Dan Bolser <dmb at mrc-dunn.cam.ac.uk> wrote:
> 
>>Gabor Grothendieck wrote:
>>
>>>On 3/18/06, Dan Bolser <dmb at mrc-dunn.cam.ac.uk> wrote:
>>>
>>>
>>>>Gabor Grothendieck wrote:
>>>>
>>>>
>>>>>If you are just looking for something simple that may be good enough
>>>>>then assign the largest one to group 1, the second largest to group 2,
>>>>>..., the 8th largest to group 8 and then start over again with group 1
>>>>>and so on.
>>>>>
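>>>>># test data
>>>>>set.seed(1)
>>>>>x <- sample(100, 100, replace = TRUE)
>>>>>
>>>>>xs <- sort(x)            # ascending, so the smallest value is dealt first
>>>>>g <- gl(8, 1, length(xs)) # 8 groups, labels 1..8 recycled
>>>>>
>>>>># so that g contains the groups that correspond to xs.
>>>>>
>>>>>tapply(xs, g, sum)   # 659 671 687 701 612 622 629 646
>>>>># (sums as reported in 2006; sample() changed in R 3.6.0, so they may differ)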
>>>>>
>>>>
>>>>
>>>>That is a fairly neat way of getting groups of approximately the same
>>>>'size'. However, in general I would like to be able to order my data
>>>>in any way and still cut it into groups of equal 'size' (like quantiles
>>>>of the rows, but of a row variable's totals instead).
>>>
>>>
>>>Do you mean you want g to be in the original order of x?
>>
>>No. What I mean is that I want to order x by any particular variable in
>>my data.frame, then group over x such that each group has roughly the
>>same sum.
>>
>>I get the feeling I have missed a very simple trick.
> 
> 
> 
> Suggest providing a short, self-contained, reproducible example,
> including input and desired output, and a detailed explanation.
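A minimal sketch of the grouping described above (order the rows by one variable, then cut them into contiguous groups whose totals of another variable are roughly equal). It assumes hypothetical columns `key` (the ordering variable) and `size` (the variable to balance, positive values) and bins on the running total of `size`, i.e. quantiles of the cumulative sum rather than of the row count. The helper `bin_by_total` is illustrative only and does not come from the thread.

```r
# Hypothetical example data: 'key' is the (arbitrary) ordering variable,
# 'size' is the variable whose group totals should be roughly equal.
set.seed(1)
df <- data.frame(key = runif(100), size = sample(100, 100, replace = TRUE))

# Assign each row to one of k contiguous bins of roughly equal total 'size'.
# A row is placed by the midpoint of its own share of the running total, so a
# row straddling a bin boundary goes to the bin that holds most of it.
bin_by_total <- function(size, k) {
  mid <- cumsum(size) - size / 2
  pmin(k, floor(mid / sum(size) * k) + 1)
}

df <- df[order(df$key), ]           # any ordering criterion can go here
df$bin <- bin_by_total(df$size, 8)  # 8 groups, as in the example above
tapply(df$size, df$bin, sum)        # group totals, roughly equal
```

Because the bins stay contiguous in the chosen order, the totals are only approximately equal; a single very large row can unbalance a bin or even leave one empty.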

Does my subsequent post answer your question in this regard? It seems as
though 'optimality' is not achievable with any reasonable approach in
general; however, a fixed ordering criterion may mean we can get optimal
solutions.
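Once the order is fixed, one exact criterion is to minimise the largest group total over all splits into contiguous groups (often called the "linear partition" problem). The sketch below finds that minimum with a binary search over the cap plus a greedy feasibility check, assuming positive integer sizes. The function names are hypothetical, and minimising the maximum is only one possible reading of "optimal".

```r
# Greedy check: how many contiguous groups (in the given order) are needed
# so that no group total exceeds 'cap'? Assumes cap >= max(size).
groups_needed <- function(size, cap) {
  n_groups <- 1
  running <- 0
  for (s in size) {
    if (running + s > cap) {
      n_groups <- n_groups + 1
      running <- s
    } else {
      running <- running + s
    }
  }
  n_groups
}

# Smallest achievable maximum group total using at most k contiguous groups
# (binary search over integer caps between max(size) and sum(size)).
min_max_partition <- function(size, k) {
  lo <- max(size)
  hi <- sum(size)
  while (lo < hi) {
    mid <- (lo + hi) %/% 2
    if (groups_needed(size, mid) <= k) hi <- mid else lo <- mid + 1
  }
  lo
}

# Label rows greedily under a given cap (may use fewer than k groups).
partition_labels <- function(size, cap) {
  g <- integer(length(size))
  grp <- 1L
  running <- 0
  for (i in seq_along(size)) {
    if (running + size[i] > cap) {
      grp <- grp + 1L
      running <- 0
    }
    running <- running + size[i]
    g[i] <- grp
  }
  g
}

x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
cap <- min_max_partition(x, 3)    # 170
lab <- partition_labels(x, cap)   # 1 1 1 1 1 2 2 3 3
tapply(x, lab, sum)               # 150 130 170
```

The greedy labelling fills each group up to the optimal cap, so the last group tends to be the heaviest; other splits with the same maximum may exist.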

Dan.