[R] cut2 once, bin twice...

Gabor Grothendieck ggrothendieck at gmail.com
Fri Oct 23 17:14:25 CEST 2009


On Fri, Oct 23, 2009 at 3:58 AM, Dieter Menne
<dieter.menne at menne-biomed.de> wrote:
>
>
>
> sdanzige wrote:
>>
>>
>> I'm using the Hmisc cut2 function to bin a set of data.  It produces bins
>> that I like with results like this:
>>
>> [96,270]:171
>> [69, 96): 54
>> [49, 69): 40
>> [35, 49): 28
>> [28, 35): 14
>> [24, 28):  8
>> (Other) : 48
>>
>> I would like to take a second set of data, and assign it to bins based on
>> factors defined by my call to cut2.
>>
>
> It used to be quite tricky, but by popular request Brian Ripley has added an
> example of how to extract the intervals using a regular expression at the
> bottom of the examples for cut (note: cut in base, not cut2 in Hmisc).
>
> If someone knows of an easier way, please correct me. How about adding this
> information as an attribute to the standard cut?
>

The strapply function in gsubfn can do it with a simpler regular
expression since it extracts based on content rather than delimiters,
which is what you want here:

> # create sample data
> library(gsubfn)
> set.seed(1)
> dat <- seq(4, 7, by = 0.05)
> x <- sample(dat, 30)

> # use cut
> groups <- cut(x, breaks = 10)

> # extract interval boundaries using strapply
> strapply(levels(groups), "[[:digit:].]+", as.numeric, simplify = TRUE)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]  4.0  4.3  4.6  4.9  5.2  5.5  5.8  6.1  6.4   6.7
[2,]  4.3  4.6  4.9  5.2  5.5  5.8  6.1  6.4  6.7   7.0

The above is from

   demo("gsubfn-cut")

For more see the gsubfn home page at http://gsubfn.googlecode.com
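
To tie this back to the original question (binning a second data set
with the intervals from the first), here is a minimal sketch in base R.
It makes several assumptions that are not in the posts above: it uses
regmatches() and gregexpr() in place of strapply so that it runs without
gsubfn, it uses a fixed x rather than the random sample so the result is
reproducible, and y is a hypothetical second data set. It also assumes
the labels contain only non-negative numbers in plain decimal notation.

```r
# first data set and its bins (deterministic, unlike the sample() above)
x <- seq(4, 7, by = 0.05)
groups <- cut(x, breaks = 10)

# pull the two numbers out of each label, e.g. "(4.3,4.6]" -> "4.3", "4.6"
lev <- levels(groups)
bounds <- regmatches(lev, gregexpr("[[:digit:].]+", lev))
lower <- as.numeric(sapply(bounds, `[`, 1))
upper <- as.numeric(sapply(bounds, `[`, 2))

# rebuild the break points: first lower bound plus every upper bound
brks <- c(lower[1], upper)

# hypothetical second data set, binned with the recovered breaks
y <- c(4.1, 5.25, 6.9)
groups2 <- cut(y, breaks = brks, labels = lev)
```

Note that the labels are rounded (dig.lab = 3 by default), so the
recovered breaks only approximate the ones cut used internally, and a
value sitting exactly on the lowest boundary may come back as NA. If you
control the first call, it is probably simpler to compute the breaks
once, store them, and pass the same vector to cut for both data sets.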



