[R] split variable / create categories

David Winsemius dwinsemius at comcast.net
Fri Sep 9 18:46:24 CEST 2011


On Sep 9, 2011, at 12:10 PM, Martin Batholdy wrote:

> Thanks for the suggestions!
>
> However all these functions don't produce exactly what I want
> (at least with my actual data).
>
>
> I need a split-algorithm that converts the values of my vectors into  
> four factors.
> And the crucial part is, that I need exactly the same number of  
> elements in each factor-level
> and no overlapping.
>
>
>
> cut() seems to find equal intervals – but that leads to different  
> numbers of elements in each interval.
>

You may want to look at 'cut2' with the 'g' argument in the Hmisc  
package. Its defaults are to include the lowest value, and it gives the  
option of specifying equal-sized groups. It fits with my assumptions  
about how a factor-ing variable _should_ be constructed, so apparently  
Harrell and I think alike.

(See code and output below. The output you displayed is a numeric vector  
rather than a factor, hence the as.numeric() wrapper.)

-- 
David.
>
> library(lattice)
> equal.count(x,number=4,overlap=0)
>
> seems to do the job, but strangely enough, it seems to ignore the  
> argument 'overlap = 0' in my actual vector –
> I get factor-borders that overlap.
> And I really have to prevent this.
>
>
>
>
> On 09.09.2011, at 17:49, Andrea Spano wrote:
>
>> cut ( x , c(0, 1.4 ,6, 8, Inf ), labels = 1:4, include.lowest = T)
>>
>> On 9 September 2011 17:34, Martin Batholdy  
>> <batholdy at googlemail.com> wrote:
>> Hi,
>>
>> is there a function or an easy way to convert a variable with  
>> continuous values into a categorial variable (with x levels)?
>>
>> here is what I mean:
>>
>>
>> I want to transform x:
>>
>> x <- c(3.2,  1.5,  6.8,  6.9,  8.5,  9.6,  1.1,  0.6)
>>
>> into a 'categorial'-variable with four levels so that I get:
>>
>> [1] 2 2 3 3 4 4 1 1

> library(Hmisc)
> x <- c(3.2, 1.5, 6.8, 6.9, 8.5, 9.6, 1.1, 0.6)
> as.numeric(cut2(x, g = 4))
[1] 2 2 3 3 4 4 1 1
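
If Hmisc is not available, a rank-based approach in base R should give
the same grouping for this example. This is only a sketch, not what cut2
does internally: the helper name equal_groups is made up for
illustration, and breaking ties by order of appearance is an assumption
chosen so that group sizes stay equal even when values repeat.

```r
# Hypothetical helper (not part of Hmisc or base R): assign each value
# to one of g groups of (nearly) equal size, based on its rank.
equal_groups <- function(x, g) {
  # ties.method = "first" gives every element a distinct rank 1..n,
  # so tied values cannot pile up in a single group
  r <- rank(x, ties.method = "first")
  # map ranks 1..n onto group numbers 1..g
  ceiling(r * g / length(x))
}

x <- c(3.2, 1.5, 6.8, 6.9, 8.5, 9.6, 1.1, 0.6)
grp <- equal_groups(x, 4)
grp          # group number for each element of x
table(grp)   # number of elements per group
```

When length(x) is not a multiple of g, the group sizes produced this way
differ by at most one. Note that tied values can end up in different
groups, which may or may not be acceptable for your data.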


>>
>> so each element is converted into its rank- value / categorial-value
>> (in this example four levels are created).
>>
>>

David Winsemius, MD
West Hartford, CT


