[R] Create a categorical variable from numeric column

Bert Gunter gunter.berton at gene.com
Sun Oct 6 17:54:19 CEST 2013


I think this is unwise. It depends on there being exactly 2 categories
in the desired result and on silent coercion from logical to numeric,
and so does not generalize.  Sometimes brevity is **not** the soul of wit
(google if necessary).
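
To make the objection concrete, here is a small sketch of what the quoted one-liner relies on. The vector v is made up for illustration and is not taken from the thread. cut() returns NA for values outside its breaks, !is.na() turns that into a logical, and adding 1 silently coerces the logical to 1 or 2.

```r
# Hypothetical values, chosen only for illustration
v <- c(3, 8, 17, 18, 24)

# A single interval (7, 17]; anything outside it becomes NA
inside <- !is.na(cut(v, breaks = c(7, 17)))
inside

# Logical is silently coerced to numeric: FALSE -> 1, TRUE -> 2
categ <- inside + 1
categ
```

Because a logical has only two values, this trick has no way to express a third category, which is the sense in which it does not generalize.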

I would suggest instead that cut() specify three intervals and that the
final condensation to 2 categories be explicit. This can be done in many
ways, but ifelse() is convenient here; e.g.

> x <- sample(1:24,10)
> x
 [1] 10  2 13  1 23 22  3 18 20  4

> y <- cut(x, breaks=c(0,7,18,24), labels=FALSE)
## Note that the "include.lowest" and "right" arguments of cut() can
## be invoked to handle endpoints as desired

> y
 [1] 2 1 2 1 3 3 1 2 3 1

> factor(ifelse(y == 2, 2, 1))
 [1] 2 1 2 1 1 1 1 2 1 1
Levels: 1 2
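
As for the "many ways" of condensing three interval codes to two categories, two alternatives to ifelse() are sketched below. The names f1 and g are hypothetical, and y is retyped from the output above so the snippet runs on its own.

```r
# Interval codes as printed for y above
y <- c(2, 1, 2, 1, 3, 3, 1, 2, 3, 1)

# Lookup vector: element i holds the final category for interval i
f1 <- factor(c(1, 2, 1)[y])

# Or build a 3-level factor and merge levels 1 and 3 by
# assigning them the same label
g <- factor(y, levels = 1:3)
levels(g) <- c("1", "2", "1")

f1
g
```

Both give the same two-level result as the ifelse() version. The lookup vector extends naturally to more intervals, since the mapping from interval to category is spelled out in one place.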

## This could all be condensed into a one-liner of course, but at the
cost of clarity.
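
On the endpoint remark: with breaks c(0,7,18,24) and the default right = TRUE, the value 18 lands in the middle interval (7,18], as the output above shows (x = 18 gives y = 2). If 18 should instead belong with 18 to 24, as in the original question, either variant below would do it for integer-valued data. The break values are one illustrative choice, not something taken from the thread.

```r
# The sample printed above, retyped so the snippet is self-contained
x <- c(10, 2, 13, 1, 23, 22, 3, 18, 20, 4)

# Default right = TRUE: intervals (0,7], (7,17], (17,24]
y1 <- cut(x, breaks = c(0, 7, 17, 24), labels = FALSE)

# right = FALSE: intervals [1,8), [8,18), [18,25)
y2 <- cut(x, breaks = c(1, 8, 18, 25), right = FALSE, labels = FALSE)

identical(y1, y2)

# Explicit condensation: the middle interval is category 2, the rest 1
factor(ifelse(y1 == 2, 2, 1))
```

With either variant, 18 falls in the third interval and so ends up in category 1 after the condensation step.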

Cheers,
Bert

On Sun, Oct 6, 2013 at 7:47 AM, arun <smartpink111 at yahoo.com> wrote:
>
> Thanks, ?cut() could be used in one line.
> Categ2<-(!is.na(cut(dat1[,1],breaks=c(7,17))))+1
>
>  identical(Categ,Categ2)
> #[1] TRUE
> A.K.
>
>
>
> ----- Original Message -----
> From: Bert Gunter <gunter.berton at gene.com>
> To: arun <smartpink111 at yahoo.com>
> Cc: R help <r-help at r-project.org>
> Sent: Sunday, October 6, 2013 10:18 AM
> Subject: Re: [R] Create a categorical variable from numeric column
>
> No.
>
> Use ?cut instead.
>
> -- Bert
>
>
> On Sun, Oct 6, 2013 at 6:29 AM, arun <smartpink111 at yahoo.com> wrote:
>>
>>
>>
>> Hi,
>>
>> I created 3 categories. If 1-7 and 18-24 should come under the same category, then:
>>  Categ<- findInterval(dat1$Col1,c(8,18))+1
>> Categ[Categ>2]<- 1
>> dat1$Categ<- Categ
>>  tail(dat1)
>> #   Col1       Col2 Categ
>> #45    2 -0.5419758     1
>> #46   21  1.1042719     1
>> #47   24 -1.0787079     1
>> #48   18  0.6253085     1
>> #49   15 -1.6822411     2
>> #50   16 -0.5966446     2
>>
>> A.K.
>>
>>
>>
>>
>>
>> ----- Original Message -----
>> From: arun <smartpink111 at yahoo.com>
>> To: R help <r-help at r-project.org>
>> Cc:
>> Sent: Saturday, October 5, 2013 8:30 PM
>> Subject: Re: Create a categorical variable from numeric column
>>
>> Hi,
>> Try:
>> set.seed(29)
>> dat1<- data.frame(Col1=sample(1:24,50,replace=TRUE),Col2=rnorm(50))
>>  dat1$Categ <- findInterval(dat1$Col1,c(8,18))+1
>>   head(dat1)
>> #  Col1        Col2 Categ
>> #1    3 -0.09381378     1
>> #2    6 -0.83640257     1
>> #3    3  0.00307641     1
>> #4    8  0.04197496     2
>> #5   15  0.15433872     2
>> #6    3 -0.21301893     1
>>
>> split(dat1,dat1$Categ)
>>
>>
>> A.K.
>>
>>
>> I have a data frame that contains a numerical variable ranging from 1
>> to 24. I would like to create a new category with two ranges: 1 to 7
>> and 18 to 24 will form one category and 8 to 17 will form another. How
>> can I create this category?
>>
>> Thanks
>>
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> (650) 467-7374
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374


