[R] coded to categorical variables in a large dataset

Chuck Cleland ccleland at optonline.net
Fri Dec 29 19:27:57 CET 2006


sj wrote:
> I am working with a dataset where there are 5 possible outcomes (coded 1:5),
> and I would like to create 5 categorical variables (event1...event5). I am
> using a for loop and if statements, but with a large dataset (approx. 100,000
> rows) it takes quite a bit of time. Is there a way to speed this up? Here is
> some sample code of what I am currently doing.

  Here is one way you might do it:

X <- sample(1:5, 100, replace=TRUE)

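# Your 5 event variables as the columns of a 0/1 indicator matrix.
# The rnorm() response is a throwaway; only the design matrix is used.
model.matrix(lm(rnorm(length(X)) ~ as.factor(X) - 1))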
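  As a variation on the same idea, model.matrix() also accepts a one-sided formula, so the lm() fit and the dummy response can be skipped. This is only a sketch: the names f and events, the set.seed() call, and the explicit levels = 1:5 are my own additions for illustration.

```r
# Same kind of sample data as above; set.seed() is added only for reproducibility.
set.seed(1)
X <- sample(1:5, 100, replace = TRUE)

# One-sided formula: no response variable or lm() fit is needed.
# Fixing levels = 1:5 keeps five columns even if a code is absent from X.
f <- factor(X, levels = 1:5)
events <- model.matrix(~ f - 1)
colnames(events) <- paste0("event", 1:5)

head(events)
```

  Each row has a single 1 in the column matching its outcome code and 0 elsewhere.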

  Also, along the lines of your approach below, the following using
ifelse() might be better:

event3 <- ifelse(test2 == 3, 1, 0)
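  If all five indicators are wanted at once, the same comparison can be applied to every code in a single vectorized call. The following is a rough sketch rather than a tuned solution; the matrix name events and the use of outer() are my own choices, not something from the original code.

```r
# Sample outcome vector, built the same way as in the original post.
test2 <- rep(1:5, 2000)

# Compare every element of test2 with each code 1..5 in one vectorized call.
# outer() returns a logical matrix; adding 0 converts TRUE/FALSE to 1/0.
events <- outer(test2, 1:5, "==") + 0
colnames(events) <- paste0("event", 1:5)

# Pull out individual vectors if separate variables are wanted.
event1 <- events[, "event1"]
event3 <- events[, "event3"]
```

  Because the comparison runs over the whole vector at once, no explicit loop over rows is involved, which is where the time goes in the for/if version.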

  I'm sure other people will post different solutions, probably more
elegant than these.

> test2 <- rep(1:5, 2000)
> 
> event1 <- rep(0, length(test2))
> event2 <- rep(0, length(test2))
> event3 <- rep(0, length(test2))
> event4 <- rep(0, length(test2))
> event5 <- rep(0, length(test2))
> 
> for(i in 1:length(event1))
> {
>     if (test2[i]==1)
>     {
>         event1[i]=1
>     }
> 
>     if (test2[i]==2)
>     {
>         event2[i]=1
>     }
> 
>     if (test2[i]==3)
>     {
>         event3[i]=1
>     }
> 
>     if (test2[i]==4)
>     {
>         event4[i]=1
>     }
> 
>     if (test2[i]==5)
>     {
>         event5[i]=1
>     }
> }
> 
> 
> 
> thanks,
> 
> Spencer

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


