[R] How to create several variables from composite character variable

Marc Schwartz marc_schwartz at comcast.net
Fri Aug 17 23:48:44 CEST 2007


On Fri, 2007-08-17 at 14:40 -0700, Daniel Lakeland wrote:
> On Fri, Aug 17, 2007 at 05:32:54PM -0400, Dale Steele wrote:
> > I'm trying to create two variables (dka and newonset) from the
> > following composite character variable diagnosis:
> > 
> > diagnosis <- c("hypoglycemia","diabetes" ,"newonset&dka", "newonset",
> > "diabetes", "dka&GI", "diabetes&GI", "newonset", "dka")
> > 
> > I can extract the indices for dka and newonset using the following....
> > 
> > > grep("dka", diagnosis)
> > [1] 3 6 9
> > > grep("newonset", diagnosis)
> > [1] 3 4 8
> > 
> > How do I create
> > 
> >  dka      = c(0,0,1,0,0,1,0,0,1)
> >  newonset = c(0,0,1,1,0,0,0,1,0)
> 
> dka <- rep(0, length.out=NROW(diagnosis))
> dka[grep("dka",diagnosis)] <- 1
> 
> similar for newonset

Or alternatively:

> (regexpr("dka", diagnosis) > 0) * 1
[1] 0 0 1 0 0 1 0 0 1

> (regexpr("newonset", diagnosis) > 0) * 1
[1] 0 0 1 1 0 0 0 1 0

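Unlike grep(), which by default returns only the indices of the matching
elements, regexpr() returns a vector the same length as the target
vector: the position of the first match in each element, or -1 where
there is no match. Comparing that result with 0 gives a logical vector,
and multiplying by 1 converts it to 0/1.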
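
For reference, here is a small sketch that puts the two return shapes side
by side and then builds both indicators in one pass. It assumes a version
of R that provides grepl(), and the names "keywords" and "flags" are made
up for illustration; fixed = TRUE is used only because the search terms
here are literal strings, not regular expressions.

```r
diagnosis <- c("hypoglycemia", "diabetes", "newonset&dka", "newonset",
               "diabetes", "dka&GI", "diabetes&GI", "newonset", "dka")

# grep() gives only the indices of the matching elements
idx <- grep("dka", diagnosis)

# regexpr() gives one value per element: match position, or -1 for no match
pos <- regexpr("dka", diagnosis)

# grepl() gives one TRUE/FALSE per element, so no comparison with 0 is needed
keywords <- c("dka", "newonset")   # hypothetical name, not from the thread
flags <- sapply(keywords, function(k) {
  as.integer(grepl(k, diagnosis, fixed = TRUE))
})

# flags is a matrix with one column per keyword
dka      <- flags[, "dka"]
newonset <- flags[, "newonset"]
```

The sapply() form is just a convenience when there are more than a couple
of terms to pull out; for two variables the one-liners above are as good.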

HTH,

Marc Schwartz


