[R] Need a more efficient way to implement this type of logic in R

Alexander Engelhardt alex at chaotic-neutral.de
Wed Apr 6 23:04:12 CEST 2011


On 06.04.2011 22:02, Walter Anderson wrote:
> I have cobbled together the following logic. It works but is very slow.
> I'm sure that there must be a better r-specific way to implement this
> kind of thing, but have been unable to find/understand one. Any help
> would be appreciated.
>
> hh.sub <- households[c("HOUSEID","HHFAMINC")]
> for (indx in 1:length(hh.sub$HOUSEID)) {
> if ((hh.sub$HHFAMINC[indx] == '01') | (hh.sub$HHFAMINC[indx] == '02') |
> (hh.sub$HHFAMINC[indx] == '03') | (hh.sub$HHFAMINC[indx] == '04') |
> (hh.sub$HHFAMINC[indx] == '05'))
> hh.sub$CS_FAMINC[indx] <- 1 # Less than $25,000
> if ((hh.sub$HHFAMINC[indx] == '06') | (hh.sub$HHFAMINC[indx] == '07') |
> (hh.sub$HHFAMINC[indx] == '08') | (hh.sub$HHFAMINC[indx] == '09') |
> (hh.sub$HHFAMINC[indx] == '10'))
> hh.sub$CS_FAMINC[indx] <- 2 # $25,000 to $50,000
> if ((hh.sub$HHFAMINC[indx] == '11') | (hh.sub$HHFAMINC[indx] == '12') |
> (hh.sub$HHFAMINC[indx] == '13') | (hh.sub$HHFAMINC[indx] == '14') |
> (hh.sub$HHFAMINC[indx] == '15'))
> hh.sub$CS_FAMINC[indx] <- 3 # $50,000 to $75,000
> if ((hh.sub$HHFAMINC[indx] == '16') | (hh.sub$HHFAMINC[indx] == '17'))
> hh.sub$CS_FAMINC[indx] <- 4 # $75,000 to $100,000
> if ((hh.sub$HHFAMINC[indx] == '18'))
> hh.sub$CS_FAMINC[indx] <- 5 # More than $100,000
> if ((hh.sub$HHFAMINC[indx] == '-7') | (hh.sub$HHFAMINC[indx] == '-8') |
> (hh.sub$HHFAMINC[indx] == '-9'))
> hh.sub$CS_FAMINC[indx] = 0
> }

Hi,
the for-loop is entirely unnecessary. You can, as a first step, rewrite 
the code like this:

hh.sub$CS_FAMINC <- NA
low <- hh.sub$HHFAMINC %in% c('01', '02', '03', '04', '05')
hh.sub$CS_FAMINC[low] <- 1 # Less than $25,000

Note that there is no if() here: if() only accepts a single TRUE/FALSE,
so the comparison is used as a logical index instead. This very basic
concept is called "vectorization" in R. You should read about it; it
rocks.
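
To make the vectorized version concrete, here is a minimal sketch that recodes all six classes without a loop. The toy data frame is made up for illustration and only stands in for the real households data; the column names are the ones from the quoted code.

```r
# Toy data standing in for the poster's subset (values are invented).
hh.sub <- data.frame(
  HOUSEID  = 1:8,
  HHFAMINC = c("01", "07", "13", "16", "18", "-7", "05", "10"),
  stringsAsFactors = FALSE
)

# Start with NA everywhere, then fill each class with one vectorized assignment.
hh.sub$CS_FAMINC <- NA_real_
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("01", "02", "03", "04", "05")] <- 1  # Less than $25,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("06", "07", "08", "09", "10")] <- 2  # $25,000 to $50,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("11", "12", "13", "14", "15")] <- 3  # $50,000 to $75,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("16", "17")] <- 4                    # $75,000 to $100,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC == "18"] <- 5                               # More than $100,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("-7", "-8", "-9")] <- 0              # missing codes
```

Each line touches the whole column at once, so the run time no longer depends on an R-level loop over rows.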

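In this case, though, you don't even need to do that.
If you convert the variable HHFAMINC to a number like this:

hh.sub$HHFAMINC <- as.numeric(as.character(hh.sub$HHFAMINC))

(the as.character() matters if HHFAMINC is a factor; without it you get
the internal level codes), then you can apply the cut() function to
create a factor variable:

hh.sub$myawesomefactor <- cut(hh.sub$HHFAMINC,
     breaks=c(0.5, 5.5, 10.5, 15.5, 17.5, 18.5))

or something like that should do the trick. The breaks need the outer
bounds too: six breaks give five classes, and anything outside them
becomes NA. You will then have to rename the factor values, either
afterwards with levels() or directly via the labels argument of cut().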
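
A minimal sketch of the cut() approach, again on invented toy data. The break points and label texts follow the income classes in the quoted code; the as.character() step is an assumption to cover the case where HHFAMINC is stored as a factor.

```r
# Toy data standing in for the poster's subset (values are invented).
hh.sub <- data.frame(
  HOUSEID  = 1:8,
  HHFAMINC = c("01", "07", "13", "16", "18", "-7", "05", "10"),
  stringsAsFactors = FALSE
)

# as.character() first, so this also works if HHFAMINC happens to be a factor.
code <- as.numeric(as.character(hh.sub$HHFAMINC))

# Six breaks define five right-closed intervals, e.g. (0.5, 5.5] holds codes 1 to 5.
hh.sub$CS_FAMINC <- cut(
  code,
  breaks = c(0.5, 5.5, 10.5, 15.5, 17.5, 18.5),
  labels = c("Less than $25,000", "$25,000 to $50,000", "$50,000 to $75,000",
             "$75,000 to $100,000", "More than $100,000")
)
```

Codes that fall outside the breaks, such as -7, -8 and -9, come out as NA, so no separate step is needed for the missing values.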

Also, this might be my OCD speaking, but I would use NA instead of 0 for 
non-available values.

Have fun,
  Alex


