[R] If statement - copying a factor variable to a new variable

Miguel Manese jjonphl at gmail.com
Thu Jun 28 16:50:33 CEST 2012


On Thu, Jun 28, 2012 at 8:47 PM, James Holland <holland.aggie at gmail.com> wrote:
> With the multiple if statements I need to check for, I thought for statements
> with the if/else if conditional statement was better than nested ifelse
> functions.

A for() loop gives you a lot of flexibility at the expense of being
verbose and slow. ifelse() is a bit more limited, but it is concise
(more elegant, IMO) and should be faster since it is vectorized.
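
To make the comparison concrete, here is a minimal sketch of both styles. The vectors v1 and v2 and the selection rule (NA if both are missing, otherwise v1, falling back to v2) are made up for illustration; they are not the exact rule from the original question.

```r
# Hypothetical data, not the poster's actual columns
v1 <- c("S", "S", "D", NA, NA)
v2 <- c("D", "D", "S", "S", NA)

# Loop version: explicit, flexible, but verbose
res_loop <- character(length(v1))
for (i in seq_along(v1)) {
  if (is.na(v1[i]) && is.na(v2[i])) {
    res_loop[i] <- NA          # both missing
  } else if (is.na(v1[i])) {
    res_loop[i] <- v2[i]       # only v1 missing: fall back to v2
  } else {
    res_loop[i] <- v1[i]       # v1 present
  }
}

# Vectorized version: nested ifelse() applies the same rule to all rows at once
res_vec <- ifelse(is.na(v1) & is.na(v2), NA_character_,
                  ifelse(is.na(v1), v2, v1))

identical(res_loop, res_vec)   # TRUE
```

Note the use of & (element-wise) inside ifelse() versus && (single value) inside if ().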

> For example
>
> #example expanded on
>
> rm(list=ls())
>
> v1.factor <- c("S","S","D","D","D",NA)
> v2.factor <- c("D","D","S","S","S","S")
> v3 <- c(1,0,0,0,0,NA)
> v4 <- c(0,0,1,1,0,0)
>
> test.data <- data.frame(v1.factor,v2.factor, v3, v4)

Technically, since you will pick a value from one of v1.factor,
v2.factor, v3, v4 into a new vector, they should all have the same type
(e.g. numeric, character, integer). So I'll assume

v3 <- c("S","D","D","D","D",NA)
v4 <- c("D","D","S","S","D","D")
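
As a small illustration of why the types should match (the values here are made up), mixing character and numeric inputs in ifelse() silently coerces the result to character:

```r
# "yes" is character, "no" is numeric: the result ends up as character
mixed <- ifelse(c(TRUE, FALSE), c("S", "D"), c(1, 0))
mixed          # "S" "0"
class(mixed)   # "character"
```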

If you prefer vectorizing, you can create an index

# btw, is.na(v1.factor) is already logical (boolean),
# is.na(v1.factor)==TRUE is redundant
cond1 <- is.na(v1.factor) & is.na(v2.factor)
cond2 <- is.na(v1.factor) & ! is.na(v2.factor)
...

# cond1, cond2, etc should be mutually exclusive for this to work,
# i.e. for each row, one and only one of cond1, cond2, cond3 is TRUE
# not the case in your example, but you can make it so like
# cond2 <- !cond1 & (is.na(v1.factor) & !is.na(v2.factor))
# cond3 <- !cond1 & !cond2 & (...)
idx <- c(cond1, cond2, cond3, ...)

# to make it intuitive, you can convert idx into a matrix,
# i.e. test.data[idx] will return the elements of test.data
# corresponding to the elements of matrix idx which are TRUE.
# For a plain matrix this step is optional, since R stores matrices in
# column-major order; for a data frame it is needed, because a logical
# vector index would select columns instead
idx <- matrix(idx, nrow=length(cond1))

cbind(NA, test.data)[idx]    # because your first condition should return NA!
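
Here is a runnable sketch of the index-matrix idea on a small example. The vectors, the third condition, and the names candidates and pick are assumptions for illustration only. One thing to watch: indexing with a logical matrix returns the selected elements in column-major order, so the values come back grouped by condition rather than in row order. Picking one column per row keeps the result aligned with the rows.

```r
# Hypothetical data and rule: NA if both missing, else v1, falling back to v2
v1 <- c("S", NA, "D", NA)
v2 <- c("D", "S", NA, NA)

cond1 <- is.na(v1) & is.na(v2)    # both missing     -> NA
cond2 <- is.na(v1) & !is.na(v2)   # only v1 missing  -> take v2
cond3 <- !is.na(v1)               # v1 present       -> take v1

idx <- cbind(cond1, cond2, cond3)
stopifnot(all(rowSums(idx) == 1)) # exactly one condition TRUE per row

# One column of candidate values per condition, in the same order
candidates <- cbind(NA_character_, v2, v1)

# Logical-matrix indexing: values come back column by column
by_column <- candidates[idx]      # NA "S" "S" "D"

# Row-aligned alternative: compute the column to pick for each row,
# then index with a two-column (row, column) matrix
pick <- 1L * cond1 + 2L * cond2 + 3L * cond3
result <- candidates[cbind(seq_along(v1), pick)]
result                            # "S" "S" "D" NA
```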

Or you can use sapply(), which in essence is similar to a for() loop.
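
A minimal sapply() sketch, again with made-up vectors and a made-up rule (the helper name pick_one is hypothetical):

```r
v1 <- c("S", NA, "D", NA)
v2 <- c("D", "S", NA, NA)

# Decide the value for a single row i, using ordinary if/else
pick_one <- function(i) {
  if (is.na(v1[i]) && is.na(v2[i])) {
    NA_character_
  } else if (is.na(v1[i])) {
    v2[i]
  } else {
    v1[i]
  }
}

# Apply the function to every row index and collect the results in a vector
result <- sapply(seq_along(v1), pick_one)
result   # "S" "S" "D" NA
```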

>
> I'm not familiar with ifelse, but is there a way to use it in a nested
> format that would be better than my for loop structure?  Or might I be
> better off finding a programming way of converting the new factor variables
> "back" to their factor values using the levels function?

I don't understand your second question, but when combining factors it
is better to deal with their "labels" (i.e. as.character(my.factor))
and then convert the resulting vector of strings to a factor (i.e.
as.factor(my.result)). Internally a factor is a vector of positive
integer codes, and levels(v1.factor) shows the mapping of these
integers to their "labels." So you'll have a problem e.g. if the
two factor vectors map the integer 1 to different "labels."
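
To illustrate with two small hypothetical factors (f1 and f2 are not from the original data): the same integer code can stand for different labels, which is why going through as.character() is the safer route.

```r
f1 <- factor(c("S", "D", NA))
f2 <- factor(c("S", "S", "S"))

levels(f1)       # "D" "S"
levels(f2)       # "S"
as.integer(f1)   # 2 1 NA
as.integer(f2)   # 1 1 1
# Code 1 means "D" in f1 but "S" in f2, so the raw codes cannot be mixed.

# Combine on the labels, then convert back to a factor
combined <- factor(ifelse(is.na(f1), as.character(f2), as.character(f1)))
combined         # S D S, with levels "D" "S"
```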

Regards,

Jon


