[R] Creating New Variable Using Ifelse

Ismail SEZEN sezenismail at gmail.com
Thu Aug 10 07:55:57 CEST 2017


> On 10 Aug 2017, at 06:54, Courtney Benjamin <cbenjami at BTBOCES.ORG> wrote:
> 
> Hello R Help List,
> 
> I am an R novice and trying to use the ifelse function to create a new binary variable based off of the responses of two other binary variables; NAs are involved.  I pulled it off almost successfully, but when I checked the counts of my new variable for accuracy, I found that a small portion of the NA cases were not being passed through as NAs, but as "0" counts in my new variable.  My many attempts at creating a nested ifelse statement that would pass the NAs through properly have not been successful.  Any help is greatly appreciated.
> 
> Here is an MRE:
> 
> library(RCurl)
> data <- getURL("https://raw.githubusercontent.com/cbenjamin1821/careertech-ed/master/elsq2wbl.csv")
> elsq2wbl <- read.csv(text = data)
> 
> ##Recoding Negative Responses to NA
> elsq2wbl [elsq2wbl[, "EVERRELJOB"] < -3, "EVERRELJOB"] <- NA
> elsq2wbl [elsq2wbl[, "PSWBL"] < -2, "PSWBL"] <- NA
> 
> #Labeling categorical variable levels
> elsq2wbl$EVERRELJOB <- factor(elsq2wbl$EVERRELJOB, levels = c(0,1), labels = c("No","Yes"))
> elsq2wbl$PSWBL <- factor(elsq2wbl$PSWBL, levels = c(0,1), labels = c("No","Yes"))
> 
> ##Trying to create a new variable to indicate if the student had a job
> #related to the college studies that was NOT a WBL experience
> elsq2wbl$NONWBLRELJOB <- ifelse(elsq2wbl$PSWBL=="No" & elsq2wbl$EVERRELJOB=="Yes",1,0)
> 
> #Cross tab to check counts of two variables that new variable is based upon
> xtabs(~PSWBL+EVERRELJOB,subset(elsq2wbl,BYSCTRL==1&G10COHRT==1),addNA=TRUE)
> 
> #Checking count of newly created variable
> Q2sub <- subset(elsq2wbl,BYSCTRL==1&G10COHRT==1)
> library(plyr)
> count(Q2sub,'NONWBLRELJOB')
> 
> #The new variable has the correct count of "1", but 88 cases too many for "0"
> #The cross tab shows 20 and 68 NA cases that are being incorrectly counted as "0" in the new variable
> 
> #My other approach at trying to handle the NAs properly-returns an error
> elsq2wbl$NONWBLRELJOB <- ifelse(elsq2wbl$PSWBL=="No" & elsq2wbl$EVERRELJOB=="Yes",1,ifelse(is.na(elsq2wbl$PSWBL)&is.na(elsq2wbl$EVERRELJOB),NA,
>                                                                                           ifelse(elsq2wbl$PSWBL!="No" & elsq2wbl$EVERRELJOB!="Yes",0)))
> 
> 
> 
> Courtney Benjamin

I could not follow the question completely, but one thing that caught my eye is that elsq2wbl$EVERRELJOB contains the following values:

summary(factor(elsq2wbl$EVERRELJOB))
  -9   -8   -7   -4   -3    0    1 
 139  459  946 2488 1948 4619 5598 

and in fact, you want to set negative values to NA.

> ##Recoding Negative Responses to NA
> elsq2wbl [elsq2wbl[, "EVERRELJOB"] < -3, "EVERRELJOB"] <- NA

But after that command, you still have 1948 values of '-3' in the variable:

summary(factor(elsq2wbl$EVERRELJOB))
  -3    0    1 NA's 
1948 4619 5598 4032 

So I think you need to change the line as follows:

> ##Recoding Negative Responses to NA
> elsq2wbl [elsq2wbl[, "EVERRELJOB"] <= -3, "EVERRELJOB"] <- NA


Instead of using '-2' and '-3' as separate thresholds for the two variables, why don't you use a "less than zero" condition, as follows?

elsq2wbl [elsq2wbl[, "EVERRELJOB"] < 0, "EVERRELJOB"] <- NA
elsq2wbl [elsq2wbl[, "PSWBL"] < 0, "PSWBL"] <- NA

This way, all values below zero in both columns (variables) become NA, and each variable contains only 0, 1 and NA, which matches what you called "binary".
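
To make the difference between the three conditions concrete, here is a small sketch on a made-up vector (the name codes and its values are only for illustration; they mimic the codes seen in EVERRELJOB, not your actual data):

```r
# Toy codes mimicking the values seen in EVERRELJOB (assumed for illustration)
codes <- c(-9, -8, -7, -4, -3, 0, 1)

x <- codes
x[x < -3] <- NA      # strict inequality: -3 is NOT recoded
print(x)

y <- codes
y[y <= -3] <- NA     # -3 is recoded as well
print(y)

z <- codes
z[z < 0] <- NA       # every negative code becomes NA
print(z)
```

Only the first version leaves a -3 behind; the second and third give the same result for these codes.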

_ifelse_ part:

You have NAs in both variables. In these circumstances, consider the following ifelse examples (the two sides of '&' can be exchanged):

ifelse(TRUE & TRUE, 1, 0) # 1
ifelse(TRUE & FALSE, 1, 0) # 0
ifelse(FALSE & FALSE, 1, 0) # 0
ifelse(TRUE & NA, 1, 0) # NA
ifelse(FALSE & NA, 1, 0) # 0

Based on the above, try to build new logic that achieves what you need.
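
As one possible sketch, and assuming the rule you want is "NA whenever either input is NA" (that is my guess, you may need something different), the logic could look like this on two made-up factors (pswbl and ever are hypothetical stand-ins for your columns):

```r
# Made-up factors covering every combination of No / Yes / NA
pswbl <- factor(c("No", "No", "Yes", "Yes", NA, NA, "No", "Yes", NA),
                levels = c("No", "Yes"))
ever  <- factor(c("Yes", "No", "Yes", "No", "Yes", "No", NA, NA, NA),
                levels = c("No", "Yes"))

# Naive version: FALSE & NA is FALSE, so some NA cases end up as 0
naive <- ifelse(pswbl == "No" & ever == "Yes", 1, 0)

# Stricter version (assumed rule): test for NA first, then apply the condition
strict <- ifelse(is.na(pswbl) | is.na(ever), NA,
                 ifelse(pswbl == "No" & ever == "Yes", 1, 0))

print(data.frame(pswbl, ever, naive, strict))
```

The rows where naive is 0 but strict is NA are exactly the ones where one side is NA and the other side already makes the condition FALSE.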

In your last nested ifelse, you forgot to supply a value for the case where the innermost ifelse test fails:

elsq2wbl$NONWBLRELJOB <- ifelse(elsq2wbl$PSWBL=="No" & elsq2wbl$EVERRELJOB=="Yes", 1,
                                ifelse(is.na(elsq2wbl$PSWBL) & is.na(elsq2wbl$EVERRELJOB), NA,
                                       ifelse(elsq2wbl$PSWBL != "No" & elsq2wbl$EVERRELJOB != "Yes",0, "Forgotten value")))
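
Keep in mind that the string above is only a placeholder: a character value in any branch turns the whole result into character. A small sketch on a toy logical vector (cond is a made-up name) shows this, along with the error you get when the last argument is left out:

```r
cond <- c(TRUE, FALSE)

# A character placeholder coerces the whole result to character
with_string <- ifelse(cond, 1, "Forgotten value")
print(with_string)

# A typed NA keeps the result numeric
with_na <- ifelse(cond, 1, NA_real_)
print(with_na)

# Omitting the 'no' argument fails once any element of the test is FALSE
err <- tryCatch(ifelse(cond, 0), error = function(e) e)
print(inherits(err, "error"))
```

So for a numeric 0/1 variable, NA (or another number) is probably a better choice for the final branch than a string.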

Also, please try to create a _minimal reproducible example_ instead of making us download a big csv file (219 columns x 16197 rows) and work out what you are trying to do. :)
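
For instance, a handful of made-up rows using the same coding scheme would probably be enough to show the problem (toy and its values are hypothetical, not taken from your file):

```r
# Hypothetical toy data with the same coding scheme (negative codes = missing)
toy <- data.frame(
  PSWBL      = c(0, 0, 1, 1, -4, 0),
  EVERRELJOB = c(1, 0, 1, -3, 1, -9)
)

# dput() prints R code that recreates the object, ready to paste into a mail
dput(toy)
```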


