[R] recode categorial vars into binary data
Rui Barradas
ruipbarradas at sapo.pt
Tue May 7 18:51:20 CEST 2013
Hello,
First of all, you don't need as.data.frame(cbind(...)). It's much better
to simply do data.frame(...): cbind() first builds a matrix, which forces
all columns to a single type.
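A small sketch of the difference, using made-up columns (id and grp are not from your data). Your example is all numeric, so there both forms give the same result, but as soon as a character column is present the cbind() route loses the numeric type of id:

```r
# Hypothetical columns, for illustration only
m <- as.data.frame(cbind(id = 1:3, grp = c("a", "b", "c")))
d <- data.frame(id = 1:3, grp = c("a", "b", "c"))

sapply(m, class)  # id is no longer numeric here
sapply(d, class)  # id keeps its integer type
```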
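As for the conversion, the following function doesn't use randomness but
gets the job done. Values below the median become 0, values above it
become 1, and just enough of the values equal to the median (taken in
order of appearance) become 0 so that both groups end up the same size.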
df <- data.frame(snr=c(1,2,3,4,5,6,7,8,9,10),
                 k1=c(1,1,4,2,3,2,2,5,2,2),
                 k2=c(1,2,3,2,1,2,1,3,3,2),
                 result=c(4,3,5,4,2,6,4,4,2,3))
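fun <- function(x){
    n <- length(x)
    med <- median(x)        # compute the median once
    y <- rep(NA, n)
    y[x < med] <- 0         # below the median: 0
    y[x > med] <- 1         # above the median: 1
    w <- which(x == med)    # positions tied with the median
    # the first ties (in order of appearance) fill group 0 up to n/2 members
    y[w[seq_len(floor(n/2) - sum(x < med))]] <- 0
    y[is.na(y)] <- 1        # the remaining ties go to group 1
    y
}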
fun(df$k1)
fun(df$k2)
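If the ties really must be broken at random, as the original question asks, a variant along the following lines might work. This is only a sketch under some assumptions: fun_random is a made-up name, x is assumed to contain no NAs, and for odd n the 0 group gets floor(n/2) members. The vectors k1 and k2 repeat the data from the post so that the block runs on its own.

```r
# Hypothetical variant of fun() that picks the tied values at random
fun_random <- function(x) {
  n <- length(x)
  med <- median(x)
  y <- rep(NA, n)
  y[x < med] <- 0
  y[x > med] <- 1
  w <- which(x == med)               # positions tied with the median
  k <- floor(n / 2) - sum(x < med)   # how many ties must become 0
  # sample positions within w; sample.int avoids sample()'s
  # special behaviour when w has length 1
  if (k > 0) y[w[sample.int(length(w), k)]] <- 0
  y[is.na(y)] <- 1                   # remaining ties go to group 1
  y
}

k1 <- c(1,1,4,2,3,2,2,5,2,2)
k2 <- c(1,2,3,2,1,2,1,3,3,2)

set.seed(1)  # only to make the random split reproducible
r1 <- fun_random(k1)
r2 <- fun_random(k2)
table(r1)    # both groups should have 5 members
table(r2)
```

Which of the tied values end up in each group depends on the seed, but the group sizes do not.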
Hope this helps,
Rui Barradas
On 07-05-2013 17:20, D. Alain wrote:
> Dear R-List,
>
> I would like to recode categorical variables into binary data, so that all values above the median are coded 1 and all values below it are coded 0, separating each variable into two equally large groups (e.g. good performers = 0 vs. bad performers = 1).
>
> I have not succeeded so far in finding a nice solution to do that in R. I thought there might be a better way than ordering each column and recoding the first 50% into 0 and the second 50% into 1. If I use ifelse, I have a problem with cases that share the same rank because they are all equal to the median.
>
> e.g.
> df<-as.data.frame(cbind(snr=c(1,2,3,4,5,6,7,8,9,10),k1=c(1,1,4,2,3,2,2,5,2,2),k2=c(1,2,3,2,1,2,1,3,3,2),result=c(4,3,5,4,2,6,4,4,2,3)))
>
> Now I want to recode k1 and k2 so that half of the values are recoded 0 and half are recoded 1, split around the median. The median of k1 is 2, which would lead to unequal group sizes if 2 were used as the cutoff, so the values k1=2 should be recoded 1 or 0 randomly until both categories have the same size.
>
> something like
>
> df.rec<-as.data.frame(cbind(snr=c(1,2,3,4,5,6,7,8,9,10),k1=c(0,0,1,0,1,1,0,1,0,1),k2=c(0,1,1,0,0,1,0,1,1,0),result=c(4,3,5,4,2,6,4,4,2,3)))
>
> Can anyone help?
>
> Thank you in advance.
>
> Best wishes.
> Alain