[R] create multiple categorical variables in a data frame using a loop

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Thu Apr 19 22:22:28 CEST 2018


> On Apr 19, 2018, at 11:20 AM, Ding, Yuan Chun <ycding using coh.org> wrote:
> 
> Hi All,
> 
> I want to create a categorical variable, cat.pfoa, in the file of pfas.pheno (a data frame) based on log2pfoa values. I can do it using the following code.
> 
> pfas.pheno <-within(pfas.pheno, {cat.pfoa<-NA
>  cat.pfoa[pfas.pheno$log2pfoa <=quantile(pfas.pheno$log2pfoa,0.25, na.rm =T)]<-0
>  cat.pfoa[pfas.pheno$log2pfoa >=quantile(pfas.pheno$log2pfoa,0.75, na.rm =T)]<-2
>  cat.pfoa[pfas.pheno$log2pfoa >=quantile(pfas.pheno$log2pfoa,0.25, na.rm =T)
>           &pfas.pheno$log2pfoa <=quantile(pfas.pheno$log2pfoa,0.75, na.rm =T)]<-1
>  })

This would be somewhat more compact and easier to maintain if you used findInterval (untested in the absence of a data object, which is your responsibility):

pfas.pheno <- within(pfas.pheno, {
  # 0 = below the 25th percentile, 1 = middle half, 2 = at or above the 75th percentile
  cat.pfoa <- findInterval(log2pfoa, c(-Inf, quantile(log2pfoa, c(.25, .75), na.rm = TRUE), Inf)) - 1
})
 

`findInterval` numbers its intervals from 1, so to get a sequence starting at 0 just subtract 1.
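
To see the numbering concretely, here is a small sketch on made-up data (the vector `x` is hypothetical and only stands in for `pfas.pheno$log2pfoa`):

```r
# Hypothetical toy data standing in for pfas.pheno$log2pfoa
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)

# Quartile cut points, ignoring the missing value
q <- quantile(x, c(.25, .75), na.rm = TRUE)

# findInterval() returns 1, 2 or 3 here; subtracting 1 gives 0, 1, 2.
# A missing input stays missing in the result.
cat.x <- findInterval(x, c(-Inf, q, Inf)) - 1

table(cat.x, useNA = "ifany")
```

One thing to be aware of: `findInterval` closes its intervals on the left by default, so a value exactly equal to the 75th percentile lands in category 2, whereas the three-step code in the question would leave it in category 1. Whether that matters depends on how many ties sit on the cut points.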


> However, I have additional 7 similar variables, so I wrote the following code, but it does not work.
> 
> for (i in c("log2pfoa","log2pfos", "log2pfna", "log2pfdea",   "log2pfuda", "log2pfhxs", "log2et_pfosa_acoh", "log2me_pfosa_acoh"))  {
> cat.var <- paste0("cat.",i)
> pfas.pheno <- within(pfas.pheno, {eval(parse(text= cat.var))<-NA

Nope. You cannot use R like a macro processor, at least not easily. R names are not the same as character values. They "live in different realities". The `get` and `assign` functions can be used to "promote" character values to real R names and make assignments from and to what would otherwise be merely character values.
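
A minimal illustration of that distinction, using a made-up name (`cat.demo` is hypothetical and has nothing to do with your data):

```r
var.name <- "cat.demo"          # a character value, not an R name

assign(var.name, c(0, 1, 2))    # creates an object whose name is cat.demo
by.string <- get(var.name)      # looks the object up via the character value
by.name   <- cat.demo           # refers to the same object by its R name

identical(by.string, by.name)
```

Here `var.name` itself still holds only the string; it is `assign` and `get` that bridge from the string to the object of that name.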

Perhaps this (also mostly untested, except for the strategy of making `assign` create a new data frame column):

 for (i in c("log2pfoa","log2pfos", "log2pfna", "log2pfdea",   "log2pfuda", "log2pfhxs", "log2et_pfosa_acoh",  
             "log2me_pfosa_acoh"))  {
  cat.var <- paste0("cat.",i)
  # evaluate inside the data frame so that get() finds the column and assign() creates the new one
  pfas.pheno <- within(pfas.pheno,
      assign(cat.var, findInterval(get(i), c(-Inf, quantile(get(i), c(.25, .75), na.rm = TRUE), Inf)) - 1))
 }
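
If you would rather avoid `get` and `assign` altogether, `[[` accepts a character value as a column name directly. The following is only a sketch on simulated data (`pfas.demo` and its two columns are hypothetical stand-ins for your data frame):

```r
# Hypothetical stand-in for pfas.pheno with two of the eight columns
set.seed(1)
pfas.demo <- data.frame(log2pfoa = rnorm(20), log2pfos = rnorm(20))
pfas.demo$log2pfos[3] <- NA     # one missing value to show it is carried through

for (i in c("log2pfoa", "log2pfos")) {
  x <- pfas.demo[[i]]                                  # column selected by its character name
  breaks <- c(-Inf, quantile(x, c(.25, .75), na.rm = TRUE), Inf)
  pfas.demo[[paste0("cat.", i)]] <- findInterval(x, breaks) - 1
}

str(pfas.demo)
```

Each pass adds one `cat.*` column coded 0, 1, 2, and a missing measurement gives a missing category.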

Best;
David.



> eval(parse(text=cat.var))[pfas.pheno[,i] <= quantile(pfas.pheno[,i],0.25, na.rm =T)] <- 0
> eval(parse(text=cat.var))[pfas.pheno[,i] >= quantile(pfas.pheno[,i],0.75, na.rm =T)] <- 2
> eval(parse(text=cat.var))[pfas.pheno[,i] >= quantile(pfas.pheno[,i],0.25, na.rm =T)
>                                  &pfas.pheno[,i] <= quantile(pfas.pheno[,i],0.75, na.rm =T)] <- 1
> })
>                                                                                  }
> 
> Can you help me fix the problem?
> 
> Thank you,
> 
> Yuan Chun Ding
> City of Hope National Medical Center