[R] assign factor levels based on list

William Dunlap wdunlap at tibco.com
Wed Feb 9 22:41:58 CET 2011


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tim Howard
> Sent: Wednesday, February 09, 2011 12:44 PM
> To: r-help at r-project.org
> Subject: [R] assign factor levels based on list
> 
> All,
>  
> Given a data frame and a list containing factor definitions 
> for certain columns, how can I apply those definitions from 
> the list, rather than doing it the standard way, as noted 
> below. I'm lost in the world of do.call, assign, paste, and 
> can't find my way through. For example:
>  
> #set up df
> y <- data.frame(colOne = c(1,2,3), colTwo = 
> c("apple","pear","orange"))
>  
> factor.defs <- list(colOne = list(name = "colOne",
>  lvl = c(1,2,3,4,5,6)),
>  colTwo = list(name = "colTwo",
>  lvl = c("apple","pear","orange","fig","banana")))

Why not the following format?
   my.factor.defs <- list(colOne = c(1,2,3,4,5,6),
                          colTwo = c("apple", "pear", "orange", "fig", "banana"))
Do you really want to support a case like the following?
   list(colOne = list(name = "anotherColumn", lvl = c(1,2,3,4,5,6)))
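
If the definitions already exist in the nested form from the question, one way to get the simpler form is to flatten them. The following is only a sketch: flattenFactorDefs is a hypothetical helper name (not part of R), and it assumes every element has 'name' and 'lvl' components as in the original post.

```r
# Nested definitions, as in the original question.
factor.defs <- list(
    colOne = list(name = "colOne", lvl = c(1, 2, 3, 4, 5, 6)),
    colTwo = list(name = "colTwo",
                  lvl = c("apple", "pear", "orange", "fig", "banana")))

# Hypothetical helper: keep only the levels and name each element
# after its 'name' component rather than after its list position.
flattenFactorDefs <- function(defs) {
    lvls <- lapply(defs, function(d) d$lvl)
    names(lvls) <- vapply(defs, function(d) d$name, character(1))
    lvls
}

my.factor.defs <- flattenFactorDefs(factor.defs)
str(my.factor.defs)
```

Taking the names from the 'name' component means an element like the "anotherColumn" case above would end up keyed by "anotherColumn", which may or may not be what is wanted.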
  
> #A standard way to define levels
> y$colTwo <- factor(y$colTwo , levels = 
> c("apple","pear","orange","fig","banana"))
>  
> # I'd like to use the definitions locally but also pass them (but not
> # the data) to a function, so, rather than defining each manually each
> # time, I'd like to loop through the columns, call them by name, find
> # the definitions in the list and use them from there. Before I try to
> # loop or use some form of apply, I'd like to get a single factor
> # definition working.

First write a function that takes a data.frame and a list
of desired levels for each column and returns a new data.frame.
E.g., if you use the simpler form of the levels list I gave
above, the following might work well enough (it does no
error checking):
   assignNewLevelsToDataFrameColumns <- function(x, levelsList) {
      for(colName in names(levelsList)) {
          # note that x$name is equivalent to x[["name"]], so
          # if you want to use a variable as the name, use [[.
          x[[colName]] <- factor(x[[colName]],
                                 levels=levelsList[[colName]])
      }
      x
   }
Test it:
   > fixedY <- assignNewLevelsToDataFrameColumns(y, my.factor.defs)
   > fixedY
     colOne colTwo
   1      1  apple
   2      2   pear
   3      3 orange
   > str(fixedY)
   'data.frame':   3 obs. of  2 variables:
    $ colOne: Factor w/ 6 levels "1","2","3","4",..: 1 2 3
    $ colTwo: Factor w/ 5 levels "apple","pear",..: 1 2 3
Do
   > y <- assignNewLevelsToDataFrameColumns(y, my.factor.defs)
if you want to overwrite the old y.
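
Since the function above does no error checking, here is a sketch of what a checked variant might look like. The name assignNewLevelsChecked and the particular checks are assumptions for illustration, not a recommendation of which checks are the right ones: it stops on column names that are not in the data.frame and on values that are missing from the requested levels (factor() would otherwise turn those values into NA without complaint).

```r
# Hypothetical checked variant; the name and the checks are illustrative.
assignNewLevelsChecked <- function(x, levelsList) {
    stopifnot(is.data.frame(x), is.list(levelsList))
    unknown <- setdiff(names(levelsList), names(x))
    if (length(unknown) > 0) {
        stop("no such column(s): ", paste(unknown, collapse = ", "))
    }
    for (colName in names(levelsList)) {
        newLevels <- levelsList[[colName]]
        # Values absent from newLevels would silently become NA in factor();
        # NA itself is allowed through.
        bad <- setdiff(as.character(x[[colName]]),
                       c(as.character(newLevels), NA))
        if (length(bad) > 0) {
            stop("column '", colName, "' has values not in its levels: ",
                 paste(bad, collapse = ", "))
        }
        x[[colName]] <- factor(x[[colName]], levels = newLevels)
    }
    x
}

y <- data.frame(colOne = c(1, 2, 3), colTwo = c("apple", "pear", "orange"))
my.factor.defs <- list(colOne = c(1, 2, 3, 4, 5, 6),
                       colTwo = c("apple", "pear", "orange", "fig", "banana"))
fixedY <- assignNewLevelsChecked(y, my.factor.defs)

# A misspelled column name is reported by name.
res <- try(assignNewLevelsChecked(y, list(colThree = c("a", "b"))),
           silent = TRUE)
```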

Now if you want a function that changes the data.frame you give
it, use a replacement function.  If you want to use the syntax
   > func(y) <- newStuff
then the function should be called `func<-` and the last argument
must be called 'value' (newStuff will be passed via value=newStuff).
E.g.,
   `func<-` <- function(x, value) {
       alteredX <- assignNewLevelsToDataFrameColumns(x, value)
       alteredX
   }
and use it as
   > func(y) <- my.factor.defs
   > str(y)
   'data.frame':   3 obs. of  2 variables:
   $ colOne: Factor w/ 6 levels "1","2","3","4",..: 1 2 3
   $ colTwo: Factor w/ 5 levels "apple","pear",..: 1 2 3
The first command gets translated into
   y <- `func<-`(y, value=my.factor.defs)
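
To check that translation, the sketch below runs both forms on copies of the same data and compares the results. It repeats the definitions from above so that it can be run on its own; y1 and y2 are names made up for the comparison.

```r
assignNewLevelsToDataFrameColumns <- function(x, levelsList) {
    for (colName in names(levelsList)) {
        x[[colName]] <- factor(x[[colName]], levels = levelsList[[colName]])
    }
    x
}
`func<-` <- function(x, value) assignNewLevelsToDataFrameColumns(x, value)

my.factor.defs <- list(colOne = c(1, 2, 3, 4, 5, 6),
                       colTwo = c("apple", "pear", "orange", "fig", "banana"))
y1 <- data.frame(colOne = c(1, 2, 3), colTwo = c("apple", "pear", "orange"))
y2 <- y1

func(y1) <- my.factor.defs                     # replacement syntax
y2 <- `func<-`(y2, value = my.factor.defs)     # explicit call

identical(y1, y2)
```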

If you write a replacement function, it is nice to create a matching
extractor function called 'func'.  E.g.,
   > func <- function(x) lapply(x, levels)
   > func(y)
   $colOne
   [1] "1" "2" "3" "4" "5" "6"
   
   $colTwo
   [1] "apple"  "pear"   "orange" "fig"    "banana"

Note that this avoids assign(), get(), eval(), etc., and
thus makes it easy to follow the flow of data in the code: only
things on the left side of the assignment arrow can get
changed.
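
For contrast, here is a small sketch of why the paste() approaches in the question cannot see the data frame, using the same y and a made-up variable colName. Pasting "y$" onto a column name yields a character string, and as.name() turns that into a single symbol whose name is the literal text y$colTwo; neither is a call to the $ operator.

```r
y <- data.frame(colOne = c(1, 2, 3), colTwo = c("apple", "pear", "orange"))
lvl <- c("apple", "pear", "orange", "fig", "banana")
colName <- "colTwo"

# The string itself is what gets passed to factor(); it is not one of
# the levels, so the result is a single NA.
f <- factor(paste("y$", colName, sep = ""), levels = lvl)
is.na(f)

# as.name() makes one symbol named 'y$colTwo'; no object has that
# name, so evaluating it is an error.
sym <- as.name(paste("y$", colName, sep = ""))
res <- try(eval(sym), silent = TRUE)
inherits(res, "try-error")

# [[ looks the column up by the value of colName.
identical(y[[colName]], y$colTwo)
```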

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>  
> # this doesn't seem to see the dataframe properly
> do.call(factor,list((paste("y$",factor.defs[2][[1]]$name,sep="")),
>     levels=factor.defs[2][[1]]$lvl))
>  
> #adding "as.name" doesn't help
> do.call(factor,list(as.name(paste("y$",factor.defs[2][[1]]$name,sep="")),
>     levels=factor.defs[2][[1]]$lvl))
>  
> #Here's my attempt to mimic the standard way, using assign.
> #Ha! what a joke.
> assign(as.name(paste("y$",factor.defs[2][[1]]$name,sep="")),
>     do.call(factor, 
> list(as.name(paste("y$",factor.defs[2][[1]]$name,sep="")), 
>     levels = factor.defs[2][[1]]$lvl)))
> ##Error in function (x = character(), levels, labels = levels, exclude = NA,  : 
> ##  object 'y$colTwo' not found
> Any help or perspective (or better way from the beginning!) 
> would be greatly appreciated. 
> Thanks in advance!
> Tim