[R] Assign factor and levels inside function

Liaw, Andy andy_liaw at merck.com
Fri Apr 22 03:57:22 CEST 2005


Tim,

> From: Tim Howard 
> 
> Andy, 
>   Thank you for the help. Yes, my question really did seem like I was
> going through a lot of unnecessary steps just to define levels of a
> variable. But that was just for the example. In my application, I bring
> new datasets into R on a daily basis. While the data differs, the
> variables are the same, and the categorical variables have the same
> levels. So I find myself daily applying the same factor and level
> definitions (by cutting and pasting the large chunk of commands from a
> text file). It really would be simpler to have it wrapped up in a
> function.  That's why I asked the question about putting this into a
> function.
>   Upon reading your answer, I thought maybe I could use your example
> and use the super-assignment '<<-' in the function. But, your method
> assigns levels, but does not define the var as a factor (interesting!).
> 
> >  levels(y$one) <- seq(1, 9, by=2)
> > y$one
> [1] 1 1 3 3 5 7
> attr(,"levels")
> [1] 1 3 5 7 9
> > is.factor(y$one)
> [1] FALSE

Ouch!  "levels<-" is generic, and the default method simply attaches the
levels attribute to the object without changing its class.  You need to
coerce the object into a factor explicitly.
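
A minimal sketch of the difference, using the y from your example (the
name x is only for illustration):

```r
# Sample data frame from the original post
y <- data.frame(one = c(1, 1, 3, 3, 5, 7),
                two = c(2, 2, 6, 6, 8, 8))

# Default "levels<-" on a plain numeric vector only attaches an attribute
x <- y$one
levels(x) <- seq(1, 9, by = 2)
is.factor(x)      # FALSE: x is still numeric

# Explicit coercion creates a real factor with the desired levels
y$one <- factor(y$one, levels = seq(1, 9, by = 2))
is.factor(y$one)  # TRUE
levels(y$one)     # "1" "3" "5" "7" "9"
```

Note that factor() stores the levels as character strings, and the unused
level 9 is kept because it was listed in the levels argument.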

> Unfortunately, whenever I try to use <<- with the dataframe as the
> variable, I get an error message: 
> 
> > fncFact <- function(datfra){
> + datfra$one <<- factor(datfra$one, levels=c(1,3,5,7,9))
> + }
> > fncFact(y)
> Error in fncFact(y) : Object "datfra" not found
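
The error arises because '<<-' does not look in the function's own frame.
It searches the enclosing environments for an existing variable called
datfra, and none exists there: datfra is only the name of the local
argument. A small sketch that reproduces this, assuming no global object
named datfra is defined (try() is used only to capture the error):

```r
y <- data.frame(one = c(1, 1, 3, 3, 5, 7),
                two = c(2, 2, 6, 6, 8, 8))

fncFact <- function(datfra) {
    # '<<-' looks for 'datfra' outside this function, where it does not exist
    datfra$one <<- factor(datfra$one, levels = c(1, 3, 5, 7, 9))
}

res <- try(fncFact(y), silent = TRUE)
inherits(res, "try-error")  # TRUE: the assignment failed
is.factor(y$one)            # FALSE: y itself was never touched
```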

I believe the canonical way of doing something like this in R is something
along the lines of:

processData <- function(dat) {
    dat$f1 <- factor(dat$f1, levels=...)
    ...  ## any other manipulations you want to do
    dat
}

Then when you get new data, you just do:

newData <- processData(newData)
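
As a concrete sketch, here is the same pattern filled in for the sample
data frame from your first post. The function name processY, the column
name one, and the levels c(1, 3, 5, 7, 9) are taken from that example or
made up for illustration; substitute your real variables and levels:

```r
# Hypothetical helper: returns a modified copy, does not alter its argument
processY <- function(dat) {
    dat$one <- factor(dat$one, levels = c(1, 3, 5, 7, 9))
    dat  # return the modified data frame
}

y <- data.frame(one = c(1, 1, 3, 3, 5, 7),
                two = c(2, 2, 6, 6, 8, 8))

y2 <- processY(y)   # y is unchanged; y2 holds the processed copy
is.factor(y$one)    # FALSE
is.factor(y2$one)   # TRUE

y <- processY(y)    # reassign to update y itself
levels(y$one)       # "1" "3" "5" "7" "9"
```

Because the function returns the modified copy and the caller does the
assignment, there is no need to pass the name of the data frame or to use
assign() or '<<-'.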

HTH,
Andy

> 
> Tim
> 
> >>> "Liaw, Andy" <andy_liaw at merck.com> 4/20/2005 4:03:24 PM >>>
> Wouldn't it be easier to do this?
> 
> > levels(y$one) <- seq(1, 9, by=2)
> > y$one
> [1] 1 1 3 3 5 7
> attr(,"levels")
> [1] 1 3 5 7 9
> 
> Andy
> 
> > From: Tim Howard
> > 
> > R-help,
> >   After cogitating for a while, I finally figured out how to define a
> > data.frame column as factor and assign the levels within a function...
> > BUT I still need to pass the data.frame and its name separately. I can't
> > seem to find any other way to pass the name of the data.frame, rather
> > than the data.frame itself.  Any suggestions on how to go about it?  Is
> > there something like value(object) or name(object) that I can't find?
> > 
> > #sample dataframe for this example
> > y <- data.frame(
> >  one=c(1,1,3,3,5,7),
> >  two=c(2,2,6,6,8,8))
> > 
> > > levels(y$one)   # check out levels
> > NULL
> > 
> > # the function I've come up with
> > fncFact <- function(datfra, datfraNm){
> > datfra$one <- factor(datfra$one, levels=c(1,3,5,7,9))
> > assign(datfraNm, datfra, pos=1)
> > }
> > 
> > > fncFact(y, "y")
> > > levels(y$one)
> > [1] "1" "3" "5" "7" "9"
> > 
> > I suppose only for aesthetics and simplicity, I'd like to pass only
> > the data.frame and get the same result.
> > Thanks in advance,
> > Tim Howard
> > 
> > 
> > > version
> >          _              
> > platform i386-pc-mingw32
> > arch     i386           
> > os       mingw32        
> > system   i386, mingw32  
> > status                  
> > major    2              
> > minor    0.1            
> > year     2004           
> > month    11             
> > day      15             
> > language R
> > 



