[Rd] Regression stars

Brian Lee Yung Rowe rowe at muxspace.com
Tue Feb 12 17:05:55 CET 2013


I thought the default was chosen for performance reasons. For large data.frames or repeated operations, factors should be faster than character vectors when the strings are non-trivial.

> fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
> n <- 1000000
>
> a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=TRUE)
> a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=FALSE)
>
> fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
> system.time(z <- sapply(1:100, fn, a1))
   user  system elapsed 
 19.614   4.037  24.649 
> system.time(z <- sapply(1:100, fn, a2))
   user  system elapsed 
 19.726   7.715  36.761 
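
Note that the two frames above draw separate samples, so they do not hold identical data. A more controlled variant, as a rough sketch only: it reuses one sampled column for both frames so that the only difference is factor versus character storage. The seed, the smaller n, the repetition count, and the name `keep` are my assumptions for illustration; timings will vary by machine and R version, so none are shown.

```r
# Sketch: same comparison, but both frames share the same sampled data.
set.seed(42)                      # assumed seed, for reproducibility only
fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
n <- 100000                       # smaller than the original run, to finish quickly

f  <- sample(fs, n, replace=TRUE) # one sample, reused in both frames
x1 <- rnorm(n)
x2 <- rnorm(n)

a1 <- data.frame(f=f, x1=x1, x2=x2, stringsAsFactors=TRUE)   # f stored as factor
a2 <- data.frame(f=f, x1=x1, x2=x2, stringsAsFactors=FALSE)  # f stored as character

keep <- c('kale','spinach')
fn <- function(x) x[x$f %in% keep, ]

# Sanity check: both representations select exactly the same rows.
stopifnot(identical(which(a1$f %in% keep), which(a2$f %in% keep)))

# Time repeated subsetting on each representation.
t1 <- system.time(for (i in 1:20) z1 <- fn(a1))
t2 <- system.time(for (i in 1:20) z2 <- fn(a2))
print(t1)
print(t2)
```

With identical data in both frames, any difference in the timings is easier to attribute to the storage type rather than to the particular sample drawn.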


On Feb 12, 2013, at 10:40 AM, Ben Bolker <bbolker at gmail.com> wrote:
> 
>  Thanks, Uwe.
>  Now let me go one step farther.
> 
>  Can you (or anyone) give a good argument **other than backward
> compatibility** for keeping the stringsAsFactors=TRUE argument on
> data.frame()?
> 
>  I appreciate your distinction between data.frame() and read.table()'s
> use of stringsAsFactors, and I can see that there is some point for
> quick-and-dirty interactive use in setting all non-numeric variables to
> factors (arguing that wanting non-numerics as factors is somewhat more
> common than wanting them as strings).
> 
>  It might be nice to add an optional stringsAsFactors (and check.names)
> argument to transform(): I've had to write my own Transform() function
> to allow the defaults to be overridden, since transform() calls
> data.frame() with the defaults.  (Setting the stringsAsFactors option
> globally would work, although not for check.names.)
> 
>  Ben Bolker
> 
>> 
>>> 
>>>> What I will likely do is
>>>> make a few changes so that character vectors are automatically changed
>>>> to factors in modelling functions, so that operating with
>>>> stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>> 
>>>> Duncan Murdoch
>>>> 
>>> 
>>>  [apologies for snipping context: "gmane made me do it"]
>>> 
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 


