[R] Opinion: Why I find factors convenient to use

Petr Savicky savicky at cs.cas.cz
Sat Aug 18 09:48:26 CEST 2012


On Fri, Aug 17, 2012 at 07:34:35PM +0100, Rui Barradas wrote:
> Hello,
> 
> No, factors may use less memory. System dependent?
> 
> > x <-sample(c("small","medium","large"),1e4,rep=TRUE)
> > y <- factor(x)
> > object.size(x)
> 80184 bytes
> > object.size(y)
> 40576 bytes
> >
> > sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
> 
> locale:
> [1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
> [3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
> [5] LC_TIME=Portuguese_Portugal.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] Rcapture_1.2-0 xts_0.8-0      zoo_1.7-7
> 
> loaded via a namespace (and not attached):
> [1] chron_2.3-39   fortunes_1.4-2 grid_2.15.1    lattice_0.20-6 tools_2.15.1
> 
> 
> And I agree with what Steve said, stringsAsFactors = FALSE saves hours 
> of debugging time.

Hi.

I use stringsAsFactors = FALSE quite frequently. If there were a discussion
on R-devel about whether this should be the default, I would support it.

Factors are very useful and sometimes necessary, but they are hard to manipulate.
As Jeff Newmiller said, a good strategy is to prepare the data as character
type and convert them to a factor once they are complete. Users should know how
to use factors; however, the strategy "convert to a factor eventually" is
more consistent with not having stringsAsFactors = TRUE as the default.
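
A minimal sketch of this strategy, assuming a small made-up data frame (the
column name "size" and all its values are hypothetical, chosen only for
illustration): the data are edited while still character, and the conversion
to a factor happens once at the end.

```r
# Hypothetical example data; names and values are illustrative only.
d <- data.frame(size = c("small", "large", "small"),
                stringsAsFactors = FALSE)

# While the data are being assembled, a character column accepts
# new values and recoding without any extra steps.
d <- rbind(d, data.frame(size = "medium", stringsAsFactors = FALSE))
d$size[d$size == "large"] <- "big"

# The same kind of assignment on a factor produces NA (and a warning),
# because "big" is not one of the existing levels.
f <- factor(c("small", "large", "small"))
suppressWarnings(f[2] <- "big")
is.na(f[2])

# Once the data are complete, convert to a factor with an explicit
# level order.
d$size <- factor(d$size, levels = c("small", "medium", "big"))
table(d$size)
```

The factor assignment is wrapped in suppressWarnings() only to keep the
sketch quiet; in interactive use the warning is the first sign that a level
is missing.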

Petr Savicky.



