[R] Opinion: Why I find factors convenient to use

PIKAL Petr petr.pikal at precheza.cz
Mon Aug 20 15:10:07 CEST 2012


Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Rui Barradas
> Sent: Monday, August 20, 2012 2:03 PM
> To: S Ellison
> Cc: r-help
> Subject: Re: [R] Opinion: Why I find factors convenient to use
> 
> Hello,
> 
> On 20-08-2012 12:30, S Ellison wrote:
> >
> >
> >> -----Original Message-----
> >> Over the years, many people -- including some who I would consider
> >> real expeRts -- have criticized factors and advocated the use
> >> (sometimes exclusively) of character vectors instead.
> > Exclusive use of character vectors is not going to do the job.
> >
> > The concept of a factor is fundamental to a lot of statistics; a
> programming environment that does not implement factors and their
> associated special behaviour is probably not a statistical programming
> language.
> >
> > Special behaviours I have in mind include:
> > - Level order can be arbitrarily specified for display purposes
> > - A control level can be intentionally chosen for contrasts
> > - the option of "ordered" factors (for example, for polr and the
> like)
> >
> > So I think the language does and will require a 'factor' type in one
> form or another.
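
A minimal sketch of the three behaviours listed above, using made-up dose labels (the variable names are only illustrative):

```r
dose <- c("low", "high", "medium", "low", "high")

# Level order specified explicitly, e.g. for display in tables and plots
f <- factor(dose, levels = c("low", "medium", "high"))
levels(f)       # "low" "medium" "high"

# Choose the control (reference) level used by the default treatment contrasts
f_ref <- relevel(f, ref = "medium")
levels(f_ref)   # "medium" "low" "high"

# Ordered factor, the kind of response a function like polr expects
f_ord <- factor(dose, levels = c("low", "medium", "high"), ordered = TRUE)
f_ord < "high"  # TRUE FALSE TRUE TRUE FALSE
```

None of this is available for a plain character vector, where the order is always alphabetical.
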
> >
> >   _When_ you decide to convert a character input to a factor is, of
> course, up to the user, and for cleanup it's very often better to stick
> with character early and convert to factor a bit later. But personally,
> I think that there is sufficient control over the coding of data to
> allow user discretion, and on the whole, it seems to me that character
> input gets used as factor data so much of the time when it is used at
> all that the default stringsAsFactors=TRUE setting seems the more
> sensible default.
> 
> I disagree with this last point. Just think of the number of questions
> to this list about, say, dates. When read from file using one of the
> forms of read.table, they usually cause problems. Unless the user is an

Hm. I may be wrong, but I think most confusion comes from questions like this one:

"My numbers are not read as numbers, and when I try to convert them with as.numeric they are changed and scrambled to integers. What can I do?"
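
For illustration, a small sketch of what happens in that situation, with made-up values: as.numeric applied to a factor returns the internal integer codes, so the conversion has to go through the labels.

```r
# Numbers that ended up as a factor (illustrative values)
f <- factor(c("10", "2.5", "300", "2.5"))

# as.numeric() on a factor returns the internal integer codes, not the values
codes <- as.numeric(f)
codes   # 1 2 3 2

# Going through the labels recovers the original numbers
vals <- as.numeric(as.character(f))
vals    # 10.0 2.5 300.0 2.5

# Equivalent: convert the levels once and index them by the factor
vals2 <- as.numeric(levels(f))[f]
```
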

Personally, I do not find factors very confusing; they behave almost the same as character vectors.

ch<-sample(letters[1:4], 20, replace=TRUE)
ff<-factor(ch)

ch[ch=="b"]
[1] "b" "b" "b" "b" "b" "b" "b"
ff[ff=="b"]
[1] b b b b b b b
Levels: a b c d

paste(ch,1:5)
 [1] "b 1" "d 2" "d 3" "c 4" "d 5" "c 1" "b 2" "b 3" "c 4" "d 5" "b 1" "c 2"
[13] "b 3" "c 4" "b 5" "c 1" "c 2" "c 3" "b 4" "a 5"
paste(ff,1:5)
 [1] "b 1" "d 2" "d 3" "c 4" "d 5" "c 1" "b 2" "b 3" "c 4" "d 5" "b 1" "c 2"
[13] "b 3" "c 4" "b 5" "c 1" "c 2" "c 3" "b 4" "a 5"

ddch<-c("2000-05-05", "2001-05-05")
ddf<-as.factor(ddch)
str(as.Date(ddch))
 Date[1:2], format: "2000-05-05" "2001-05-05"
str(as.Date(ddf))
 Date[1:2], format: "2000-05-05" "2001-05-05"

The only problem arises when you want to add values to a factor, or to concatenate with c(some factor, some values): you need to go through a character conversion first, like this.

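my.c <- function(x, ...) {
  # character version of x, used only when x is a factor
  x.f <- as.character(x)
  # for a factor: combine the labels and rebuild the factor; otherwise plain c()
  if (is.factor(x)) res <- as.factor(c(x.f, ...)) else res <- c(x, ...)
  res
}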
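
A short usage sketch of my.c with made-up values. The function is repeated so the example runs on its own. Note that only x is converted; anything passed in ... is assumed here to be plain (non-factor) values.

```r
# my.c() as given above, repeated so that this example is self-contained
my.c <- function(x, ...) {
  x.f <- as.character(x)
  if (is.factor(x)) res <- as.factor(c(x.f, ...)) else res <- c(x, ...)
  res
}

ff <- factor(c("a", "b", "a"))   # illustrative values

# Appending a value that is not yet a level
ff2 <- my.c(ff, "z")
levels(ff2)         # "a" "b" "z"
as.character(ff2)   # "a" "b" "a" "z"

# Non-factor input is passed straight to c()
my.c(1:3, 4L)       # 1 2 3 4
```
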

But merge, for example, works fine:
ffx <- factor("x")

str(merge(data.frame(ff), data.frame(ffx), by.x="ff", by.y="ffx", all=TRUE))
'data.frame':   21 obs. of  1 variable:
 $ ff: Factor w/ 5 levels "a","b","c","d",..: 1 1 2 2 2 2 2 2 3 3 ...

So for me personally, the read.table default stringsAsFactors=TRUE is better, as I have some code that works with factors without checking the column type first.
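
For code that relies on one behaviour or the other, a minimal sketch of setting the argument explicitly, so that the result does not depend on the default in effect. The input text and column names are made up and stand in for a real file.

```r
txt <- "id grp\n1 a\n2 b\n3 a"   # made-up input standing in for a file

# Strings converted to factors on input
d_fac <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
is.factor(d_fac$grp)      # TRUE

# Strings kept as character on input
d_chr <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
is.character(d_chr$grp)   # TRUE

# "Character early, factor later": convert once cleanup is done
d_chr$grp <- factor(d_chr$grp)
```
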


> experienced one, in which case he/she might not have a question to ask.
> Besides, the default TRUE is contradictory with "stick with character
> early and convert to factor a bit later". With both "early" and
> "later".
> A different thing is to have a heavily used function's default behavior
> change from one version of R to the next. What about all the code
> in use? Maybe it's better to leave it be.
> 
> Rui Barradas
> >
> > S Ellison
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 



