[Rd] Characters vs. factors

David M Smith david at revolution-computing.com
Tue Oct 6 04:46:43 CEST 2009


On Mon, Oct 5, 2009 at 4:33 PM, hadley wickham <h.wickham at gmail.com> wrote:
> It seems like a recent trend in R has been to make character vectors
> and factors almost equivalent (apart from the way that factors always
> remember their original range).  There are a few exceptions:

A related issue is that modeling functions throw a warning when
character objects are used in place of factors:

> shopping <- read.csv("http://spreadsheets.google.com/pub?key=tE9pXlYLwTAeiDWxL8h_viA&single=true&gid=0&range=A1%3AE37&output=csv", as.is=TRUE)
> shopping$seconds <- as.numeric(as.difftime(shopping$Total.Time))
> fit <- lm(seconds ~ Number.of.Items + Payment - 1, data = shopping, subset = -8)
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'Payment' converted to a factor

The warning doesn't affect R's behaviour, of course, but it does make
it difficult to sanction the otherwise sensible advice to R beginners
to read in data files with as.is=TRUE. (The warning leads to
difficult-to-answer questions.) For similar reasons I deleted the
warning from this post:
http://blog.revolution-computing.com/2009/09/is-the-express-line-really-faster-1.html
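
One way to sidestep the question is to convert the column yourself
before fitting. The following is only a minimal sketch: shopping_demo
and fit_demo are hypothetical names, and the values are made up as a
stand-in for the spreadsheet above, so that it runs without network
access.

```r
# Hypothetical stand-in for the shopping data; the values are invented.
shopping_demo <- data.frame(
  seconds = c(120, 95, 210, 180, 60, 150),
  Number.of.Items = c(5, 3, 12, 9, 2, 7),
  Payment = c("cash", "credit", "cash", "debit", "credit", "debit"),
  stringsAsFactors = FALSE  # keep Payment as character, as with as.is=TRUE
)

# Convert the character column to a factor explicitly, so the modeling
# code has nothing left to coerce.
shopping_demo$Payment <- factor(shopping_demo$Payment)

# Same model formula as above, fitted on the stand-in data.
fit_demo <- lm(seconds ~ Number.of.Items + Payment - 1, data = shopping_demo)
coef(fit_demo)
```

This keeps the as.is=TRUE habit for reading data and makes the
conversion a visible step in the script, at the cost of one extra line
per categorical variable.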

In general the trend towards equivalence of factors and character
vectors is welcome, though.

# David

On Mon, Oct 5, 2009 at 4:33 PM, hadley wickham <h.wickham at gmail.com> wrote:
>
> It seems like a recent trend in R has been to make character vectors
> and factors almost equivalent (apart from the way that factors always
> remember their original range).  There are a few exceptions:
>
>  * summary.character != summary.factor
>  * table(x, exclude = NULL) != table(factor(x), exclude=NULL) when x
> includes missing values
>
>  * strsplit on a factor
>
> > strsplit(factor(c("a", "a b")), " ")
> Error in strsplit(factor(c("a", "a b")), " ") : non-character argument
>
>  * nchar on a factor:
>
> > nchar(factor(c("abc", "d", "defgh")))
> [1] 1 1 1
>
>  * : with two character strings
>
> > "a":"b"
> Error in "a":"b" : NA/NaN argument
> In addition: Warning messages:
> 1: NAs introduced by coercion
> 2: NAs introduced by coercion
> > factor("a"):factor("b")
> [1] a:b
> Levels: a:b
>
> Regards,
>
> Hadley
>
> --
> http://had.co.nz/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
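
For the string-function cases in the list above (strsplit and nchar),
converting the factor with as.character() first gives the result one
would expect from the labels. This is a small sketch rather than a
proposed change to R; the variable names are only for illustration.

```r
# A factor with the same labels as in the nchar example above.
f <- factor(c("abc", "d", "defgh"))

# Recover the labels as a character vector before using string functions.
labels_chr <- as.character(f)
nchar(labels_chr)
# [1] 3 1 5

# strsplit needs a character argument, so convert here as well.
words <- strsplit(as.character(factor(c("a", "a b"))), " ")
words[[2]]
# [1] "a" "b"
```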



--
David M Smith <david at revolution-computing.com>
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)

Check out our upcoming events schedule at www.revolution-computing.com/events


