[Rd] Subsetting vectors/arrays using factors can be seen as misleading

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Mar 14 10:55:59 CET 2008


This behaviour is long established and documented on the basic help page
for '['.  Further, the convention is widely used in R itself: with such a
warning in place, running 'make check' would give a few hundred warnings
and then fail.  Working around those warnings would be inefficient
(involving unnecessary copying of large objects).

One place where this matters is the advice to use levels(x)[x] as in 
as.character.factor() -- that construction is widespread, perhaps so 
widespread as to make it worthwhile making that an internal operation.
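
As a minimal sketch of the behaviour under discussion (reusing the x and f
from the quoted message; the names by_code, by_code2, by_label and labs are
only illustrative), a factor index is treated as its integer codes, and
levels(f)[f] relies on exactly that to recover the labels:

```r
x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

# A factor used as an index is treated as its integer codes:
# "a" is the second level, so this selects the second element of x.
by_code <- x[f]

# Spelling out the intent gives either the same element ...
by_code2 <- x[as.integer(f)]
# ... or the element whose name matches the label.
by_label <- x[as.character(f)]

# The levels(x)[x] idiom: index the levels by the codes to get the labels.
labs <- levels(f)[f]
```

Here by_code and by_code2 both pick the element named "b", by_label picks
the element named "a", and labs agrees with as.character(f).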

On Thu, 13 Mar 2008, Laurent Gautier wrote:

> Dear list,
>
> Subsetting vectors/arrays using factors can be seen as misleading, and
> I was thinking that it could be discouraged (at least by issuing a
> warning).
> I could not find whether this was discussed earlier, but I can be
> pointed to a reference if I missed any.
>
> The "extract" operator "[" can take as arguments either vectors of
> integers or vectors of characters in order to subset a data structure.
> For example:
>> x <- seq(1, 5)
>> names(x) <- letters[1:5]
>>
>> x[1]
> a
> 1
>> x["a"]
> a
> 1
>
> Using a factor caused some confusion to someone here, and I have to
> admit that it can indeed appear misleading:
>> f <- factor("a", levels=c("b", "a", "c"))
>> f
> [1] a
> Levels: b a c
>> x[f]  # here the integer is used, rather than the level
> b
> 2
>
> The dual nature of the factor (vector of integers, with an attached
> vector of levels), is not always clear to many users, especially since
> factors are treated differently in other situations.
> Example:
>> f == 1
> [1] FALSE
>> f == "a"  # here the level is used, not the integer
> [1] TRUE
>
> This is making me suggest that indexing using a factor could issue a
> warning, and the user should explicitly wrap the vector with either
> "as.integer" or "as.character".
>
>
> L.
>
> PS: All examples above were run with
> platform       x86_64-unknown-linux-gnu
> arch           x86_64
> os             linux-gnu
> system         x86_64, linux-gnu
> status         Under development (unstable)
> major          2
> minor          7.0
> year           2008
> month          03
> day            12
> svn rev        44742
> language       R
> version.string R version 2.7.0 Under development (unstable) (2008-03-12 r44742)
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


