[Rd] Subsetting vectors/arrays using factors can be seen as misleading

Laurent Gautier lgautier at gmail.com
Wed Mar 12 22:06:38 CET 2008


Dear list,

Subsetting vectors/arrays using factors can be misleading, and I was
thinking that it could be discouraged (at least by issuing a warning).
I could not find an earlier discussion of this; please point me to a
reference if I missed one.

The "extract" operator "[" accepts, among other index types, integer
vectors or character vectors in order to subset a data structure.
For example:
> x <- seq(1, 5)
> names(x) <- letters[1:5]
>
> x[1]
a
1
> x["a"]
a
1

Using a factor as the index confused someone here, and I have to
admit that the result can indeed appear misleading:
> f <- factor("a", levels=c("b", "a", "c"))
> f
[1] a
Levels: b a c
> x[f]  # here the integer is used, rather than the level
b
2
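
For comparison, here is a short sketch of the two explicit conversions,
using the same x and f as above. as.integer() gives the internal code
and as.character() gives the level label, so each one selects a
different element. The names by_code and by_label are only for
illustration.

```r
x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

as.integer(f)     # internal code: position of "a" within the levels
as.character(f)   # level label

# Indexing by code selects the same element as x[f]
by_code <- x[as.integer(f)]

# Indexing by label selects the element named "a"
by_label <- x[as.character(f)]
```

With the level order used here, the two results differ: by_code is the
element named "b", while by_label is the element named "a".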

The dual nature of a factor (a vector of integer codes with an attached
vector of levels) is not clear to many users, especially since
factors are treated differently in other situations.
Example:
> f == 1
[1] FALSE
> f == "a" #here the level is used, not the integer
[1] TRUE
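
To make the two halves of a factor visible, the following sketch
inspects the same f with base functions only. It shows the stored code,
the attached levels, and how one maps to the other; it is meant as an
illustration, not as a description of how "==" is implemented.

```r
f <- factor("a", levels = c("b", "a", "c"))

typeof(f)       # storage type of the codes
levels(f)       # labels attached to the codes
as.integer(f)   # code stored for the single element

# Recover the label from the code by hand
label_from_code <- levels(f)[as.integer(f)]

# The same comparisons, with the intended half made explicit
code_matches  <- as.integer(f) == 2L
label_matches <- as.character(f) == "a"
```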

This leads me to suggest that indexing with a factor should issue a
warning, so that the user explicitly wraps the factor in either
"as.integer" or "as.character".
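
A rough sketch of the suggested behaviour, written as an ordinary
function rather than as a change to "[" itself. The name
extract_checked and the wording of the warning are hypothetical; nothing
like this exists in R. The function keeps the current result (indexing
by integer code) and only adds the warning.

```r
# Hypothetical helper, not part of R: warns when the index is a factor
extract_checked <- function(x, i) {
  if (is.factor(i)) {
    warning("index is a factor; wrap it in as.integer() or as.character()")
    i <- as.integer(i)  # keep the current behaviour of "["
  }
  x[i]
}

x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

# An explicit conversion passes through without a warning
explicit <- extract_checked(x, as.character(f))

# A bare factor still indexes by code, but the warning is raised
implicit <- suppressWarnings(extract_checked(x, f))
```

Calling extract_checked(x, f) without suppressWarnings() raises the
warning, which is the point at which the user would choose one of the
two conversions.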


L.

PS: All examples above were run with:
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status         Under development (unstable)
major          2
minor          7.0
year           2008
month          03
day            12
svn rev        44742
language       R
version.string R version 2.7.0 Under development (unstable) (2008-03-12 r44742)
