[Rd] Subsetting vectors/arrays using factors can be seen as misleading

Laurent Gautier lgautier at gmail.com
Fri Mar 14 11:24:53 CET 2008


Thanks for your answer.

I understand that this is long established, but I suspect that
extracting by names was less common back then (I readily admit
that this is pure speculation on my side; I have no data to support it).
As R evolves, things sometimes get deprecated (a recent example
seems to be "$" on atomic vectors).

I also understand that the behavior is documented, but I have seen it
cause trouble for someone who was not a complete beginner with R,
and after helping to solve the problem I could only agree that it is
somewhat misleading.

Suggesting a warning while keeping the current behavior is just an idea
(one that would not break existing code). Similarly, casting factors into
integers was just a way to illustrate the point. Something that does
not involve unnecessary copies of (potentially large) objects could be
considered (maybe using "unclass()" does not involve making a copy
and could be used in place of "as.integer"?).
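
As a minimal sketch of the two ideas above, the following reuses the
x and f from the original message. The helper extract_warn() is
hypothetical (it is not part of R) and only illustrates "warn but keep
the current behavior"; whether unclass() actually avoids a copy is not
something this sketch demonstrates.

```r
x <- seq(1, 5)
names(x) <- letters[1:5]
f <- factor("a", levels = c("b", "a", "c"))

# Explicit conversions make the intent visible at the call site.
by_code  <- x[as.integer(f)]    # integer code of "a" is 2: selects element "b"
by_level <- x[as.character(f)]  # level label "a": selects element "a"

# unclass() also exposes the integer codes (the levels attribute is kept
# on the index but plays no role in the subsetting).
by_unclass <- x[unclass(f)]

# The idiom mentioned in the quoted reply relies on code-based indexing.
labels_back <- levels(f)[f]

# Hypothetical helper, for illustration only: warn when the index is a
# factor, then fall through to the existing behavior unchanged.
extract_warn <- function(x, i) {
  if (is.factor(i)) {
    warning("indexing with a factor uses its integer codes, not its levels")
  }
  x[i]
}
```

With such a helper, existing code would keep returning the same values,
and the ambiguity would be flagged only where a factor is passed without
an explicit as.integer() or as.character().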



2008/3/14, Prof Brian Ripley <ripley at stats.ox.ac.uk>:
> This is long established and documented on the basic help page for '['.
>  Further, the convention is widely used in R itself: running 'make check'
>  would give a few hundred warnings and then fail.  Working around those
>  warnings would be inefficient (involving unnecessary copying of large
>  objects).
>
>  One place where this matters is the advice to use levels(x)[x] as in
>  as.character.factor() -- that construction is widespread, perhaps so
>  widespread as to make it worthwhile making that an internal operation.
>
>
>  On Thu, 13 Mar 2008, Laurent Gautier wrote:
>
>  > Dear list,
>  >
>  > Subsetting vectors/arrays using factors can be seen as misleading, and
>  > I was thinking that it could be discouraged (at least by issuing a
>  > warning).
>  > I could not find whether this was discussed earlier, but I can be
>  > pointed to a reference if I missed any.
>  >
>  > The "extract" operator "[" can take as arguments either vectors of
>  > integers or vectors of characters in order to subset a data structure.
>  > For example:
>  >> x <- seq(1, 5)
>  >> names(x) <- letters[1:5]
>  >>
>  >> x[1]
>  > a
>  > 1
>  >> x["a"]
>  > a
>  > 1
>  >
>  > Using a factor caused some confusion to someone here, and I have to
>  > admit that it can indeed appear misleading:
>  >> f <- factor("a", levels=c("b", "a", "c"))
>  >> f
>  > [1] a
>  > Levels: b a c
>  >> x[f]  # here the integer is used, rather than the level
>  > b
>  > 2
>  >
>  > The dual nature of the factor (vector of integers, with an attached
>  > vector of levels), is not always clear to many users, especially since
>  > factors are treated differently in other situations.
>  > Example:
>  >> f == 1
>  > [1] FALSE
>  >> f == "a" #here the level is used, not the integer
>  > [1] TRUE
>  >
>  > This is making me suggest that indexing using a factor could issue a
>  > warning, and the user should explicitly wrap the vector with either
>  > "as.integer" or "as.character".
>  >
>  >
>  > L.
>  >
>  > PS: All examples above were run with
>  > platform       x86_64-unknown-linux-gnu
>  > arch           x86_64
>  > os             linux-gnu
>  > system         x86_64, linux-gnu
>  > status         Under development (unstable)
>  > major          2
>  > minor          7.0
>  > year           2008
>  > month          03
>  > day            12
>  > svn rev        44742
>  > language       R
>  > version.string R version 2.7.0 Under development (unstable) (2008-03-12 r44742)
>  >
>
>  > ______________________________________________
>  > R-devel at r-project.org mailing list
>  > https://stat.ethz.ch/mailman/listinfo/r-devel
>  >
>
>
>  --
>  Brian D. Ripley,                  ripley at stats.ox.ac.uk
>  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>  University of Oxford,             Tel:  +44 1865 272861 (self)
>  1 South Parks Road,                     +44 1865 272866 (PA)
>  Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>




