[Rd] grep() and factors

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jun 6 12:12:56 CEST 2006


On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:

> Hi all,
>
> Based upon an offlist communication this morning, I am somewhat confused
> (more than I usually am on most Monday mornings...) about the use of
> grep() with factors as the 'x' argument.
>
> The argument guidance in ?grep indicates:
>
> x, text a character vector where matches are sought. Coerced to
>        character if possible.
>
> and in the Details section:
>
> Arguments which should be character strings or character vectors are
> coerced to character if possible.
>
>
> The wording of both would seem to reasonably lead to the conclusion that
> a factor could be coerced to a character vector by the use of
> as.character(FACTOR).

Well, that is not what is meant by the wording, nor what happens: there is
no method dispatch, so the factor's underlying integer codes are coerced to
a character vector.  'Coerced' usually means at a low level: where
as.character() is involved we tend to say so.
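
The distinction can be illustrated with a minimal R sketch. It emulates the low-level coercion with as.integer(), which is an assumption made for illustration and not a trace of the C code, and compares it with as.character(). The variable names are hypothetical.

```r
# A factor stores integer codes plus a "levels" attribute holding the labels.
f <- factor(letters)

# Emulated low-level coercion (assumption): ignore the factor class and
# turn the integer codes themselves into strings.
codes_as_text <- as.character(as.integer(f))

# Coercion via method dispatch: as.character() on a factor returns the labels.
labels_as_text <- as.character(f)

head(codes_as_text, 3)    # "1" "2" "3"
head(labels_as_text, 3)   # "a" "b" "c"

# The same pattern gives very different results on the two vectors.
length(grep("[a-z]", codes_as_text))    # 0
length(grep("[a-z]", labels_as_text))   # 26
```

The sketch deliberately avoids passing the factor to grep() directly, since what happens in that case depends on the R version in use.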

As for the comments on what happens if value=TRUE: if the 'x' has been 
coerced, I would expect the value to be based on the coerced value (and it 
currently is).

> grep("1", factor(letters))
  [1]  1 10 11 12 13 14 15 16 17 18 19 21
> grep("1", factor(letters), value=TRUE)
  [1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "21"

So whereas I am quite happy to replace the low-level coercion by method
dispatch on as.character, I don't think the value=TRUE behaviour should be
altered (and am pretty sure there is code out there which expects a
character vector result).
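
For code that should not depend on how grep() coerces its argument, one option is to do the coercion explicitly before matching. The following is a sketch under that assumption, with made-up data and hypothetical variable names; it is not a description of how grep() works internally.

```r
# Hypothetical example data.
f <- factor(c("apple", "banana", "cherry", "banana"))

# Coerce explicitly, so grep() always receives a character vector.
labels <- as.character(f)

hits <- grep("an", labels)                 # positions of matches: 2 4
vals <- grep("an", labels, value = TRUE)   # "banana" "banana"

is.character(vals)   # TRUE: value = TRUE yields a character vector

# If a factor result is wanted, index the original factor by position.
f[hits]
```

Indexing the factor with the returned positions keeps the choice of result type with the caller, whatever grep() itself returns for value=TRUE.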

> In tracing through the C code in character.c for do_grep(), which in
> turn calls coerceVector() in coerce.c, unless I am mis-reading the code
> (always possible), I don't see an indication that a factor would be
> coerced to a character vector.
>
> Since a factor -> character coercion would seem, at face value, the most
> logical coercion to take place when using grep(), I am curious if I am
> missing something, or if perhaps ?grep needs to be clearer about the
> coercions that will or might take place. Perhaps an error message could
> even be considered if a factor is passed as the 'x' argument, if indeed
> the coercion would not take place.
>
> Perhaps the easiest example here might be:
>
> # On R Version 2.3.1 (2006-06-01) on FC5
>
>> grep("[a-z]", letters)
> [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
> [23] 23 24 25 26
>
>> grep("[a-z]", factor(letters))
> numeric(0)
>
>
> Thanks for any comments or any virtual rotten tomatoes coming my way at
> high speed.  :-)
>
> Marc Schwartz
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


