[Rd] grep() and factors

Marc Schwartz (via MN) mschwartz at mn.rr.com
Tue Jun 6 17:50:48 CEST 2006


On Tue, 2006-06-06 at 11:12 +0100, Prof Brian Ripley wrote:
> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
> 
> > Hi all,
> >
> > Based upon an offlist communication this morning, I am somewhat confused
> > (more than I usually am on most Monday mornings...) about the use of
> > grep() with factors as the 'x' argument.
> >
> > The argument guidance in ?grep indicates:
> >
> > x, text a character vector where matches are sought. Coerced to
> >        character if possible.
> >
> > and in the Details section:
> >
> > Arguments which should be character strings or character vectors are
> > coerced to character if possible.
> >
> >
> > The wording of both would seem to reasonably lead to the conclusion that
> > a factor could be coerced to a character vector by the use of
> > as.character(FACTOR).
> 
> Well, that is not what is meant by the wording, nor what happens: there is 
> no method dispatch so the factor is coerced from an integer vector to a 
> character vector.  'coerced' usually means at low level: where 
> as.character() is involved we tend to say so.
> 
> As for the comments on what happens if value=TRUE: if the 'x' has been 
> coerced, I would expect the value to be based on the coerced value (and it 
> currently is).
> 
> > grep("1", factor(letters))
>   [1]  1 10 11 12 13 14 15 16 17 18 19 21
> > grep("1", factor(letters), value=TRUE)
>   [1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "21"
> 
> So whereas I am quite happy to replace the low-level coercion by method 
> dispatch on as.character, I don't think this should be altered (and am 
> pretty sure there is code out there which expects a character vector 
> result).
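
To make the coercion concrete, here is a minimal sketch that reproduces
the result quoted above by converting the factor to its integer codes by
hand. Because the conversion is explicit, it should not depend on how a
particular R version's grep() treats a factor passed directly. The
variable names (f, codes, idx, vals, lab) are illustrative only and do
not come from the thread.

```r
# A factor whose labels are "a".."z"; its internal codes are 1..26.
f <- factor(letters)

# Spell out the low-level coercion: integer codes, then character.
codes <- as.character(as.integer(f))

# Matching "1" against the codes gives the positions quoted above.
idx <- grep("1", codes)
print(idx)
# [1]  1 10 11 12 13 14 15 16 17 18 19 21

# With value=TRUE the result is the matching codes as strings,
# not the factor labels.
vals <- grep("1", codes, value = TRUE)
print(vals)
# [1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "21"

# Matching against the labels instead finds nothing, since no
# letter contains "1".
lab <- grep("1", as.character(f))
print(length(lab))
# [1] 0
```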

Prof. Ripley,

Thanks for your reply and clarification.

I will admit that coercing a factor to its integer codes, rather than to
its labels, is not immediately intuitive to me (or to others who have
commented on this) in the context of grep(). However, in light of your
comments and having reviewed the C code, it does make sense.

Given this behavior, it would seem reasonable to provide a clarification
in ?grep, perhaps as follows:

Arguments

x, text a character vector where matches are sought. Coerced to
character if possible. See Details for factors.


Details

Arguments which should be character strings or character vectors are
coerced to character if possible. Factors are coerced from their
internal integer codes, as if by as.character(as.integer(x)), so
matching is done against the codes rather than the labels. To match
against the labels, coerce the factor explicitly using as.character(x).
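
As a rough illustration of the explicit coercion suggested in the
wording above, the sketch below matches against a factor's labels by
calling as.character() first. The factor fruit and the pattern "^ap" are
made up for this example and are not from the thread.

```r
# Hypothetical factor; levels sort as apple, apricot, banana, cherry.
fruit <- factor(c("apple", "banana", "apricot", "cherry"))

# The integer codes follow the sorted levels, not the input order.
print(as.integer(fruit))
# [1] 1 3 2 4

# Coerce explicitly so that grep() sees the labels.
fruit_chr <- as.character(fruit)

# Positions of elements whose label starts with "ap".
hits <- grep("^ap", fruit_chr)
print(hits)
# [1] 1 3

# The matching labels themselves.
hit_values <- grep("^ap", fruit_chr, value = TRUE)
print(hit_values)
# [1] "apple"   "apricot"
```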


Thanks for your consideration.

Regards,

Marc Schwartz


