[Rd] grep() and factors

Marc Schwartz (via MN) mschwartz at mn.rr.com
Tue Jun 6 18:21:21 CEST 2006


On Tue, 2006-06-06 at 17:08 +0100, Prof Brian Ripley wrote:
> On Tue, 6 Jun 2006, Marc Schwartz (via MN) wrote:
> 
> > On Tue, 2006-06-06 at 11:12 +0100, Prof Brian Ripley wrote:
> >> On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:
> >>
> >>> Hi all,
> >>>
> >>> Based upon an offlist communication this morning, I am somewhat confused
> >>> (more than I usually am on most Monday mornings...) about the use of
> >>> grep() with factors as the 'x' argument.
> >>>
> >>> The argument guidance in ?grep indicates:
> >>>
> >>> x, text a character vector where matches are sought. Coerced to
> >>>        character if possible.
> >>>
> >>> and in the Details section:
> >>>
> >>> Arguments which should be character strings or character vectors are
> >>> coerced to character if possible.
> >>>
> >>>
> >>> The wording of both would seem to reasonably lead to the conclusion that
> >>> a factor could be coerced to a character vector by the use of
> >>> as.character(FACTOR).
> >>
> >> Well, that is not what is meant by the wording, nor what happens: there is
> >> no method dispatch so the factor is coerced from an integer vector to a
> >> character vector.  'coerced' usually means at low level: where
> >> as.character() is involved we tend to say so.
> >>
> >> As for the comments on what happens if value=TRUE: if the 'x' has been
> >> coerced, I would expect the value to be based on the coerced value (and it
> >> currently is).
> >>
> >>> grep("1", factor(letters))
> >>   [1]  1 10 11 12 13 14 15 16 17 18 19 21
> >>> grep("1", factor(letters), value=TRUE)
> >>   [1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "21"
> >>
> >> So whereas I am quite happy to replace the low-level coercion by method
> >> dispatch on as.character, I don't think this should be altered (and am
> >> pretty sure there is code out there which expects a character vector
> >> result).
> >
> > Prof. Ripley,
> >
> > Thanks for your reply and clarification.
> >
> > I would acknowledge that the coercion of a factor to its numeric values
> > would not be immediately intuitive to me (or others who have commented
> > on this) within the context of grep(). However, in light of your
> > comments and having reviewed the C code, it does make sense.
> >
> > Given this behavior, it would seem reasonable to provide a clarification
> > in ?grep, perhaps as follows:
> >
> > Arguments
> >
> > x, text a character vector where matches are sought. Coerced to
> > character if possible. See Details for factors.
> >
> >
> > Details
> >
> > Arguments which should be character strings or character vectors are
> > coerced to character if possible. In the case of factors, these are
> > coerced using as.integer(x). You must explicitly coerce the factor using
> > as.character(x) to use these functions on the character vector
> > equivalent.
> 
> I do think we should `replace the low-level coercion by method dispatch on 
> as.character', and have done so in R-devel (but am still testing 
> packages).  There have been quite a few instances of such low-level 
> coercion (including for dimnames), and I am currently looking through to 
> see if there are any others that either should be altered or the 
> documentation clarified.

Prof. Ripley,

I did not want to presume that you would do this, still less that you
had already done so. Given your additional comments, though, I now see
that the change is mentioned in the NEWS file for R-devel.

I do sincerely appreciate your efforts here.

Perhaps an interim change to ?grep, as above, might be considered for
2.3.1 patched, now with an additional comment that this behavior will
(or might) change in 2.4.0?
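
In the meantime, here is a minimal sketch of the two explicit coercions
discussed above, so that the result does not depend on how grep() itself
coerces 'x'. The factor f and the variable names are only illustrative
assumptions, not anything taken from ?grep:

```r
# Illustrative data (an assumption): a factor whose labels are the
# 26 lowercase letters, so its integer codes are 1..26.
f <- factor(letters)

# Search the labels: coerce explicitly with as.character().
by_label <- grep("a", as.character(f))

# Search the underlying integer codes: coerce explicitly with
# as.integer() first, then to character for the pattern match.
by_code <- grep("1", as.character(as.integer(f)))

print(by_label)  # 1
print(by_code)   # 1 10 11 12 13 14 15 16 17 18 19 21
```

Spelling out the coercion in the call should give the same answer before
and after any change to the internal handling of factors.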

I have added Bill Dunlap as a cc: here, given his expressed desire to be
consistent with R on this point.

Regards,

Marc


