[Rd] grep() and factors

Bill Dunlap bill at insightful.com
Tue Jun 6 02:57:59 CEST 2006


On Mon, 5 Jun 2006, Marc Schwartz (via MN) wrote:

> > > > grep("[a-z]", factor(letters))
> > > numeric(0)
> >
> > I was recently surprised by this also.  In addition, if
> > R's grep did support factors in this way, what sort of
> > object (factor or character) should it return when value=T?
> > I recently changed Splus's grep to return a character vector in
> > that case.
> >
> >    Splus> grep("[def]", letters[26:1])
> >    [1] 21 22 23
> >    Splus>  grep("[def]", factor(letters[26:1], levels=letters[26:1]))
> >    [1] 21 22 23
> >    Splus> grep("[def]", letters[26:1], value=T)
> >    [1] "f" "e" "d"
> >    Splus> grep("[def]", factor(letters[26:1], levels=letters[26:1]), value=T)
> >    [1] "f" "e" "d"
> >    Splus> class(.Last.value)
> >    [1] "character"
> >
> > R does this when grepping an integer vector.
> >    R> grep("1", 0:11, value=T)
> >    [1] "1"  "10" "11"
> > help(grep) says it returns "the matching elements themselves", but
> > doesn't say if "themselves" means before or after the conversion to
> > character.
>
> Bill,
>
> My first inclination for the return value when used on a factor would be
> the indexed factor elements where grep() would otherwise simply return
> the indices. This would also maintain the factor levels from the
> original source factor since "[".factor would normally retain these when
> drop = FALSE.

That would be my first inclination also.  I would have expected the output of
   grep(pattern, text, value=TRUE)
to be identical to that of
   text[grep(pattern, text, value=FALSE)]
no matter what class text has.
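
A rough sketch of that expectation, for illustration only: grep_keep_class is a made-up name, not a function in R or Splus. It assumes that matching on as.character(text) and then indexing the original object is what "identical to text[grep(...)]" should mean for factors and integers.

```r
# Hypothetical helper (not part of R or Splus): match against the
# character representation, then index the ORIGINAL object so that
# its class (and, for factors, its levels) is kept.
grep_keep_class <- function(pattern, text) {
  idx <- grep(pattern, as.character(text))  # integer indices of matches
  text[idx]                                 # "[" dispatches on class(text)
}

# Factor input: the result is still a factor with the original levels.
f <- factor(letters[26:1], levels = letters[26:1])
res_factor <- grep_keep_class("[def]", f)

# Character input: same result as grep(..., value = TRUE).
res_char <- grep_keep_class("[def]", letters[26:1])

# Integer input: the matching elements stay integers
# instead of being returned as character strings.
res_int <- grep_keep_class("1", 0:11)
```

With this definition the value returned always has the class of text, so the factor case keeps all 26 levels and the integer case returns 1, 10 and 11 as integers rather than as strings.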

No end users have seen this change in Splus yet, so we can still change it
to anything, but we want to keep its behavior the same as R's.

> I could be convinced either way. The concern of course being that (given
> the offlist replies I have received today) even experienced users are
> getting bitten by the current behavior versus their intuitive
> expectations, which are at least loosely supported by the documentation.
>
> HTH,
>
> Marc Schwartz

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."


