[Rd] grep with fixed=TRUE and ignore.case=TRUE

Petr Savicky savicky at cs.cas.cz
Fri May 11 17:33:37 CEST 2007


On Wed, May 09, 2007 at 06:41:23AM +0100, Prof Brian Ripley wrote:
> I suggest you collaborate with the person who replied that he thought this 
> was a good idea to supply patches against the R-devel sources for 
> scrutiny.

A possible solution is to use strncasecmp instead of strncmp
in function fgrep_one in R-devel/src/main/character.c.
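
For illustration, the idea can be sketched roughly as follows. This is
not the actual fgrep_one from character.c: the name find_fixed_igcase
and its signature are hypothetical, and the sketch assumes single-byte
chars and the POSIX function strncasecmp from <strings.h>.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper, NOT the real fgrep_one from character.c.
   Returns the byte offset of the first case-insensitive occurrence
   of pat in target, or -1 if there is none.
   Assumes single-byte chars only. */
static int find_fixed_igcase(const char *pat, const char *target)
{
    size_t plen = strlen(pat);
    size_t tlen = strlen(target);
    size_t i;

    if (plen == 0) return 0;       /* empty pattern matches at offset 0 */
    if (plen > tlen) return -1;    /* pattern longer than target */

    /* Try every start position; compare plen bytes ignoring case. */
    for (i = 0; i + plen <= tlen; i++)
        if (strncasecmp(pat, target + i, plen) == 0)
            return (int) i;
    return -1;
}
```

With this sketch, the pattern "d.g" is found at offset 0 in both
"D.G cat" and "d.g cat", and is not found in "dog cat", since the dot
is compared literally.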

The corresponding modification of character.c is at
  http://www.cs.cas.cz/~savicky/ignore_case/character.c
and a diff file w.r.t. the original character.c (downloaded today) is at
  http://www.cs.cas.cz/~savicky/ignore_case/diff.txt

This seems to work in my installation of R-devel:

  > x <- c("D.G cat", "d.g cat", "dog cat")
  > z <- "d.g"
  > grep(z, x, ignore.case = F, fixed = T)
  [1] 2
  > grep(z, x, ignore.case = T, fixed = T)  # this is the new behavior
  [1] 1 2
  > grep(z, x, ignore.case = T, fixed = F)
  [1] 1 2 3
  >

Since fgrep_one is used many times in character.c, adding igcase_opt as
an additional argument would require extensive changes to the file.
So I introduced a new function, fgrep_one_igcase, which is called only
once in the file. Other solutions are possible.
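
One such alternative, as a rough sketch only: keep a single search loop
and pass the comparison function as a parameter, with two thin wrappers
so that existing callers need no change. All names here (cmp_fn,
search_fixed_with, search_fixed, search_fixed_igcase) are hypothetical
and do not appear in character.c; single-byte chars are assumed.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* strncmp and strncasecmp share this signature. */
typedef int (*cmp_fn)(const char *, const char *, size_t);

/* Shared search loop (hypothetical): returns the byte offset of the
   first match of pat in target under cmp, or -1 if there is none. */
static int search_fixed_with(const char *pat, const char *target, cmp_fn cmp)
{
    size_t plen = strlen(pat);
    size_t tlen = strlen(target);
    size_t i;

    if (plen == 0) return 0;
    if (plen > tlen) return -1;
    for (i = 0; i + plen <= tlen; i++)
        if (cmp(pat, target + i, plen) == 0)
            return (int) i;
    return -1;
}

/* Case-sensitive wrapper: same two-argument interface as before. */
static int search_fixed(const char *pat, const char *target)
{
    return search_fixed_with(pat, target, strncmp);
}

/* Case-insensitive wrapper, needed only at the one new call site. */
static int search_fixed_igcase(const char *pat, const char *target)
{
    return search_fixed_with(pat, target, strncasecmp);
}
```

The trade-off is that the common loop is written once, at the cost of
an indirect call per comparison.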

I do not understand the handling of multibyte chars well, so I did not
test the function with real multibyte chars, although the code for
this case is included.

The ignore.case option is not meaningful in gsub. It could be meaningful
in regexpr; however, this function does not allow the ignore.case option,
so I made no changes to it.

All the best, Petr.


