[R] grep searching for sequence of 3 consecutive upper case letters

Peter Dalgaard p.dalgaard at biostat.ku.dk
Mon Nov 6 23:51:35 CET 2006


"Lapointe, Pierre" <Pierre.Lapointe at nbf.ca> writes:

> Hello,
> 
> I need to identify all elements which have a sequence of 3 consecutive upper
> case letters, anywhere in the string.
> 
> I tested my grep expression on this site: http://regexlib.com/RETester.aspx
> 
> But when I try it in R, it does not filter anything.
> 
> str <-c("AGH", "this WOUld be good", "Not Good at All")
> str[grep('[A-Z]{3}',str)] # looking for 3 consecutive upper case letters
> 
> [1] "AGH"                "this WOUld be good" "Not Good at All"   
> 
> Any idea?

There are several regular expression dialects, and they differ in the
fine details. Don't expect the RETester to hold the Final Truth; it
seems to be tied to a particular programming environment, which is not
R.

> grep('[A-Z]{3}', str, perl=TRUE)
[1] 1 2

Not only that, but listing the letters explicitly also works:

> grep('[ABCDEFGHIJKLMNOPQRSTUVWXYZ]{3}', str)
[1] 1 2

Hint: What is your collating sequence?

> Sys.setlocale("LC_COLLATE", "C")
[1] "C"
> grep('[A-Z]{3}', str)
[1] 1 2
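
To make the hint concrete, here is a minimal sketch of why the collating
sequence matters, plus one alternative that does not appear in the exchange
above: the POSIX class [[:upper:]], which names upper case letters directly
instead of relying on a range. The variable names `x`, `c_order` and `hits`
are made up for illustration, and the output of the locale-dependent sort
will differ from system to system.

```r
# Hypothetical vector name `x`; same strings as in the question.
x <- c("AGH", "this WOUld be good", "Not Good at All")

# In the C locale, letters sort by character code, so every upper case
# letter comes before every lower case one. method = "radix" always
# sorts character vectors this way, whatever the session locale is.
c_order <- sort(c("b", "A", "a", "B"), method = "radix")
print(c_order)

# The default sort follows the current LC_COLLATE setting. In many
# locales the two cases are interleaved, and a range such as [A-Z]
# may then take in lower case letters as well.
print(Sys.getlocale("LC_COLLATE"))
print(sort(c("b", "A", "a", "B")))

# [[:upper:]] asks for upper case letters by class, not by range.
hits <- grep("[[:upper:]]{3}", x)
print(hits)
print(x[hits])
```

Unlike Sys.setlocale(), this leaves the session's collation order untouched,
which may matter if other code depends on locale-aware sorting.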


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


