[R] gsub syntax

John Logsdon j.logsdon at quantex-research.com
Sun Nov 27 11:04:41 CET 2005


Hello

I know that R's string functions are not as extensive as those of Unix, but
I need to do some text handling entirely within an R environment because
the target is a Windows system which will not have the corresponding shell
utilities (sed, awk, etc.).

Can anyone explain the following gsub phenomenon to me:

> dates<-c("73","74","02","1973","1974","2002")

I want to take just the last two digits where it is a 4-digit year and
both digits where it is a 2-digit year.  substr seems the obvious tool, but
counting from the end of the string (with a negative index or something
similar) is not implemented:

> substr(dates,3,4)
[1] ""   ""   ""   "73" "74" "02"
> substr(dates,-2,4)
[1] "73"   "74"   "02"   "1973" "1974" "2002"
> substr(dates,4,-2)
[1] "" "" "" "" "" ""
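
A possible workaround for the missing negative index, given here as a minimal sketch: compute the positions from the string length with nchar(). This assumes substr() accepts vector start and stop arguments, which it does in current R; it has not been checked against 2.0.1. The names n and last_two are illustrative only.

```r
dates <- c("73", "74", "02", "1973", "1974", "2002")

# Length of each string: 2 for the short years, 4 for the long ones.
n <- nchar(dates)

# Start one character before the end and stop at the end,
# which yields the last two characters of every element.
last_two <- substr(dates, n - 1, n)
print(last_two)
```

Because the start position is derived per element, the 2-digit and 4-digit years are handled by the same call.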

So I tried gsub:

> gsub("[19|20]([0-9][0-9])","\\1",dates)
[1] "73"  "74"  "02"  "973" "974" "002"

As I understand it (and comparing with sed), the \\1 should return the first
bracketed group, but clearly this does not work.  Another form that I would
expect to work gives the same result:

> gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
[1] "73"  "74"  "02"  "973" "974" "002"

On the other hand the following does work:

> gsub("[19|20]([0-9])([0-9])","\\2",dates) 
[1] "73" "74" "02" "73" "74" "02"

So it appears that the substitution takes one character extra to the left
but the following indicates that the lower limit of the selected range is
also at fault:

> s<-c("1","12","123","1234","12345","123456")
> gsub("[12]([4-6]*)","",s)
[1] ""     ""     "3"    "34"   "345"  "3456"
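
One way to see what is actually being matched here, as a sketch: wrap each match in markers instead of deleting it. In POSIX regular expressions "[12]" is a bracket expression that matches a single character, either "1" or "2", and "[4-6]*" matches zero or more of "4", "5", "6". The pattern below is the same one with an outer group added for the markers; that group and the name marked are illustrative additions.

```r
s <- c("1", "12", "123", "1234", "12345", "123456")

# Same pattern as above, wrapped in one outer group so that the whole
# match can be referred to as \\1 in the replacement.
pattern <- "([12][4-6]*)"

# Surround every match with < > rather than removing it.
marked <- gsub(pattern, "<\\1>", s)
print(marked)
```

Read this way, "1" and "2" are each matched as separate one-character matches, and the "4", "5", "6" are never consumed because they do not directly follow a "1" or a "2".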

More elegant examples could probably be constructed to home in on the
issue.
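
One candidate example, offered as a sketch: compare the bracket expression "[19|20]" with the grouped alternation "(19|20)". In POSIX syntax square brackets denote one character out of a set (here "1", "9", "|", "2", "0"), whereas parentheses with "|" denote a choice between the literal strings "19" and "20". The ^ and $ anchors, the use of sub() rather than gsub(), and the names bracket and grouped are assumptions added for illustration.

```r
dates <- c("73", "74", "02", "1973", "1974", "2002")

# Bracket expression: ONE character from the set, then two digits.
# "1973" matches as "197", which is replaced by "97", leaving "973".
bracket <- gsub("[19|20]([0-9][0-9])", "\\1", dates)

# Grouped alternation: the two-character prefix "19" or "20".
# The prefix is now group 1, so the two-digit year is group 2.
# The anchors restrict the match to whole 4-character strings.
grouped <- sub("^(19|20)([0-9][0-9])$", "\\2", dates)

print(bracket)
print(grouped)
```

On this reading the "\\2" variant above gives the right answer only by coincidence: the one-character prefix plus the first digit are dropped, and the unmatched final digit is left in place.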

The version is R 2.0.1 on Linux so perhaps it is a little old now.

Questions:

1) Am I misunderstanding the gsub use?

2) Was it a bug that has since been corrected?

3) Is it still a bug in the latest version?

TIA

John

John Logsdon                               "Try to make things as simple
Quantex Research Ltd, Manchester UK         as possible but not simpler"
j.logsdon at quantex-research.com              a.einstein at relativity.org
+44(0)161 445 4951/G:+44(0)7717758675       www.quantex-research.com



