[R] regexp help needed

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Fri Nov 28 11:51:48 CET 2008


Lauri Nikkinen wrote:
> Hello,
> 
> I have a vector of dates and I would like to grep the year component
> from this vector (= all digits after the last punctuation character)
> 
> dates <- c("28.7.08","28.7.2008","28/7/08", "28/7/2008", "28/07/2008",
> "28-07-2008", "28-07-08")
> 
> the resulting vector should look like
> 
> "08" "2008" "08" "2008" "2008" "2008" "08"
> 
> I tried something like this (Perl style), with no success:
> 
> grep("[[:punct:]]?\\d", dates, value=T, perl=T)
> 
> Any ideas?

> sub(".*[[:punct:]]([0-9]*$)", "\\1", dates)
[1] "08"   "2008" "08"   "2008" "2008" "2008" "08"
> sub(".*[[:punct:]](.*)$", "\\1", dates)
[1] "08"   "2008" "08"   "2008" "2008" "2008" "08"
> sub(".*[[:punct:]]", "", dates)
[1] "08"   "2008" "08"   "2008" "2008" "2008" "08"
> substring(dates,regexpr("[0-9]*$", dates))
[1] "08"   "2008" "08"   "2008" "2008" "2008" "08"

(grep() won't do. It only tells you _which_ elements match the pattern, or,
with value=TRUE, returns those elements whole. It never extracts the
matched part.)
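
To make the difference concrete, here is a small sketch contrasting grep()
with an extraction via regexpr(). It is not part of the reply above: it
assumes an R version that provides regmatches(), and the names idx, whole
and years are only illustrative.

```r
dates <- c("28.7.08", "28.7.2008", "28/7/08", "28/7/2008", "28/07/2008",
           "28-07-2008", "28-07-08")

# grep() reports which elements match: indices by default,
# or the whole matching elements with value = TRUE.
idx   <- grep("[[:punct:]][0-9]+$", dates)
whole <- grep("[[:punct:]][0-9]+$", dates, value = TRUE)

# regexpr() locates the trailing run of digits in each element;
# regmatches() then pulls out exactly the matched text.
# (Assumption: regmatches() is available in the R version in use.)
years <- regmatches(dates, regexpr("[0-9]+$", dates))
print(years)
```

Note that regmatches() silently drops elements with no match, so the
sub() variants above are probably the safer choice if some entries might
not end in digits and the result has to stay aligned with the input.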


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


