[R] help reading a variably formatted text file

Michael Na Li lina at u.washington.edu
Tue Nov 19 23:57:06 CET 2002


On Tue, 19 Nov 2002, ripley at stats.ox.ac.uk verbalised:

>  On Tue, 19 Nov 2002, Michael Na Li wrote:
>  
> > It would be nice to have more powerful regex in R, such as returning
> > matched substring grouped with "()".
>  
>  I think you are overlooking the power of gsub.  You can certainly do that.

I want something like:

> REGEXFUN ("abc ([0-9]+)", "abc 30 and ABC 40 and abc 80")
[[1]]
[1] "30" "80"

I'm not sure how to achieve this with 'gsub'.

The best I can come up with is:

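regex.match <- function (pattern, x) {
    # Wrap the first "()" group of each match as "*| group |*", then split on "*".
    a <- strsplit (gsub(pattern, "*| \\1 |*", x), split = "\\*")
    # Keep only the pieces that carry the "|" markers.
    b <- lapply (a, function (x) x[grep ("^\\|.*\\|", x)])
    lapply (b, function (x) {
        # Strip the markers and surrounding spaces; drop empty strings.
        temp <- unlist (strsplit (x, split = " *\\| *"))
        temp[temp != ""]
    })
}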

> regex.match ("abc ([0-9]+)", "abc 30 and ABC 40 and abc 80")
[[1]]
[1] "30" "80"

Unfortunately it is not very useful: it breaks down when the pattern contains
two "()" expressions or none, for instance.

> regex.match ("abc ([0-9]+) and ABC ([0-9]+)", "abc 30 and ABC 40 and abc 80")
[[1]]
[1] "30"
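
For comparison, here is a minimal sketch of one way to get every "()" group
back, assuming an R version that provides regmatches() and regexec(). The
helper name regex_groups is hypothetical, not an existing R function. It
first finds every full match with gregexpr(), then re-matches each one with
regexec() to recover its groups. Because each match is re-matched in
isolation, patterns that depend on surrounding context (anchors, lookarounds)
may behave differently.

```r
# Hypothetical helper: the name regex_groups is illustrative only.
# For each element of x, returns a character matrix with one row per
# match and one column per "()" group in the pattern.
regex_groups <- function(pattern, x) {
    # All full matches of the pattern within each element of x.
    full <- regmatches(x, gregexpr(pattern, x))
    lapply(full, function(m) {
        # Re-match each full match on its own; regexec() reports the
        # whole match first, followed by each parenthesised group.
        parts <- regmatches(m, regexec(pattern, m))
        # Drop the whole-match entry and stack the groups row-wise.
        do.call(rbind, lapply(parts, function(p) p[-1]))
    })
}

s <- "abc 30 and ABC 40 and abc 80"
one_group  <- regex_groups("abc ([0-9]+)", s)
two_groups <- regex_groups("abc ([0-9]+) and ABC ([0-9]+)", s)
```

With the single-group pattern, one_group[[1]] should be a one-column matrix
holding "30" and "80"; with the two-group pattern, two_groups[[1]] should be
a single row holding "30" and "40".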

Michael

-- 
----------------------------------------------------------------------------
Michael Na Li                               
Email: lina at u.washington.edu
Department of Biostatistics, Box 357232
University of Washington, Seattle, WA 98195  
---------------------------------------------------------------------------



