[R] regular expressions

baptiste auguie baptiste.auguie at googlemail.com
Mon Oct 26 15:25:23 CET 2009


Perfect, thanks!

baptiste

2009/10/26 Gabor Grothendieck <ggrothendieck at gmail.com>:
> Assuming only the START lines match pat:
>
>> ## this one has more fields: how do I generalize the regular expression?
>> st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
> + "START text1 23.4 text2 3.1415 text3 6")
>>
>> pat <- "[[:alnum:]]+ +([0-9.]+)"
>> s <- strapply(st2, pat, c, simplify = rbind)
>>
>> pat2 <- "([[:alnum:]]+) +[0-9.]+"
>> colnames(s) <- strapply(st2[1], pat2, c, simplify = rbind)
>> s
>      text1  text2    text3
> [1,] "1"    "2.3"    "5"
> [2,] "23.4" "3.1415" "6"
>
> If some non-START lines could also match pat, use grep to select the
> START lines first.
>
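
For the record, the grep step suggested here might look like the following. This is only a minimal sketch in base R (strsplit rather than strapply, so it runs without gsubfn); st3, keep, fields, nms and vals are made-up names, and the trailing "42" on the middle line is a hypothetical addition so that a non-START line would also match pat.

```r
# Hypothetical input: the middle line now ends in "text 42", which pat
# ("[[:alnum:]]+ +([0-9.]+)") would also match
st3 <- c("START text1 1 text2 2.3 text3 5",
         "whatever intermediate text 42",
         "START text1 23.4 text2 3.1415 text3 6")

# Keep only the lines that begin with START
keep <- grep("^START", st3, value = TRUE)

# Drop the START keyword, then split the remainder on runs of spaces
fields <- strsplit(sub("^START +", "", keep), " +")

# Odd positions hold the names, even positions hold the values
nms  <- fields[[1]][c(TRUE, FALSE)]
vals <- do.call(rbind,
                lapply(fields, function(f) as.numeric(f[c(FALSE, TRUE)])))

d <- as.data.frame(vals)
names(d) <- nms
d
```

The filtered vector keep could equally be passed to strapply in place of st2; splitting on spaces is just a swapped-in alternative that assumes every START line is a strict sequence of name/value pairs.
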
> On Mon, Oct 26, 2009 at 9:30 AM, baptiste auguie
> <baptiste.auguie at googlemail.com> wrote:
>> Dear list,
>>
>> I have the following text to parse (originating from readLines as some
>> lines have unequal size),
>>
>> st = c("START text1 1 text2 2.3", "whatever intermediate text",
>>        "START text1 23.4 text2 3.1415")
>>
>> from which I'd like to extract the lines starting with "START", and
>> group the subsequent fields in a data.frame in this format:
>>
>>  text1  text2
>>     1    2.3
>>  23.4 3.1415
>>
>>
>> All the lines containing "START" have the same number of fields, but
>> this number may vary from file to file.
>>
>> I have managed to get this minimal example to work, but I am at a loss
>> as to how to handle an arbitrary number of (text, value) pairs:
>>
>> library(gsubfn)
>>
>> ( parsed =
>> strapply(st, "^START +([[:alnum:]]+) +([0-9.]+) +([[:alnum:]]+) +([0-9.]+)",
>>          c, simplify=rbind, combine=c) )
>>
>> d = data.frame(parsed[ ,c(2,4)])
>> names(d) <- apply(parsed[ ,c(1,3)], 2, unique)
>> d
>>
>> ## this one has more fields: how do I generalize the regular expression?
>> st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
>>         "START text1 23.4 text2 3.1415 text3 6")
>>
>> Best regards,
>>
>>
>> Baptiste
>>
>



