[R] gregexpr() - length of the matched text to a vector

Seth Falcon sfalcon at fhcrc.org
Wed Jan 11 16:04:33 CET 2006


Hi Petri,

On 11 Jan 2006, petri.palmu at geneos.fi wrote:
> I'm using gregexpr(). As a result something like this:
>
> # starting positions of the match:
> [[1]]
> [1] 7 18
>
> # length of the matched text:
> attr(,"match.length")
> [1] 4 4
>
> Now, I'd like to have a matrix,
> 7    4
> 18   4
>
> but I don't know how to handle the attr(,"match.length") ...?
> The format of the output is pretty unclear to me in that respect.

Brief description of the format: gregexpr() returns a list with one
element per string in the input character vector.  Each element is an
integer vector of starting positions of the matches (or -1 if there
is no match).  That integer vector carries a "match.length"
attribute, an integer vector of the corresponding match lengths.

Whew.  Would a matrix be better?  Probably.
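
For a single input string, the two pieces can be pulled apart with
attr().  This is only a sketch: the string s and the pattern are made
up so that the positions mirror the ones in your question.

```r
# Hypothetical input: "abcd" occurs at positions 7 and 18
s <- "xxxxxxabcdxxxxxxxabcd"

m <- gregexpr("abcd", s)[[1]]        # element for the first (only) string
starts <- as.vector(m)               # starting positions, attributes dropped
lens <- attr(m, "match.length")      # lengths of the matched text

# Two-column matrix: one row per match
mat <- cbind(start = starts, length = lens)
print(mat)
```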

To get a list of matrices you can do:

> txt
[1] "foobarfoobazfoofoo" "foo"                "bar"               
[4] "foofoofoo"         
> lapply(gregexpr("foo", txt), function(x) cbind(x, attr(x, "match.length")))
[[1]]
      x  
[1,]  1 3
[2,]  7 3
[3,] 13 3
[4,] 16 3

[[2]]
     x  
[1,] 1 3

[[3]]
      x   
[1,] -1 -1

[[4]]
     x  
[1,] 1 3
[2,] 4 3
[3,] 7 3

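If the unnamed second column or the -1 rows for non-matching strings
get in the way, a small variation might help.  This is a sketch, not
the only way to do it; match_matrix is a name I made up, not a base R
function.

```r
txt <- c("foobarfoobazfoofoo", "foo", "bar", "foofoofoo")

# Hypothetical helper: named columns, and no rows when nothing matched
match_matrix <- function(x) {
  m <- cbind(start = as.vector(x), length = attr(x, "match.length"))
  # gregexpr() reports -1 for "no match"; keep only real matches
  m[m[, "start"] > 0, , drop = FALSE]
}

res <- lapply(gregexpr("foo", txt), match_matrix)
print(res)
```

With this, the element for "bar" is a matrix with zero rows rather
than a single row of -1s, so nrow() gives the number of matches.
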

HTH,

+ seth



