[R] Regular Expressions

Tony Plate tplate at blackmesacapital.com
Tue Jul 13 02:11:57 CEST 2004


I'd suggest doing it with multiple regular expressions -- you could 
construct a single regular expression for this, but I expect it would get 
quite complicated and possibly very slow.
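
For comparison, here is a rough sketch of what the single-expression version might look like, assuming Perl-compatible lookahead assertions are available through perl = TRUE in grepl(). The vector "lines" and the other variable names are made up for illustration and are not the poster's data.

```r
# Hypothetical sample data (not taken from the original post).
lines <- c("thomas uses a program called perl",
           "perl is a program that thomas uses",
           "thomas likes perl",
           "nothing relevant here")

# One lookahead per required word. Each (?=.*word) is tested from the
# start of the line, so the words may appear in any order.
words   <- c("perl", "program", "thomas")
pattern <- paste0("^", paste0("(?=.*", words, ")", collapse = ""))

# TRUE only for lines that contain every word in 'words'.
has_all <- grepl(pattern, lines, perl = TRUE)
which(has_all)
```

Note that this form only answers "does the line contain all of the words"; it does not count how many of them matched, which is what the multi-expression approach gives you.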

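The expression for "y" in the example below tabulates how many words 
matched for each line (i.e., line 2 matched 1 word, line 3 matched 3 words, 
and line 4 matched 2 words).  Lines with no matches (line 1 here) do not 
appear in the table.  In the output of which(y>=2), the top row holds the 
line numbers (3 and 4) and the bottom row holds their positions within the 
table.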


 > x <- readLines("clipboard", -1)
 > x
[1] "Is there a way to use regular expressions to capture two or more words in a "
[2] "sentence?  For example, I wish to to find all the lines that have the words \"thomas\", "
[3] "\"perl\", and \"program\", such as \"thomas uses a program called perl\", or \"perl is a "
[4] "program that thomas uses\", etc."
 > sapply(c("perl","program","thomas"), function(re) grep(re, x))
$perl
[1] 3

$program
[1] 3 4

$thomas
[1] 2 3 4

 > unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=F)
[1] 3 3 4 2 3 4
 > y <- table(unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=F))
 > y

2 3 4
1 3 2
 > which(y>=2)
3 4
2 3
 >
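
A variation on the session above, offered only as a sketch: using grepl() instead of grep() gives one logical column per word, so the per-line counts can be read off with rowSums(). This keeps lines with zero matches and returns line numbers directly. The sample vector and the names "hits" and "n_matched" are assumptions for illustration, not the clipboard text used above.

```r
# Hypothetical sample data (not the clipboard text from the session above).
x <- c("no keywords on this line",
       "only thomas appears here",
       "thomas uses a program called perl",
       "a program that thomas uses")

words <- c("perl", "program", "thomas")

# One logical column per word: TRUE where that word matches the line.
# cbind() is used so the result is a matrix even when x has one element.
hits <- do.call(cbind, lapply(words, function(re) grepl(re, x)))

# Number of distinct words matched on each line, including zero.
n_matched <- rowSums(hits)

which(n_matched >= 2)              # lines matching at least two of the words
which(n_matched == length(words))  # lines matching all of the words
```

The second which() call is the one to use if every word must be present on the line, as in the "thomas", "perl", and "program" example from the question.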

hope this helps,

Tony Plate

At Monday 05:59 PM 7/12/2004, Sangick Jeon wrote:


>Hi,
>
>Is there a way to use regular expressions to capture two or more words in a
>sentence?  For example, I wish to find all the lines that have the words
>"thomas", "perl", and "program", such as "thomas uses a program called perl",
>or "perl is a program that thomas uses", etc.
>
>I'm sure this is a very easy task, I would greatly appreciate any help.  Thanks!
>
>Sangick



