[R] Using regular expressions to detect clusters of consonants in a string

Mark Heckmann mark.heckmann at gmx.de
Tue Jun 30 17:30:28 CEST 2009


Hi,

I want to parse a string and count the occurrences where two
consonants clump together. Consider for example the word "hallo". Here I
want the algorithm to return 1. For "chess" I want it to return 2. For the
word "screw" the result should be negative, as it is a clump of three
consonants, not two. Also, for the word "abstraction" I do not want the
algorithm to detect a two-consonant cluster twice. In this case the result
should be negative as well, as it has four consonants in a row.
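
To make the expected counts concrete, the rule can be sketched without any
regular expression. This is only an illustration under stated assumptions:
count_pairs_rle is a hypothetical helper name, "y" is treated as a consonant
(as in the character class used further down), and "negative" is read as a
count of 0.

```r
# Hypothetical helper: count runs of exactly two consecutive consonants.
count_pairs_rle <- function(word) {
  consonants <- strsplit("bcdfghjklmnpqrstvwxyz", "")[[1]]
  chars <- strsplit(tolower(word), "")[[1]]
  # rle() collapses the TRUE/FALSE sequence into runs with their lengths.
  runs <- rle(chars %in% consonants)
  # Keep only consonant runs (values == TRUE) whose length is exactly 2.
  sum(runs$values & runs$lengths == 2)
}

sapply(c("hallo", "chess", "screw"), count_pairs_rle)
```

Under this reading "hallo" gives 1, "chess" gives 2 and "screw" gives 0.
Note that "abstraction" gives 1 rather than 0: the run "bstr" is skipped, but
"ct" is itself an isolated pair of consonants.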

str <- "hallo"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE,
         extended = TRUE)[[1]]

[1] 3
attr(,"match.length")
[1] 3

The result is correct. Now I change the word to "hall":

str <- "hall"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE,
         extended = TRUE)[[1]]

[1] -1
attr(,"match.length")
[1] -1

Here my expression fails. How can I write a correct regex to do this? I
always encounter problems at the beginning or end of a string.

Also:

str <- "abstraction"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE,
         extended = TRUE)[[1]]

[1] 4 7
attr(,"match.length")
[1] 3 3

This also fails: it reports two matches, at positions 4 and 7, where I want
none.
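
One possible direction, sketched here rather than offered as a definitive
answer: the pattern above requires a vowel after the two consonants, so a
cluster at the end of a string can never match, and nothing stops it from
matching inside a longer cluster. Perl-style lookarounds (perl = TRUE) can
instead require that the pair is neither preceded nor followed by another
consonant. The names cons, pattern and count_pairs_regex are hypothetical.

```r
cons <- "[bcdfghjklmnpqrstvwxyz]"
# (?<!cons) : not preceded by a consonant (also true at the string start)
# cons{2}   : exactly two consonants
# (?!cons)  : not followed by a consonant (also true at the string end)
pattern <- paste0("(?<!", cons, ")", cons, "{2}(?!", cons, ")")

# Hypothetical helper: number of isolated two-consonant clusters in a word.
count_pairs_regex <- function(word) {
  m <- gregexpr(pattern, word, ignore.case = TRUE, perl = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match, so count positive starts.
  sum(m > 0)
}

count_pairs_regex("hall")
count_pairs_regex("screw")
```

With this sketch "hall" yields 1 and "screw" yields 0, because the lookarounds
succeed at the string boundaries but reject any pair that is part of a longer
run of consonants.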

Thanks in advance,
Mark

-------------------------------
Mark Heckmann
www.markheckmann.de
R-Blog: http://ryouready.wordpress.com



