[R] Using regular expressions to detect clusters of consonants in a string

Wed Jul 1 13:57:38 CEST 2009

strapply and gsubfn pass the ... argument to gsub, so they accept
all the same arguments, including ignore.case.  See ?strapply and
?gsubfn.  e.g.

> strapply("MyString", "[bcdfghjklmnpqrstvwxyz]+", nchar, ignore.case = TRUE)
[[1]]
[1] 5 2

> gsubfn("[bcdfghjklmnpqrstvwxyz]+", "X", "MyString", ignore.case = TRUE)
[1] "XiX"
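
To go from the run lengths to the count asked for in the original
question, one possible sketch in base R is below. It is only an
illustration, not part of gsubfn: count_clusters is a hypothetical
helper name, it uses gregexpr and regmatches instead of strapply, and
it assumes that "y" counts as a consonant, as in the pattern above.

```r
# Hypothetical helper (not from gsubfn): count the maximal runs of
# exactly `n` consonants in each element of the character vector `x`.
count_clusters <- function(x, n = 2) {
  consonants <- "[bcdfghjklmnpqrstvwxyz]+"
  # "+" is greedy, so each match is a whole consonant run;
  # ignore.case = TRUE makes upper-case letters count as well.
  m <- gregexpr(consonants, x, ignore.case = TRUE)
  runs <- regmatches(x, m)
  # Keep only the runs whose length is exactly n and count them.
  vapply(runs, function(r) sum(nchar(r) == n), integer(1))
}

count_clusters(c("hallo", "chess", "hall", "screw", "MyString"))
# [1] 1 2 1 0 1
```

Because whole runs are matched first and filtered by length
afterwards, a run at the start or end of the string needs no special
handling. Note that "abstraction" gives 1 rather than 0 under this
definition: the run "bstr" is skipped, but "ct" is a separate run of
two.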

On Wed, Jul 1, 2009 at 5:07 AM, Mark Heckmann<mark.heckmann at gmx.de> wrote:
>
> Hi Gabor,
>
> thanks for this great advice. Just one more question:
> I cannot find in the documentation for gsubfn or strapply how to switch
> off case sensitivity for the regex, as the ignore.case = TRUE argument
> does in gregexpr.  Is there a way?
>
> TIA,
> Mark
>
> -------------------------------
>
> Mark Heckmann
> + 49 (0) 421 - 1614618
> www.markheckmann.de
> R-Blog: http://ryouready.wordpress.com
>
>
>
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Tuesday, June 30, 2009 18:31
> To: Mark Heckmann
> Cc: r-help at r-project.org
> Subject: Re: [R] Using regular expressions to detect clusters of consonants
> in a string
>
> Try this:
>
> library(gsubfn)
> s <- "mystring"
> strapply(s, "[bcdfghjklmnpqrstvwxyz]+", nchar)[[1]]
>
> which returns a vector of consonant string lengths.
> Now apply your algorithm to that.
> See http://gsubfn.googlecode.com for more.
>
> On Tue, Jun 30, 2009 at 11:30 AM, Mark Heckmann<mark.heckmann at gmx.de> wrote:
>> Hi,
>>
>> I want to parse a string, extracting the number of occurrences where two
>> consonants clump together. Consider for example the word "hallo". Here I
>> want the algorithm to return 1. For "chess" I want it to return 2. For the
>> word "screw" the result should be negative, as it is a clump of three
>> consonants, not two. Also, for the word "abstraction" I do not want the
>> algorithm to detect a two-consonant cluster twice. In this case the result
>> should be negative as well, as it is four consonants in a row.
>>
>> str <- "hallo"
>> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
>> extended = TRUE)[[1]]
>>
>> [1] 3
>> attr(,"match.length")
>> [1] 3
>>
>> The result is correct. Now I change the word to "hall"
>>
>> str <- "hall"
>> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
>> extended = TRUE)[[1]]
>>
>> [1] -1
>> attr(,"match.length")
>> [1] -1
>>
>> Here my expression fails. How can I write a correct regex to do this? I
>> always encounter problems at the beginning or end of a string.
>>
>> Also:
>>
>> str <- "abstraction"
>> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
>> extended = TRUE)[[1]]
>>
>> [1] 4 7
>> attr(,"match.length")
>> [1] 3 3
>>
>> This also fails.
>>
>> Thanks in advance,
>> Mark
>>
>> -------------------------------
>> Mark Heckmann
>> www.markheckmann.de
>> R-Blog: http://ryouready.wordpress.com
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>