[R] Frequency of a character in a string

Charles C. Berry ccberry at ucsd.edu
Mon Nov 14 18:26:13 CET 2016


On Mon, 14 Nov 2016, Bert Gunter wrote:

> Yes, but it needs some help, since nchar gives the length of the
> *entire* string; e.g.
>
> ## to count "a" 's  :
>
>> x <-(c("abbababba","bbabbabbaaaba"))
>> nchar(gsub("[^a]","",x))
> [1] 4 6
>
> This is one of about 8 zillion ways to do this in base R if you don't
> want to use a specialized package.
>
> Just for curiosity: Can anyone comment on what is the most efficient
> way to do this using base R pattern matching?
>

Most efficient? There probably is no uniformly most efficient way to do
this, as the timing will depend on how the "a"s are distributed across
the elements of the vector as well as on the length of the vector.
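
One way to probe this is to time a candidate on inputs of different
shapes. What follows is only a rough sketch: the test data (many short
strings versus one long string) and the helper names are made up for
illustration, and the perl = TRUE variant is just one alternative one
might try. Timings are machine-dependent, so none are shown.

```r
set.seed(1)

# Hypothetical helper: a random string of "a"s and "b"s of length n.
rand_string <- function(n) {
  paste(sample(c("a", "b"), n, replace = TRUE), collapse = "")
}

# Made-up inputs: 10,000 short strings and a single long one.
short_many <- vapply(rep(10, 1e4), rand_string, character(1))
long_one   <- rand_string(1e5)

# The regex approach from the quoted message, with two regex engines.
count_default <- function(x) nchar(gsub("[^a]", "", x))
count_perl    <- function(x) nchar(gsub("[^a]", "", x, perl = TRUE))

# Repeat each call a few times so the elapsed time is measurable.
system.time(for (i in 1:20) count_default(short_many))
system.time(for (i in 1:20) count_perl(short_many))
system.time(for (i in 1:20) count_default(long_one))
system.time(for (i in 1:20) count_perl(long_one))
```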

But here is one way to avoid the regular expression matching:

lengths(strsplit(paste0("X", x, "X"),"a",fixed=TRUE)) - 1
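
The "X" padding matters because strsplit() keeps a leading empty piece
but drops a trailing one, so an "a" at the end of an unpadded string
would go uncounted. A minimal sketch checking the padded version
against the gsub() approach quoted above; the function names and the
extra test strings ("", "aaa", "bbb") are made up for illustration:

```r
x <- c("abbababba", "bbabbabbaaaba", "", "aaa", "bbb")

# Regex approach from the quoted message: delete everything that is not "a".
count_gsub <- function(x) nchar(gsub("[^a]", "", x))

# Split approach: pad both ends so that every "a" sits between two pieces;
# the number of "a"s is then the number of pieces minus one.
count_split <- function(x) {
  lengths(strsplit(paste0("X", x, "X"), "a", fixed = TRUE)) - 1
}

count_gsub(x)
count_split(x)

# Both approaches should agree on every element.
all(count_gsub(x) == count_split(x))

# For comparison, the piece counts without padding.
lengths(strsplit(x, "a", fixed = TRUE))
```

Note that the padding character must differ from the character being
counted.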


Chuck


