[R] Counting defined character within String

Marc Schwartz marc_schwartz at me.com
Mon Jul 5 16:18:19 CEST 2010


On Jul 5, 2010, at 9:04 AM, Kunzler, Andreas wrote:

> Dear list,
> 
> I'm looking for a way to count the number of "|" within an object.
> The character "|" is used to separate ids.
> 
> Assume a data structure (d) like
> 
> Var
> NA
> NA
> NA
> NA
> NA
> 1
> 1|2
> 1|22|45
> 3
> 4b|24789
> 
> I need to know the maximum number of ids within one object. In this case it is 3 (1|22|45).
> 
> 
> Does anybody know a better way?
> 
> Thanks


Presuming that your column is in a data frame called 'DF', where the 'Var' column is likely imported as a factor:

> DF
        Var
1      <NA>
2      <NA>
3      <NA>
4      <NA>
5      <NA>
6         1
7       1|2
8   1|22|45
9         3
10 4b|24789
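
To make the example reproducible, the data frame can be rebuilt by hand. This is only a sketch: the values are copied from the printout above, and stringsAsFactors = TRUE is an assumption that mimics the factor import mentioned above. Because strsplit() only accepts character vectors, a factor column has to go through as.character() first.

```r
# Rebuild the example data; values copied from the printout above.
# stringsAsFactors = TRUE is an assumption mimicking a factor import.
DF <- data.frame(
  Var = c(NA, NA, NA, NA, NA, "1", "1|2", "1|22|45", "3", "4b|24789"),
  stringsAsFactors = TRUE
)

# The column is a factor, so convert it before any string processing
is.factor(DF$Var)
var_chr <- as.character(DF$Var)
var_chr
```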



> max(sapply(strsplit(as.character(DF$Var), split = "\\|"), length))
[1] 3


The above uses strsplit() to split each element on the "|" character. Since "|" is the alternation operator in regular expressions, it has to be escaped with a double backslash to be matched literally:

> strsplit(as.character(DF$Var), split = "\\|")
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

[[4]]
[1] NA

[[5]]
[1] NA

[[6]]
[1] "1"

[[7]]
[1] "1" "2"

[[8]]
[1] "1"  "22" "45"

[[9]]
[1] "3"

[[10]]
[1] "4b"    "24789"
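
As an alternative to escaping, strsplit() has a fixed argument that treats the split string literally. The sketch below uses a small hypothetical vector x in place of DF$Var and checks that both forms give the same result.

```r
# Hypothetical stand-in for as.character(DF$Var)
x <- c(NA, "1", "1|2", "1|22|45", "3", "4b|24789")

# fixed = TRUE matches "|" literally, so no backslashes are needed
parts_fixed <- strsplit(x, split = "|", fixed = TRUE)

# The escaped regular expression used above
parts_regex <- strsplit(x, split = "\\|")

# Both approaches produce the same list
identical(parts_fixed, parts_regex)
```

With a single literal separator like this one, fixed = TRUE is mostly a matter of readability.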


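Then sapply() loops over the list elements and returns the length of each. Note that a missing value yields a vector of length 1 (the NA itself), so the NA rows are counted as 1: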

> sapply(strsplit(as.character(DF$Var), split = "\\|"), length)
 [1] 1 1 1 1 1 1 2 3 1 2


and finally take the maximum with max().
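
If the missing rows should count as zero ids rather than one, the lengths can be adjusted before taking the maximum. This is a sketch of one possible variant, not part of the solution above; x and n_ids are hypothetical names, and x stands in for as.character(DF$Var).

```r
# Hypothetical stand-in for as.character(DF$Var)
x <- c(NA, NA, "1", "1|2", "1|22|45", "3", "4b|24789")

# Number of ids per element, as in the solution above
n_split <- sapply(strsplit(x, split = "|", fixed = TRUE), length)

# Treat missing values as having zero ids instead of one
n_ids <- ifelse(is.na(x), 0L, n_split)
n_ids

# Maximum number of ids in any one element
max(n_ids)
```

This only changes the answer when every value is missing, but it keeps the per-row counts meaningful if they are used for anything else.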

HTH,

Marc Schwartz


