[R] Counting enumerated items in each element of a character vector

Wed Apr 26 05:40:34 CEST 2017

I like Boris's "Hadley" solution.  For the record, I've appended a
version that uses regular expressions, the only benefit of which is
that it could be generalized to find more-complicated patterns.

-- Mike

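counts <- sapply(text1, function(next_string) {
    # gregexpr() returns -1 when nothing matches, so count the positive
    # match positions instead of taking length(), which would report 1.
    matches <- gregexpr("Example", next_string)[[1]]
    sum(matches > 0)
}, USE.NAMES=FALSE)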

> counts
[1] 5 5 5 5
>
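
As a rough sketch of the generalization mentioned above, the same
gregexpr() idea can be wrapped in a small helper and pointed at the
enumeration markers rather than at the word "Example". The helper name
count_matches, the pattern item_pattern, and the demo vector are all
made up for illustration; they are not from the thread, and the pattern
is only one guess at what counts as an enumerated item.

```r
# Hypothetical helper: count regex matches in each element of a character vector.
count_matches <- function(x, pattern, ...) {
    # regmatches() gives character(0) for an element with no match,
    # so lengths() reports 0 there instead of 1.
    lengths(regmatches(x, gregexpr(pattern, x, ...)))
}

# Assumed pattern: digits, then ")" or ".", then whitespace, then a capital letter.
item_pattern <- "[0-9]+[.)][[:space:]]+[A-Z]"

# Made-up demo data (not the text1 from the original question).
demo <- c("List 1\n1) Apple\n2) Pear\n10) Plum",
          "1. Apple 2. Pear",
          "No items here.")

count_matches(demo, item_pattern)
# [1] 3 2 0
```

This pattern will still miscount when a sentence ends in a number and the
next one starts with a capital letter (as in "Example 1. List 2"), which
is the ambiguity raised in the original question, so it would need
tuning against the real data.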

On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> I should add: there's a str_count() function in the stringr package.
>
> library(stringr)
> str_count(text1, "Example")
> # [1] 5 5 5 5
>
> I guess that would be the neater solution.
>
> B.
>
>
>
>> On Apr 25, 2017, at 8:23 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>>
>> How about:
>>
>> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
>>
>>
>> Splitting each string on its five occurrences of "Example" gives six elements, so length(x) - 1 is the number of
>> matches. You can use any regex instead of "Example" if you need to tweak what you are looking for.
>>
>>
>> B.
>>
>>
>>
>>
>>> On Apr 25, 2017, at 8:14 PM, Dan Abner <dan.abner99 at gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I am looking for a streamlined way of counting the number of enumerated
>>> items in each element of a character vector. For example:
>>>
>>>
>>> text1<-c("This is an example.
>>> List 1
>>> 1) Example 1
>>> 2) Example 2
>>> 10) Example 10
>>> List 2
>>> 1) Example 1
>>> 2) Example 2
>>> These have been examples.","This is another example.
>>> List 1
>>> 1. Example 1
>>> 2. Example 2
>>> 10. Example 10
>>> List 2
>>> 1. Example 1
>>> 2. Example 2
>>> These have been examples.","This is a third example. List 1 1) Example 1.
>>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
>>> been examples."
>>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
>>> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>>>
>>> text1
>>>
>>> ===
>>>
>>> I would like the result to be c(5,5,5,5). Notice that sometimes there are
>>> leading hard returns, other times not. Sometimes there are separate lists
>>> and the same numbers are used in the enumerated items multiple times within
>>> each character string. Sometimes the leading numbers for the enumerated
>>> items exceed single digits. Notice that the delimiter may be ) or a period
>>> (.). If the delimiter is a period and there are hard returns (example 2),
>>> then I expect that will be easy enough to differentiate sentences ending
>>> with a number from enumerated items. However, I imagine it would be much
>>> more difficult to differentiate the two for example 4.
>>>
>>> Any suggestions are appreciated.
>>>
>>> Best,
>>>
>>> Dan
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.