[R] Matching a pattern of vector of character strings in another vector of character strings

Gabor Grothendieck ggrothendieck at gmail.com
Fri Dec 17 16:24:04 CET 2010


On Fri, Dec 17, 2010 at 9:10 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
>
> On Dec 17, 2010, at 7:58 AM, Liviu Andronic wrote:
>
>> On Fri, Dec 17, 2010 at 2:34 PM, Jing Liu <quiet_jing0920 at hotmail.com> wrote:
>>>> M<- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"),nrow=3)
>>>> colnames(M)<- c("2006","2007","2008","2009","2010")
>>>> M
>>>     2006 2007 2008 2009 2010
>>> [1,] "0"  "1"  "1"  "*"  "0"
>>> [2,] "0"  "0"  "0"  "1"  "1"
>>> [3,] "1"  "1"  "0"  "1"  "*"
>>>
>>>> pattern<- c("0","1")
>>>
>>> For each row, I would like to find whether it contains exactly the pattern of two character strings, a "0" immediately followed by a "1", i.e., exactly "0" "1". If it does, in which year does the pattern start?
>>> For example, it should return 2006 for row 1, 2008 for row 2, and 2008 for row 3.
>>>
>> I could only think of this:
>>> apply(M, 1, function(z) grep('01', paste(z, collapse='')))
>> [1] 1 1 1
>>> apply(M, 1, function(z) grepl('01', paste(z, collapse='')))
>> [1] TRUE TRUE TRUE
>>
>> But it only tells you whether each row matches, not the position of
>> the matched string, so this isn't quite what you wanted.
>>
>> Regards
>> Liviu
>>
>>
>>> As far as I know, the functions in the grep family cannot search for a pattern made up of two or more separate character strings. I could do it with a loop, but I am looking for something more efficient. How should I do it? Your help is really appreciated!
>>>
>>> Best regards,
>>> Jing Liu
>
>
> Try this:
>
>> colnames(M)[regexpr("01", apply(M, 1, paste, collapse = ""))]
> [1] "2006" "2008" "2008"
>
>
> See ?regexpr for more info.
>
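To see what Marc's one-liner indexes with, the sketch below rebuilds M from the question and inspects the intermediate regexpr() result. The values are character offsets within each pasted row; they coincide with column numbers only because every cell holds exactly one character. The second half is an alternative added here for comparison (it is not from the thread): it flags a start in column j whenever column j is "0" and column j+1 is "1", so it does not depend on cell width. The names `rows`, `starts` and `first` are illustrative only.

```r
# Rebuild the example matrix from the original question
M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow = 3)
colnames(M) <- c("2006", "2007", "2008", "2009", "2010")

# Paste each row into one string: "011*0", "00011", "1101*"
rows <- apply(M, 1, paste, collapse = "")

# regexpr() gives the character offset of the first "01" in each string
# (or -1 when there is no match); as.integer() drops its attributes
ix <- regexpr("01", rows)
print(as.integer(ix))

# Width-independent cross-check: a match starts in column j when
# column j is "0" and column j + 1 is "1"
starts <- M[, -ncol(M), drop = FALSE] == "0" & M[, -1, drop = FALSE] == "1"
first <- apply(starts, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
print(colnames(M)[first])
```

Both approaches give offsets 1, 3, 3 for this M, i.e. the years "2006", "2008", "2008". The comparison version would also work if cells held multi-character strings such as "10" or "NA".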

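Here is a slight variation, which would only be needed if it's possible
that a row contains no 01 at all (regexpr() returns -1 for such a row,
and a negative index would drop a column or raise an error instead of
giving NA):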

ix <- regexpr("01", apply(M, 1, paste, collapse = ""))
colnames(M)[ ifelse(ix > 0, ix, NA_integer_) ]
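A quick check of the variation, on a hypothetical copy of the data (called M2 here, values invented for illustration) whose second row contains no "0" followed by "1":

```r
# Hypothetical matrix: row 2 ("11100") has no "01"
M2 <- matrix(c("0","1","1","*","0",
               "1","1","1","0","0",
               "1","1","0","1","*"), nrow = 3, byrow = TRUE)
colnames(M2) <- c("2006", "2007", "2008", "2009", "2010")

ix2 <- regexpr("01", apply(M2, 1, paste, collapse = ""))  # offsets 1, -1, 3

# Mapping -1 to NA gives one result per row, with NA for the non-matching row
years2 <- colnames(M2)[ifelse(ix2 > 0, ix2, NA_integer_)]
print(years2)

# Indexing with ix2 directly mixes positive and negative subscripts,
# which R rejects with an error
direct <- tryCatch(colnames(M2)[ix2], error = function(e) e)
print(inherits(direct, "error"))
```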

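Note that we must use NA_integer_ rather than NA if we want this to work
not only when some rows have no 01 but also when no row has a 01 at all.
In the latter case ifelse() with NA returns a logical vector, and a
logical index is recycled over all of colnames(M) rather than selecting
one element per row.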
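The all-rows case can be checked on another hypothetical matrix, M3 (invented for illustration), in which no row contains "01". With plain NA the index is logical and is recycled over the five column names; with NA_integer_ it stays integer and yields one NA per row.

```r
# Hypothetical matrix: no row contains a "0" followed by a "1"
M3 <- matrix(c("1","1","1","0","0",
               "*","1","0","0","0",
               "1","*","1","0","*"), nrow = 3, byrow = TRUE)
colnames(M3) <- c("2006", "2007", "2008", "2009", "2010")

ix3 <- regexpr("01", apply(M3, 1, paste, collapse = ""))  # -1 for every row

with_na  <- colnames(M3)[ifelse(ix3 > 0, ix3, NA)]          # logical index
with_int <- colnames(M3)[ifelse(ix3 > 0, ix3, NA_integer_)] # integer index

print(length(with_na))   # one NA per column name, not per row
print(length(with_int))  # one NA per row, as intended
```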

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list