[R] Fast way to find index in a vector

jim holtman jholtman at gmail.com
Tue Jan 13 04:27:47 CET 2009


Is this fast enough for you? Matching 2000 tags against 2M tags takes 0.2 seconds:

> str(x)
 chr [1:2000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
> str(z)
 chr [1:2000000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
> system.time(y <- match(x,z))
   user  system elapsed
    0.2     0.0     0.2
> str(y)
 int [1:2000] 1 2 3 4 5 6 7 8 9 10 ...
>
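
The transcript above does not show how x and z were built, so here is a minimal sketch of one way to set up a comparable timing test. The helper make_tags(), the seed, and the choice of taking the queries from the start of z are all assumptions for illustration, not the original setup, and timings will differ by machine.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

# Hypothetical helper: n random 5-letter tags over the alphabet A-E.
make_tags <- function(n) {
  m <- matrix(sample(c("A", "B", "C", "D", "E"), 5 * n, replace = TRUE),
              ncol = 5)
  paste0(m[, 1], m[, 2], m[, 3], m[, 4], m[, 5])
}

z <- make_tags(2e6)   # repository of 2M tags (duplicates are expected)
x <- z[1:2000]        # queries drawn from the repository, so all of them match

timing <- system.time(y <- match(x, z))
print(timing)

# match() returns the position of the first occurrence of each query in z,
# so z[y] reproduces x even though y need not equal 1:2000.
stopifnot(!anyNA(y), identical(z[y], x))
```

Because only 5^5 = 3125 distinct tags exist under these assumptions, most queries hit a duplicate early in z; a repository of unique tags may time differently.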



On Mon, Jan 12, 2009 at 10:17 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
> Yes Jim, exactly.
>
> BTW, I found this in ?match:
>
> " Matching for lists is potentially very slow and best avoided
>     except in simple cases."
>
> Since I am doing this for millions of tags, is there a faster alternative?
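
The ?match sentence quoted above is about list arguments. A character vector such as repo is an atomic vector, not a list, so that warning should not apply to it. A small sketch to check this, where repo_list is a hypothetical variable standing in for tags that happen to be stored as a list:

```r
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr <- c("AAC", "ATT")

# A character vector is atomic, not a list.
stopifnot(is.character(repo), !is.list(repo))

# Hypothetical case: the tags arrived as a list. Flattening them to a
# character vector first keeps match() on the atomic-vector path.
repo_list <- as.list(repo)
idx <- match(qr, unlist(repo_list, use.names = FALSE))
print(idx)
```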
>
>
> - Gundala Viswanath
> Jakarta - Indonesia
>
>
>
> On Tue, Jan 13, 2009 at 12:14 PM, jim holtman <jholtman at gmail.com> wrote:
>> Is this what you want:
>>
>>> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA","ATT")
>>> qr <- c("AAC", "ATT", "ATT","AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>>> match(qr, repo)
>> [1] 3 6 6 3 6 6 2 6 6
>>>
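
For comparison, a short sketch of how match(qr, repo) differs from the which(repo %in% qr) suggestion quoted further down. The tag "GGG" is a made-up value that is not in repo, used only to show what happens to a query with no match.

```r
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")

# One index per query element, in query order, repeats included.
by_query <- match(qr, repo)

# Positions in repo that occur anywhere in qr: sorted, no repeats.
by_repo <- which(repo %in% qr)

# A query that is absent from repo yields NA rather than being dropped.
with_missing <- match(c("AAC", "GGG"), repo)

print(by_query)
print(by_repo)
print(with_missing)
```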
>>
>>
>>
>> On Mon, Jan 12, 2009 at 9:22 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
>>> Hi Jorge and all,
>>>
>>> How can I modify your code when the query can be bigger than the
>>> repository, meaning that it can contain repeats?
>>>
>>> e.g. qr <- c("AAC", "ATT", "ATT","AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>>>
>>>
>>> Sorry, I should have mentioned this earlier.
>>>
>>>
>>> - Gundala Viswanath
>>> Jakarta - Indonesia
>>>
>>>
>>>
>>> On Tue, Jan 13, 2009 at 11:11 AM, Jorge Ivan Velez
>>> <jorgeivanvelez at gmail.com> wrote:
>>>>
>>>> Perhaps
>>>> which(repo%in%qr)
>>>> ?
>>>> HTH,
>>>>
>>>> Jorge
>>>>
>>>>
>>>> On Mon, Jan 12, 2009 at 9:07 PM, Gundala Viswanath <gundalav at gmail.com>
>>>> wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> Suppose I have the following vector as repository:
>>>>>
>>>>> > repo <- c("AAA", "AAT", "AAC", "AAG", "ATA","ATT")
>>>>>
>>>>> Given another query vector
>>>>>
>>>>> > qr <- c("AAC", "ATT")
>>>>>
>>>>> is there a fast way to find the index of each query element in the repository?
>>>>>
>>>>> Giving:
>>>>>
>>>>> [1] 3 6
>>>>>
>>>>> Typically repo has around 12 million elements, and the
>>>>> query around 1 million elements.
>>>>>
>>>>>
>>>>> - Gundala Viswanath
>>>>> Jakarta - Indonesia
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>>
>>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



