[R] Fast way to find index in vector

Gundala Viswanath gundalav at gmail.com
Tue Jan 13 05:33:01 CET 2009


Thanks for the info, Jim.

- GV



On Tue, Jan 13, 2009 at 12:27 PM, jim holtman <jholtman at gmail.com> wrote:
> Is this fast enough for you? Matching 2000 tags against 2M tags takes 0.2 seconds:
>
>> str(x)
>  chr [1:2000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
>> str(z)
>  chr [1:2000000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
>> system.time(y <- match(x,z))
>   user  system elapsed
>    0.2     0.0     0.2
>> str(y)
>  int [1:2000] 1 2 3 4 5 6 7 8 9 10 ...
>>
>
>
>
> On Mon, Jan 12, 2009 at 10:17 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
>> Yes Jim, exactly.
>>
>> BTW, I found from ?match
>>
>> " Matching for lists is potentially very slow and best avoided
>>     except in simple cases."
>>
>> I am doing this for millions of tags. Is there a faster alternative?
>>
>>
>> - Gundala Viswanath
>> Jakarta - Indonesia
>>
>>
>>
>> On Tue, Jan 13, 2009 at 12:14 PM, jim holtman <jholtman at gmail.com> wrote:
>>> Is this what you want:
>>>
>>>> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA","ATT")
>>>> qr <- c("AAC", "ATT", "ATT","AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>>>> match(qr, repo)
>>> [1] 3 6 6 3 6 6 2 6 6
>>>>
>>>
>>>
>>>
>>> On Mon, Jan 12, 2009 at 9:22 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
>>>> Hi Jorge and all,
>>>>
>>>> How can I modify your code when the query can be larger than the
>>>> repository, meaning that it can contain repeats?
>>>>
>>>> e.g. qr <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>>>>
>>>>
>>>> Sorry, I should have mentioned this earlier.
>>>>
>>>>
>>>> - Gundala Viswanath
>>>> Jakarta - Indonesia
>>>>
>>>>
>>>>
>>>> On Tue, Jan 13, 2009 at 11:11 AM, Jorge Ivan Velez
>>>> <jorgeivanvelez at gmail.com> wrote:
>>>>>
>>>>> Perhaps
>>>>> which(repo%in%qr)
>>>>> ?
>>>>> HTH,
>>>>>
>>>>> Jorge
>>>>>
>>>>>
>>>>> On Mon, Jan 12, 2009 at 9:07 PM, Gundala Viswanath <gundalav at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Suppose I have the following vector as repository:
>>>>>>
>>>>>> > repo <- c("AAA", "AAT", "AAC", "AAG", "ATA","ATT")
>>>>>>
>>>>>> Given another query vector
>>>>>>
>>>>>> > qr <- c("AAC", "ATT")
>>>>>>
>>>>>> is there a fast way to find the index of each query element in the repository?
>>>>>>
>>>>>> Giving:
>>>>>>
>>>>>> [1] 3 6
>>>>>>
>>>>>> Typically the repo has around 12 million elements, and the
>>>>>> query around 1 million elements.
>>>>>>
>>>>>>
>>>>>> - Gundala Viswanath
>>>>>> Jakarta - Indonesia
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?
>>>
>>
>
>
>
>
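
For reference, the two suggestions in this thread behave differently once
the query contains repeats. A minimal sketch using the thread's own repo
and qr vectors (the names idx_match, idx_in and idx_missing are only for
illustration and do not appear in the original messages):

```r
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr   <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")

# match() returns one repository index per query element, in query order,
# so repeated query tags give repeated indices.
idx_match <- match(qr, repo)
print(idx_match)    # 3 6 6 3 6 6 2 6 6

# which(repo %in% qr) reports each matching repository position once,
# in repository order, regardless of how often the tag occurs in the query.
idx_in <- which(repo %in% qr)
print(idx_in)       # 2 3 6

# A query tag that is absent from the repository yields NA from match().
idx_missing <- match(c("AAC", "GGG"), repo)
print(idx_missing)  # 3 NA
```

So match(qr, repo) is the form to use when one index per query element is
needed, and which(repo %in% qr) when only the set of hit positions matters.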

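The timing reported in the thread can be approximated with synthetic data.
This is a rough sketch, not Jim's original setup: the helper make_tags, the
seed, and the reduced repository size of 200,000 tags are all assumptions,
and the elapsed time will vary by machine. Note that the ?match warning
quoted in the thread concerns lists, whereas the tags here are a plain
character vector.

```r
set.seed(1)  # arbitrary seed, assumed only for reproducibility

# Hypothetical helper: build n random tags of `width` letters each.
make_tags <- function(n, width = 5, alphabet = c("A", "B", "C", "D", "E")) {
  # One character vector of length n per letter position.
  cols <- replicate(width, sample(alphabet, n, replace = TRUE),
                    simplify = FALSE)
  # Paste the positions together element-wise into n tags.
  do.call(paste0, cols)
}

z <- make_tags(2e5)   # repository (assumed size, smaller than in the thread)
x <- z[1:2000]        # query drawn from the repository

# The tags are an atomic character vector, not a list.
stopifnot(is.character(z), !is.list(z))

timing <- system.time(y <- match(x, z))
print(timing)

# match() returns the FIRST position of each tag, so with duplicate tags
# in z the indices need not be 1:2000, but each one points at an equal tag.
stopifnot(all(!is.na(y)), identical(z[y], x))
```

If timings on the real data turn out to be a problem, measuring with
system.time() on a representative subset first is a cheap way to check.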


