[R] Compact Patricia Trees (Tries)

Richard Liu richard.liu at pueo-owl.ch
Thu Apr 29 16:22:12 CEST 2010


Gabor,

Thanks for the suggestion, I'll try it out tonight or tomorrow.

Regards,
Richard

_____________________
Richard R. Liu
Dittingerstr. 33
CH-4053 Basel
Switzerland
Tel. +41 79 708 67 66

Sent from my iPhone 3GS

On Apr 29, 2010, at 13:06, Gabor Grothendieck  
<ggrothendieck at gmail.com> wrote:

> Using charmatch partial matches of 10,000 5 leters words to the same
> list can be done in 10 seconds on my machine and 10,000 5 letter words
> to 100,000 10 letter words in 1 minute.  Is that good enough?  Try
> this simulation:
>
> # generate N random words each k long
> rwords <- function(N, k) {
>   L <- sample(letters, N*k, replace = TRUE)
>   apply(matrix(L, k), 2, paste, collapse = "")
> }
> w1 <- rwords(1e5, 10)
> w2 <- rwords(1e4, 5)
>
> system.time(charmatch(w2, w2))
>
> system.time(charmatch(w2, w1))
>
>
> On Thu, Apr 29, 2010 at 4:05 AM, Richard R. Liu <richard.liu at pueo-owl.ch 
> > wrote:
>> I have an application that a long list of character strings to  
>> determine which
>> occur at the beginning of a given word.  A straight forward R  
>> script using grep
>> takes a long time to run.  Rewriting it to use substr and match  
>> might be an
>> option, but I have the impression that preparing the list as a trie  
>> and
>> performing trie searches might lead to dramatic improvements in  
>> performance.
>>
>>
>> I have searched the CRAN packages and find no packages that support  
>> Compact
>> Patricia Trees.  Does anybody know of such?
>>
>>
>> Thanks,
>> Richard
>>
>> Richard R. Liu
>> richard.liu at pueo-owl.ch
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



More information about the R-help mailing list