[R] Faster text search in document database than with grep?

Witold E Wolski wewolski at gmail.com
Mon Aug 3 11:25:23 CEST 2015


I have a database of text documents (letter sequences): several thousand
documents with approx. 1000-2000 letters each.

I need to find exact matches of short sequences of 3-15 letters in those
documents.

Even without any regexp patterns, the search for one 3-15 letter "word"
takes on the order of 1 s.
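
A minimal sketch of the kind of fixed-string search described above, using
base R's grep with fixed = TRUE so that each word is matched literally rather
than as a regular expression. The vectors docs and words, and their contents,
are made-up placeholders for illustration and not the actual data:

```r
# Toy stand-ins for the real data (hypothetical names and values).
docs <- c("ACDEFGHIKLMNPQRSTVWY",
          "MKTAYIAKQRQISFVKSHFS",
          "GHIKLACDEFAKQRQ")
words <- c("ACDEF", "AKQRQ", "WWW")

# For each word, return the indices of the documents that contain it.
# fixed = TRUE treats the word as a literal string, not a regexp.
hits <- lapply(words, function(w) grep(w, docs, fixed = TRUE))
names(hits) <- words
hits
```

Each element of hits holds the document indices for one word, so the cost
grows with the number of words times the number of documents.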

So for a database with several thousand documents it's on the order of
hours.
The naive approach would be to use mcmapply, but then on standard
hardware I am still in the same order of magnitude, and since R is an
interactive programming environment this isn't a solution I would go for.

But aren't there faster algorithmic solutions? Can anyone please point me
to an implementation available in R?

Thank you
Witold




-- 
Witold Eryk Wolski



