[R] Maximum number of patterns and speed in grep

mdvaan mathijsdevaan at gmail.com
Fri Jul 6 18:00:47 CEST 2012


Thanks for the quick response. I should rephrase my question: everything
is working fine, and I am just trying to find a more efficient approach:

1. What is the maximum size of the pattern argument in grep? I can't find
it documented online.
2. I am trying to match 7,700 character strings against about 10,000
vectors, each containing about 5,000 strings, using grep. It is very slow.
Is there a way to do this faster?
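
For concreteness, a toy sketch of the setup in question 2. The names
`patterns` and `x` follow the code quoted below; the data are made up, and
`x` is assumed to be a list of character vectors. The second half is only
one possible alternative, and it assumes the patterns are literal strings
with no regex metacharacters. Whether it is actually faster on the real
data would need to be checked with system.time().

```r
# Toy data (made up): 3 patterns, 3 vectors of strings.
patterns <- c("abc", "xyz", "foo")
x <- list(c("xxabcxx", "nothing", "foobar"),
          c("none", "here"),
          c("xyz", "abcxyz", "foo", "bar"))

# Current approach: one large alternation, one grep() call per vector.
# Counts the strings in each vector that match at least one pattern.
big <- paste(patterns, collapse = "|")
n_regex <- vapply(x, function(v) length(grep(big, v)), integer(1))

# Possible alternative, ASSUMING literal patterns: flatten all vectors once,
# then loop over the patterns with fixed = TRUE (no regex compilation).
all_strings <- unlist(x)
vec_id <- rep(seq_along(x), lengths(x))   # which vector each string came from
hit <- logical(length(all_strings))
for (p in patterns) {
  hit <- hit | grepl(p, all_strings, fixed = TRUE)
}
# Count the matching strings per original vector.
n_fixed <- as.integer(tabulate(vec_id[hit], nbins = length(x)))

# Both approaches should agree on the toy data.
stopifnot(identical(n_regex, n_fixed))
```

If many strings are repeated across vectors, matching against
unique(all_strings) first and mapping the result back might reduce the work
further, but that is untested here.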

Thanks


Sarah Goslee wrote
> 
> Hi,
> 
> Given that you can't provide a full example, please at least provide
> str() on your data, more complete information on the problem, and
> ideally a small toy example that demonstrates precisely what you are
> doing.
> 
> For instance, you tell us that you "get an error message" but you
> never tell us what it is. Don't you think we might need to know what
> the error is to be able to diagnose and fix it?
> 
> Also, note that your "working" example simply overwrites
> array$chunk1[j] four times.
> 
> Sarah
> 
> On Fri, Jul 6, 2012 at 10:45 AM, mdvaan <mathijsdevaan@> wrote:
>> Hi,
>>
>> I am using R's grep function to find patterns in vectors of strings. The
>> number of patterns I would like to match is 7,700 (of different sizes). I
>> noticed that I get an error message when I do the following:
>>
>> data <- array()
>> for (j in 1:length(x))
>> {
>> array[j] <- length(grep(paste(patterns[1:7700], collapse = "|"),  x[j],
>> value = T))
>> }
>>
>> When I break this up into 4 chunks of patterns it works:
>>
>> data <- array()
>> for (j in 1:length(x))
>> {
>> array$chunk1[j] <- length(grep(paste(patterns[1:2500], collapse = "|"),
>>                               x[j], value = T))
>> array$chunk1[j] <- length(grep(paste(patterns[2501:5000], collapse = "|"),
>>                               x[j], value = T))
>> array$chunk1[j] <- length(grep(paste(patterns[5001:7500], collapse = "|"),
>>                               x[j], value = T))
>> array$chunk1[j] <- length(grep(paste(patterns[7501:7700], collapse = "|"),
>>                               x[j], value = T))
>> }
>>
>> My questions: what's the maximum size of the patterns argument in grep?
>> Is there a way to do this faster? It is very slow.
>>
>> Thanks.
>>
>> Math
>>
>> Sorry for not providing a reproducible example. It's a size issue which
>> makes it difficult to provide an example.
>>
> 
> 
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
> 

