[R] counting the occurrences of vectors

Marc Schwartz MSchwartz at MedAnalytics.com
Tue Jul 6 14:54:50 CEST 2004


On Mon, 2004-07-05 at 23:22, Gabor Grothendieck wrote:
> Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:
> 
> > the likely overhead involved in paste()ing together the rows
> > to create objects 
> 
> 
> I thought I would check this and it seems that in my original f1 function 
> it's not really the paste itself that's the bottleneck but applying the 
> paste.  If we use do.call rather than apply, as shown in f1a below, then 
> we see that f1a runs faster than row.match.count (which in turn was faster
> than f1):
> 
> f1a <- function(a,b,sep=":") {
> 	f <- function(...) paste(..., sep=sep)
> 	a2 <- do.call("f", as.data.frame(a))
> 	b2 <- do.call("f", as.data.frame(b))
> 	c(table(c(b2,unique(a2)))[a2] - 1)
> }
> 
> > set.seed(1)
> > # note that we have increased the size of the matrices from last post
> > # to better show the speed difference
> > a <- matrix(sample(3, 10000, replace = TRUE), ncol = 5)
> > b <- matrix(sample(3, 1000, replace = TRUE), ncol = 5)
> 
> > # row.match.count taken from Marc's post in this thread
> > # have put a c(...) around row.match.count to make it comparable to f1a
> > gc(); system.time(ans <- c(row.match.count(b,a)))
>          used (Mb) gc trigger (Mb)
> Ncells 436079 11.7     741108 19.8
> Vcells 130663  1.0     786432  6.0
> [1] 0.11 0.00 0.11   NA   NA
> 
> > gc(); system.time(ansf1a <- f1a(b,a))
>          used (Mb) gc trigger (Mb)
> Ncells 436080 11.7     741108 19.8
> Vcells 130669  1.0     786432  6.0
> [1] 0.04 0.00 0.04   NA   NA
> 
> > all.equal(ansf1a,ans)
> [1] TRUE


Gabor,

Well done!  I liked your approach in the prior message of getting away
from using regex. I had one of those "I could'a had a V-8" moments, when
I realized that of course the resultant table names were syntactically
correct R statements and therefore one could get away from worrying
about the data type issues and use eval(parse(...)).

The above approach is better yet: more flexible, more elegant, and
notably faster.
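
For anyone following along, the difference between the two ways of
building the row keys can be sketched on a toy example. This is only an
illustration under my own assumptions: the helper names keys_apply and
keys_docall and the small matrices are made up here and are not from
the thread. The idea is that apply() calls paste() once per row, while
do.call() hands all the columns to a single vectorised paste() call.

```r
# Toy data (hypothetical): for each row of `a`, count matching rows in `b`.
a <- matrix(c(1, 2,
              3, 1,
              1, 2), ncol = 2, byrow = TRUE)
b <- matrix(c(1, 2,
              1, 2,
              2, 2), ncol = 2, byrow = TRUE)

# Row keys via apply(): one paste() call per row.
keys_apply <- function(m, sep = ":") {
  apply(m, 1, paste, collapse = sep)
}

# Row keys via do.call(): a single vectorised paste() over the columns.
keys_docall <- function(m, sep = ":") {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
  do.call(paste, c(cols, sep = sep))
}

# Both helpers should produce the same keys.
stopifnot(identical(keys_apply(a), keys_docall(a)))

a2 <- keys_docall(a)
b2 <- keys_docall(b)

# Appending unique(a2) guarantees every key of `a` appears in the table,
# so indexing by a2 never fails; subtracting 1 removes that extra count.
counts <- c(table(c(b2, unique(a2)))[a2] - 1)
print(counts)  # counts for the three rows of `a`: 2, 0, 2
```

The counting step is the same table() trick used in f1a; only the way
the keys are built differs between the two helpers.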

Advantage Gabor...  ;-)

Best regards,

Marc



