tabulate

Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
25 Jan 2000 00:24:22 +0100


Bill Venables <William.Venables@cmis.CSIRO.AU> writes:

> OK Peter.  This is the first one I cooked up:
...
> > m <- rpois(100000, 1)
> > tabulate(m)
> [1] 36891 18399  6064  1519   309    50     4     1
> > table(m)
> m
>     0     1     2     3     4     5     6     7     8 
> 36763 36891 18399  6064  1519   309    50     4     1 
> > system.time(tabulate(m))
> [1] 0.11 0.00 0.00 0.00 0.00
> > system.time(table(m))
> [1] 2.90 0.16 4.00 0.00 0.00
> > version

OK first, notice that I get:

> system.time(table(m))
[1] 3.38 0.00 3.38 0.00 0.00
> system.time(f<-factor(m))
[1] 2.12 0.00 2.12 0.00 0.00
> system.time(table(f))
[1] 1.19 0.00 1.20 0.00 0.00

so most of the time really goes into factor(). If one is careful about
the innards of table() one can shave the time for that to 

> system.time(tab2(f))
[1] 0.66 0.01 0.67 0.00 0.00
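
The definition of tab2() is not shown in this message. As a rough sketch only, and assuming the goal is a one-way count for an existing factor, something like the following avoids most of the overhead by handing the integer codes straight to tabulate(). The name tab_sketch is made up for illustration; it is not the tab2() timed above.

```r
# Hypothetical helper, not the tab2() from the message:
# one-way counts for a factor, computed from its integer codes.
tab_sketch <- function(f) {
  stopifnot(is.factor(f))
  # tabulate() counts the codes 1..nlevels(f); NA codes are ignored
  counts <- tabulate(as.integer(f), nbins = nlevels(f))
  names(counts) <- levels(f)
  counts
}

set.seed(1)
m <- rpois(100000, 1)
f <- factor(m)
res <- tab_sketch(f)

# Compare with table(f); unlike tabulate(m), the level "0" is counted too
print(all(res == as.vector(table(f))))
```

Because the zeros become level 1 of the factor, they are no longer silently dropped the way they are by tabulate(m) on the raw data.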

Rather interestingly, the non-constant-time part of table() would seem
equivalent to 

> system.time(as.integer(0)+as.integer(1)*(as.integer(f)-as.integer(1)))
[1] 0.25 0.00 0.25 0.00 0.00
> system.time(as.integer(0)+as.integer(1)*(as.integer(f)-as.integer(1)))
[1] 0.07 0.00 0.07 0.00 0.00

Notice the huge difference between the two executions, indicating that the
number of garbage collections involved probably plays a major role.
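
One way to probe the garbage-collection explanation, sketched here on the assumption that calling gc() right before each measurement is acceptable, is to force a collection first so that every timing starts from a similar heap state. The helper name time_clean is hypothetical, and 0L and 1L are just shorthand for as.integer(0) and as.integer(1).

```r
# Hypothetical helper: run a garbage collection, then time the expression.
time_clean <- function(expr) {
  invisible(gc())      # collect first, so earlier garbage is not charged to expr
  system.time(expr)    # expr is lazy, so it is evaluated inside system.time()
}

set.seed(1)
f <- factor(rpois(100000, 1))

# Same computation as in the transcript above, timed twice
t1 <- time_clean(0L + 1L * (as.integer(f) - 1L))
t2 <- time_clean(0L + 1L * (as.integer(f) - 1L))
print(rbind(t1, t2))
```

If the two rows come out close to each other, that would be consistent with the spread seen above being mostly collection overhead rather than the arithmetic itself.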

On the whole it doesn't really seem to be worth it to optimize this
very heavily, but if you have any obvious improvements for factor()...
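
Purely as an illustration of the kind of shortcut one might try for integer data, and not a claim about how factor() is implemented, the codes can be obtained by matching against the sorted distinct values, so that only the handful of levels is converted to character. The name int_factor_sketch is hypothetical.

```r
# Hypothetical sketch: build a factor from an integer vector without
# converting the whole vector to character.
int_factor_sketch <- function(x) {
  stopifnot(is.integer(x))
  lev <- sort(unique(x))     # distinct values, increasing; sort() drops NA
  codes <- match(x, lev)     # integer codes 1..length(lev); NA stays NA
  structure(codes, levels = as.character(lev), class = "factor")
}

set.seed(1)
m <- rpois(100000, 1)
g <- int_factor_sketch(m)

# Compare with the general-purpose constructor
print(identical(levels(g), levels(factor(m))))
print(all(as.integer(g) == as.integer(factor(m))))
```

This only covers plain integer input with default levels; the labels, exclude and ordered handling of factor() is left out on purpose.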

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._