[R] bad performance of a function

Peter Dalgaard p.dalgaard at biostat.ku.dk
Fri Nov 14 14:22:54 CET 2003


Roger Bivand <Roger.Bivand at nhh.no> writes:

> > rlex$lengths[rlex$values]
>  [1] 1 3 2 5 1 4 1 1 1 3 1 1 2
> > cetnost
>  [1] 1 3 2 5 1 4 1 1 1 3 1 1 2
> 
> rle() is interpreted too, like your solution, so I'm not sure how it will 
> scale.

Not spectacularly better, but I don't think Peter is doing what he
thinks he's doing...

> > 
> > Example 2
> > x<-sample(c(T,F),40321*51, replace=T)
> > dd<-matrix(x,40321,51)
> > system.time(cetnost <- lapply(dd,function(x) as.numeric(table(which(x)-
> > cumsum(x[which(x)])))))
> > Timing stopped at: 750.63 1 775.6 NA NA 

dd is a matrix, not a list or data frame, so lapply() calls the function
once for each of the roughly 2 million cells. Was this intended instead:

> system.time(cetnost <- apply(dd,2,function(x) as.numeric(table(which(x)-
+ cumsum(x[which(x)])))))
[1]  8.45  0.10 13.84  0.00  0.00
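
To make the difference concrete, here is a minimal sketch on a tiny matrix. The 6 x 3 size and the names per_cell and per_col are made up for illustration; only dd comes from the example above. It just counts how many times the function is called and how much data each call sees.

```r
# Small stand-in for the 40321 x 51 matrix (the size here is arbitrary).
set.seed(1)
dd <- matrix(sample(c(TRUE, FALSE), 6 * 3, replace = TRUE), nrow = 6, ncol = 3)

# lapply() coerces the matrix with as.list(), giving one list element per cell,
# so the function is called once per cell with a single logical value.
per_cell <- lapply(dd, function(x) length(x))
length(per_cell)            # number of calls: one per cell
all(unlist(per_cell) == 1)  # each call saw exactly one value

# apply(dd, 2, f) calls f once per column, passing the whole column as x.
per_col <- apply(dd, 2, function(x) length(x))
length(per_col)             # number of calls: one per column
per_col                     # each call saw a full column of 6 values
```

On the real data this is the difference between about 2 million calls and 51 calls.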

rle() helps a bit (roughly a factor of three in user time here) but not
by orders of magnitude:

> system.time(cetnost <- apply(dd,2,function(x) ((z <- rle(x))$lengths)[z$values]))
[1] 2.88 0.03 5.32 0.00 0.00
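
For anyone checking that the two versions really compute the same thing, here is a small sketch on a hand-made vector. The vector x and the names idx, via_table and via_rle are illustrative only and not from the original post.

```r
x <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# table() version: within a run of TRUEs the index minus the running count
# of TRUEs stays constant, so tabulating that difference gives run lengths.
idx <- which(x)
via_table <- as.numeric(table(idx - cumsum(x[idx])))

# rle() version: keep only the run lengths whose run value is TRUE.
z <- rle(x)
via_rle <- z$lengths[z$values]

via_table  # 2 1 3
via_rle    # 2 1 3
```

Note that when the columns contain different numbers of runs, apply() returns a list with one element per column rather than a matrix.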

(This problem has a memory footprint of more than 200MB, so total
timings vary wildly depending on whether swapping occurs.)

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



