[R] tapply huge speed difference if X has names

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Aug 8 21:36:17 CEST 2005


Please use a current version of R!

This was fixed long ago, and you will find it in the NEWS file:

         split() now handles vectors with names internally and so is
         almost as fast as on vectors without names (and maybe 100x
         faster than before).
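
A quick way to confirm this on your own installation is to rerun the comparison from the quoted message. This is only a sketch: the names g, t_unnamed and t_named are illustrative, and actual timings depend on the machine and R version.

```r
# Sketch: compare tapply() on an unnamed and a named vector.
x <- 1:100000
names(x) <- x                   # same named vector as in the quoted example
g <- rep(1:10000, each = 10)    # 10000 groups of 10 elements each

# unname() drops the names, like as.vector() in the quoted example
t_unnamed <- system.time(fast <- tapply(unname(x), g, mean))
t_named   <- system.time(slow <- tapply(x, g, mean))

cat("elapsed, unnamed:", t_unnamed[["elapsed"]], "\n")
cat("elapsed, named:  ", t_named[["elapsed"]], "\n")

# The results should agree regardless of how long each call took.
stopifnot(identical(fast, slow))
```

On a version with the split() change described in the NEWS entry, the two elapsed times should be of similar magnitude.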


On Mon, 8 Aug 2005, Matthew Dowle wrote:

>
> Hi all,
>
> Apologies if this has been raised before ... R's tapply is very fast, but if
> X has names in this example, there seems to be a huge slowdown: under 1
> second compared to 151 seconds.  The following timings are repeatable and
> were taken on a single-user machine:
>
>> X = 1:100000
>> names(X) = X
>> system.time(fast<<-tapply(as.vector(X), rep(1:10000,each=10), mean))  # as.vector() to drop the names
> [1] 0.36 0.00 0.35 0.00 0.00
>> system.time(slow<<-tapply(X, rep(1:10000,each=10), mean))
> [1] 149.95   1.83 151.79   0.00   0.00
>> head(fast)
>   1    2    3    4    5    6
> 5.5 15.5 25.5 35.5 45.5 55.5
>> head(slow)
>   1    2    3    4    5    6
> 5.5 15.5 25.5 35.5 45.5 55.5
>> identical(fast,slow)
> [1] TRUE
>>
>
> Looking inside tapply, which then calls split, it seems there is an
> is.null(names(x)) test which prevents R's internal fast version from being
> called when x has names. Why is that there? Could it be removed?  I often do
> something like tapply(mat[,"colname"],...) where mat has rownames. The
> rownames of mat then become the names of the vector mat[,"colname"], and
> this seems to slow down tapply a lot. Perhaps other functions which call
> split also suffer from this problem?
>
>> split.default
> function (x, f)
> {
>    if (is.list(f))
>        f <- interaction(f)
>    f <- factor(f)
>    if (is.null(attr(x, "class")) && is.null(names(x)))
>        return(.Internal(split(x, f)))
>    lf <- levels(f)
>    y <- vector("list", length(lf))
>    names(y) <- lf
>    for (k in lf) y[[k]] <- x[f %in% k]
>    y
> }
> <environment: namespace:base>
>>
>
>> version
>         _
> platform x86_64-redhat-linux-gnu
> arch     x86_64
> os       linux-gnu
> system   x86_64, linux-gnu
> status
> major    2
> minor    0.1
> year     2004
> month    11
> day      15
> language R
>>
>
>
> Thanks and regards,
> Matthew
>
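
For anyone who cannot upgrade straight away, the workaround already used in the quoted example (dropping the names before calling tapply) also applies to the matrix-column case. The following is a minimal sketch with made-up data; mat, grp and the column name "colname" are hypothetical and stand in for your own objects.

```r
# Sketch: strip the names inherited from rownames before calling tapply().
mat <- matrix(as.numeric(1:10), ncol = 1,
              dimnames = list(paste("r", 1:10, sep = ""), "colname"))
grp <- rep(1:5, each = 2)       # five groups of two rows each

col <- mat[, "colname"]         # a vector whose names are the rownames of mat
stopifnot(!is.null(names(col)))

# as.vector() returns the values without names, so split() sees a plain vector
res <- tapply(as.vector(col), grp, mean)
print(res)

# Same answer as the call on the named vector
stopifnot(identical(res, tapply(col, grp, mean)))
```

The group means here are 1.5, 3.5, 5.5, 7.5 and 9.5. unname(col) could be used in place of as.vector(col) for a vector like this one.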

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595