[Rd] Performance issue in stats:::weighted.mean.default method

Tadeáš Palusga tadeas at palusga.cz
Thu Mar 5 20:54:49 CET 2015


Oops, such an amateur mistake. Thanks a lot for your quick response.

Regards

TP

On 03/05/2015 06:49 PM, Prof Brian Ripley wrote:
> On 05/03/2015 14:55, Tadeáš Palusga wrote:
>> Hi,
>>    I'm using this mailing list for the first time and I hope this is the
>> right one. I don't think that the following is a bug but it can be a
>> performance issue.
>>
>>     By my opinion, there is no need to filter by [w != 0] in last sum of
>> weighted.mean.default method defined in
>> src/library/stats/R/weighted.mean.R. There is no need to do it because
>> you can always sum zero numbers and filtering is too expensive (see
>> following benchmark snippet)
>
> But 0*x is not necessarily 0, so there is a need to do it ... see
>
> > w <- c(0, 1)
> > x <- c(Inf, 1)
> > weighted.mean(x, w)
> [1] 1
> > fun.new(x, w)
> [1] NaN
>
>>
>>
>>
>> library(microbenchmark)
>> x <- sample(500,5000,replace=TRUE)
>> w <- sample(1000,5000,replace=TRUE)/1000 *
>> ifelse((sample(10,5000,replace=TRUE) -1) > 0, 1, 0)
>> fun.new <- function(x,w) {sum(x*w)/sum(w)}
>> fun.orig  <- function(x,w) {sum(x*w[w!=0])/sum(w)}
>> print(microbenchmark(
>>    ORIGFN = fun.orig(x,w),
>>    NEWFN  = fun.new(x,w),
>>    times = 1000))
>>
>> #results:
>> #Unit: microseconds
>> #   expr     min       lq      mean  median      uq      max neval
>> # ORIGFN 190.889 194.6590 210.08952 198.847 202.928 1779.789 1000
>> #  NEWFN  20.857  21.7175  24.61149  22.080  22.594 1744.014 1000
>>
>>
>>
>>
>> So my suggestion is to remove the w != check
>>
>>
>>
>>
>> Index: weighted.mean.R
>> ===================================================================
>> --- weighted.mean.R     (revision 67941)
>> +++ weighted.mean.R     (working copy)
>> @@ -29,7 +29,7 @@
>>           stop("'x' and 'w' must have the same length")
>>       w <- as.double(w) # avoid overflow in sum for integer weights.
>>       if (na.rm) { i <- !is.na(x); w <- w[i]; x <- x[i] }
>> -    sum((x*w)[w != 0])/sum(w) # --> NaN in empty case
>> +    sum(x*w)/sum(w) # --> NaN in empty case
>>   }
>>
>>   ## see note for ?mean.Date
>>
>>
>> I hope i'm not missing something - I really don't see the reason to have
>> this filtration here.
>>
>> BR
>>
>> Tadeas 'donarus' Palusga
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>



More information about the R-devel mailing list