[R] any other fast method for median calculation

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Tue Apr 14 11:34:04 CEST 2009


S Ellison wrote:
> Sorting with an appropriate algorithm is O(n log n), so it's very hard to
> get the 'exact' median any faster. However, if you can cope with a less
> precise median, you could use a binary search between max(x) and min(x)
> with a loose tolerance or comparatively few iterations. In native R, though,
> that isn't going to be fast; interpreter overhead will likely more than
> wipe out any reduction in number of comparisons.
> 
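
For what it's worth, the binary search idea could be sketched roughly as
follows. This is only an illustration: approx_median(), tol and max_iter are
made-up names, not anything in base R. Each iteration makes one vectorised
pass over x, so I would not expect it to beat median() in practice. With an
even number of values it converges to the lower of the two middle values
rather than their mean.

```r
# Approximate median by bisection on the value range.
# approx_median(), tol and max_iter are illustrative names, not base R.
approx_median <- function(x, tol = 1e-3, max_iter = 50L) {
  lo <- min(x)
  hi <- max(x)
  half <- length(x) / 2
  for (i in seq_len(max_iter)) {
    if (hi - lo <= tol) break
    mid <- (lo + hi) / 2
    # One vectorised pass: count the values at or below the midpoint.
    if (sum(x <= mid) >= half) {
      hi <- mid   # at least half of the data lie at or below mid
    } else {
      lo <- mid   # fewer than half of the data lie at or below mid
    }
  }
  (lo + hi) / 2
}

set.seed(1)
x <- rnorm(1001)
c(approx = approx_median(x), exact = median(x))
```

The bracket [lo, hi] halves at each step, so the number of passes grows with
log2(range / tol) rather than with the number of values.
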
> In any case, it looks like you are not constrained by the median
> algorithm, but by the number of calls. You might do a lot better with
> apply, though 
>> apply(df,2,median)

Well, for data frames I think sapply(...) or even unlist(lapply(...))
will be faster, since apply() first converts the data frame to a matrix,
e.g.,

# 50 rows x 200,000 columns of standard normal draws
mat <- matrix(rnorm(50*2e05), 50, 2e05)
DF <- as.data.frame(mat)

# call gc() before each timing so garbage collection does not skew it
invisible({gc(); gc()})
system.time(apply(DF, 2, median))

invisible({gc(); gc()})
system.time(sapply(DF, median))

invisible({gc(); gc()})
system.time(unlist(lapply(DF, median), use.names = FALSE))
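
As a quick sanity check that these variants return the same medians, something
like the following should do. It is only a sketch on a much smaller data frame
('small' is a made-up name) so that it runs instantly; vapply() is added here
as a further option that fixes the return type, and was not part of the
timings above.

```r
# Small stand-in for the 50 x 2e05 data frame, so this runs instantly.
set.seed(42)
small <- as.data.frame(matrix(rnorm(50 * 100), 50, 100))

m_apply  <- apply(small, 2, median)       # converts to a matrix first
m_sapply <- sapply(small, median)         # loops over the columns
m_lapply <- unlist(lapply(small, median), use.names = FALSE)
m_vapply <- vapply(small, median, numeric(1))  # result type is checked

# All four should agree once the column names are dropped.
stopifnot(
  isTRUE(all.equal(unname(m_apply),  m_lapply)),
  isTRUE(all.equal(unname(m_sapply), m_lapply)),
  isTRUE(all.equal(unname(m_vapply), m_lapply))
)
```
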


Best,
Dimitris


> On my system 200k columns were processed in negligible time by apply
> and I'm still waiting for mapply.
> 
> S
> 
> 
> 
>>>> "Zheng, Xin (NIH) [C]" <zhengxin at mail.nih.gov> 14/04/2009 05:29:40
>>>>
> Hi there,
> 
> I have a data frame with more than 200k columns. How can I compute the
> median of each column quickly? mapply is the fastest function I know for
> that, but it is still not fast enough.
> 
> It seems that the function "median" in R computes the median using "sort"
> and "mean". I am wondering if there is another function with a better
> algorithm.
> 
> Any hints?
> 
> Thanks,
> 
> Xin Zheng
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help 
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html 
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



