[R] any other fast method for median calculation

Matthias Kohl Matthias.Kohl at stamats.de
Tue Apr 14 11:40:29 CEST 2009


There is a function rowMedians in the Bioconductor package Biobase that
works on numeric matrices and might help.
Matthias
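
As a rough sketch of how that suggestion might be applied to the data frame in this thread: rowMedians() works along rows, so the data frame is first converted to a matrix and transposed. The helper name col_medians is made up for illustration, the example assumes all columns are numeric, and it falls back to apply() when Biobase is not installed. Timings are not shown because they depend on the machine.

```r
# Small stand-in for the 200k-column data frame discussed in this thread.
set.seed(1)
mat <- matrix(rnorm(50 * 1000), nrow = 50, ncol = 1000)
DF  <- as.data.frame(mat)

# Hypothetical helper (not part of any package): column medians of an
# all-numeric data frame.
col_medians <- function(df) {
  m <- as.matrix(df)  # assumption: every column is numeric
  if (requireNamespace("Biobase", quietly = TRUE)) {
    # rowMedians() works along rows, so transpose to get one row per column
    unname(Biobase::rowMedians(t(m)))
  } else {
    # Fallback when Biobase is not installed
    unname(apply(m, 2, median))
  }
}

med <- col_medians(DF)
```

Wrapping the call in system.time(), as in the quoted examples, would allow a comparison with sapply(DF, median) on the same data.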

Dimitris Rizopoulos wrote:
> S Ellison wrote:
>> Sorting with an appropriate algorithm is O(n log n), so it's very hard to
>> get the 'exact' median any faster. However, if you can cope with a less
>> precise median, you could use a binary search between min(x) and max(x)
>> with a coarse tolerance or comparatively few iterations. In native R,
>> though, that isn't going to be fast; interpreter overhead will likely more
>> than wipe out any reduction in the number of comparisons.
>>
>> In any case, it looks like you are constrained not by the median
>> algorithm but by the number of calls. You might do a lot better with
>> apply, though:
>>   apply(df, 2, median)
>
> Well, for data frames I think sapply(...) or even unlist(lapply(...))
> will be faster, e.g.,
>
> mat <- matrix(rnorm(50*2e05), 50, 2e05)
> DF <- as.data.frame(mat)
>
> invisible({gc(); gc()})
> system.time(apply(DF, 2, median))
>
> invisible({gc(); gc()})
> system.time(sapply(DF, median))
>
> invisible({gc(); gc()})
> system.time(unlist(lapply(DF, median), use.names = FALSE))
>
>
> Best,
> Dimitris
>
>
>> On my system 200k columns were processed in negligible time by apply
>> and I'm still waiting for mapply.
>>
>> S
>>
>>
>>
>> >>> "Zheng, Xin (NIH) [C]" <zhengxin at mail.nih.gov> 14/04/2009 05:29:40 >>>
>> Hi there,
>>
>> I have a data frame with more than 200k columns. How can I get the median
>> of each column quickly? mapply is the fastest function I know for that,
>> but it is still not fast enough.
>> It seems that the function "median" in R calculates the median using
>> "sort" and "mean". I am wondering whether there is another function with
>> a better algorithm.
>>
>> Any hint?
>>
>> Thanks,
>>
>> Xin Zheng
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
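
For reference, the binary search idea S Ellison describes in the quoted message could look roughly like the sketch below. The function name approx_median and its arguments tol and max_iter are invented for illustration. It bisects the interval between min(x) and max(x) and keeps the half that still contains the lower median, so each iteration costs one vectorised pass over x. As noted in the quoted text, this is unlikely to beat median() in plain R; it only illustrates the trade-off between tolerance and number of passes.

```r
# Hypothetical helper: approximate median by bisection on the value range.
# Assumes x is numeric with no NAs.
approx_median <- function(x, tol = 1e-6, max_iter = 60L) {
  n  <- length(x)
  lo <- min(x)
  hi <- max(x)
  for (i in seq_len(max_iter)) {
    if (hi - lo <= tol) break
    mid <- (lo + hi) / 2
    if (sum(x <= mid) >= n / 2) {
      hi <- mid   # at least half the data lie at or below mid
    } else {
      lo <- mid   # fewer than half the data lie at or below mid
    }
  }
  (lo + hi) / 2   # midpoint of the final bracket
}

set.seed(42)
x <- rnorm(10001)   # odd length, so median(x) is a single data value
approx <- approx_median(x)
```

For even-length input the bracket closes in on the lower of the two middle values, so the result can differ from median(x) by up to the gap between them. A larger tol or smaller max_iter gives fewer passes at the cost of precision.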

-- 
Dr. Matthias Kohl
www.stamats.de



