[R] Largest N Values Efficiently?

David Katz david at davidkatzconsulting.com
Mon Nov 12 17:36:36 CET 2007


x is a 1XN sparse matrix of numerics. I am using the Matrix package to
represent as a sparse matrix; the representation has a numeric vector
representing the positions within the matrix. My goal is find the columns
with the n largest values, here positive  correlations. Part of my strategy
is to only sort the nonzeros which are available as a numeric vector. 

Thanks for your interest and input.



Prof Brian Ripley wrote:
> 
> What is 'x' here?  What type?  Does it contain NAs?  Are there ties?  R's 
> ordering functions are rather general, and you can gain efficiency by 
> ruling some of these out.
> 
> See ?sort, look at the 'partial' argument, including the comments in the 
> Details.  And also look at ?sort.list.
> 
> sort.int(x) is more efficient than x[order(x)], and x[order(x)[1:n]] is 
> more efficient than x[order(x)][1:n] for most types.
> 
> Finally, does efficiency matter?  As the examples in ?sort show, R can 
> sort a vector of length 2000 is well under 1ms, and 1e7 random normals in 
> less time than they take to generate.  There are not many tasks where 
> gaining efficiency over x[order(x)][1:n] will be important.  E.g.
> 
>> system.time(x <- rnorm(1e6))
>     user  system elapsed
>     0.44    0.00    0.44
>> system.time(x[order(x)][1:4])
>     user  system elapsed
>     1.72    0.00    1.72
>> system.time(x2 <- sort.int(x, method = "quick")[1:4])
>     user  system elapsed
>     0.31    0.00    0.32
>> system.time(min(x))
>     user  system elapsed
>     0.02    0.00    0.02
>> system.time(x2 <- sort.int(x, partial=1)[1])
>     user  system elapsed
>     0.07    0.00    0.07
> 
> and do savings of tenths of a second matter?  (There is also 
> quantreg::kselect, if you work out how to use it, which apparently is 
> a bit faster at partial sorting on MacOS X but not elsewhere.)
> 
> 
> On Sun, 11 Nov 2007, David Katz wrote:
> 
>>
>> What is the most efficient alternative to x[order(x)][1:n] where
>> length(x)>>n?
> 
> That is the smallest n values, pace your subject line.
> 
>> I also need the positions of the mins/maxs perhaps by preserving names.
>>
>> Thanks for any suggestions.
>>
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: http://www.nabble.com/Largest-N-Values-Efficiently--tf4788033.html#a13708965
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list