[R] Find the 50 highest values in a matrix

Henrik Bengtsson hb at stat.berkeley.edu
Fri Jun 18 13:39:36 CEST 2010


You might also want to consider _partial sorting_ by using the
'partial' argument of sort(), especially when the number of data
points is really large.
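
To make the guarantee of partial sorting concrete, here is a minimal sketch on made-up data (the seed, vector length and 'k' are arbitrary choices for illustration). After sort(x, partial=k), element k sits where a full sort would put it, everything before it is no larger and everything after it is no smaller, but the two sides are not themselves sorted.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(1000)                 # made-up data for illustration
k <- 10

xp <- sort(x, partial = k)       # partial sort around position k

# Element k is exactly where a full sort would place it ...
stopifnot(xp[k] == sort(x)[k])

# ... nothing before it is larger and nothing after it is smaller,
# although xp[1:k] itself may still be in arbitrary order.
stopifnot(all(xp[1:k] <= xp[k]))
stopifnot(all(xp[k:length(xp)] >= xp[k]))
```

This is why a final sort() of only the first k elements is enough to get them in order.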

Since 'decreasing=TRUE' is not supported when 'partial' is used,
you have to flip the order yourself by negating the values, e.g.

x <- rnorm(8e6);
is.na(x) <- sample(length(x), size=1e6);

n <- 50;
t1 <- system.time({
  x1 <- sort(x, decreasing=TRUE);
  x1h <- x1[1:n];
});

t2 <- system.time({
  x2 <- sort(-x, partial=n);
  x2h <- -sort(x2[1:n]);
});

stopifnot(identical(x2h, x1h));
print(t2/t1);
     user    system   elapsed
0.3076923 0.7777778 0.3491525
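
The original question asked for the coordinates rather than the values. One possible way to get them from the partial sort, sketched here on a smaller made-up matrix (the names 'm', 'v', 'cutoff', 'idx' and 'top' are only illustrative), is to use the n-th largest value as a cutoff and pass the comparison to which(..., arr.ind=TRUE). Note that with tied values at the cutoff this may return more than n rows.

```r
set.seed(42)                                 # arbitrary seed
m <- matrix(rnorm(400 * 200), nrow = 400)    # small stand-in for the 4000 x 2000 matrix
is.na(m) <- sample(length(m), size = 1000)   # sprinkle in some NAs

n <- 50
v <- m[!is.na(m)]                            # plain vector without the NAs

# The n-th largest value of v is the n-th smallest value of -v.
cutoff <- -sort(-v, partial = n)[n]

# Row/column positions at or above the cutoff.
# 'm >= cutoff' is NA for NA cells, and which() ignores those.
idx <- which(m >= cutoff, arr.ind = TRUE)

# Attach the values and order the rows from largest to smallest.
top <- cbind(idx, value = m[idx])
top <- top[order(top[, "value"], decreasing = TRUE), , drop = FALSE]
print(head(top))
```

Only the cutoff comes from the partial sort; the lookup of the positions is a single pass over the matrix.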

/Henrik

On Fri, Jun 18, 2010 at 1:20 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
>
> m <- matrix(round(rnorm(4000 * 2000), 4), nr = 4000)
> is.na(m) <- sample(8e6, 1e6)
>
> system.time(
>  idx <- which(
>    matrix(m %in% head(sort(m, TRUE), 50),
>           nr = nrow(m)), arr.ind = TRUE))
>
> #   user  system elapsed
> #   3.12    0.19    3.18
>
>  -Peter Ehlers
>
>
> On 2010-06-18 5:13, Dennis Murphy wrote:
>>
>> Hi:
>>
>> Here's a faked up example:
>>
>> a<- matrix(rnorm(4000*2000), 4000, 2000)
>> # Generate some NAs in the matrix
>> nr<- sample(1:4000, 50)
>> nc<- sample(1:2000, 50)
>> a[nr, nc]<- NA
>>
>> # convert to data frame:
>> b<- data.frame(row = rep(1:4000, 2000), col = rep(1:2000, each = 4000),
>>                           x = as.vector(a))
>> # relatively time consuming...about 13.5 s on my machine
>> bb<- b[rev(order(b$x, na.last = FALSE)), ]
>>
>> bb[1:10, ]
>>
>>          row  col        x
>> 691269  3269  173 5.103704
>> 7815076 3076 1954 4.961544
>> 4999621 3621 1250 4.953265
>> 500469   469  126 4.937655
>> 5878224 2224 1470 4.929150
>> 4287270 3270 1072 4.913791
>> 4442521 2521 1111 4.896869
>> 4668867  867 1168 4.863504
>> 5716575  575 1430 4.760778
>> 3055274 3274  764 4.758995
>>
>> HTH,
>> Dennis
>>
>>
>> On Thu, Jun 17, 2010 at 10:41 PM, uschlecht <ulrich.schlecht at stanford.edu> wrote:
>>
>>>
>>> Hi,
>>>
>>> I have a huge matrix (4000 * 2000 data points) and I would like to
>>> retrieve the coordinates (column and row) for the top 50 (or x) values.
>>> Some positions in the matrix have NA as a value. These should be
>>> discarded.
>>>
>>> My current method is to replace all NAs by 0, then rank all the values
>>> and then extract the positions with the 50 highest ranks. It is very
>>> time-consuming!
>>>
>>> Is there a simpler way to do this?
>>>
>>> Thank you,
>>> Ulrich
>>>
>>> --
>>> View this message in context:
>>>
>>> http://r.789695.n4.nabble.com/Find-the-50-highest-values-in-a-matrix-tp2259721p2259721.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
