[R] get top 50 correlated items from a correlation matrix for each item

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Thu Feb 12 18:10:37 CET 2009


A possible vectorized solution is the following:

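cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100)) # example: 100 items
p <- 30 # how many top items to keep per item (50 in the question)

n <- ncol(cor.mat)
cmat <- col(cor.mat) # column index of every cell
# order the cells column by column, highest correlation first, then
# convert the linear indices to row indices within each column
ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
dim(ind) <- dim(cor.mat)
# row 1 of each column is the item itself (cor = 1), so skip it
ind <- ind[seq(2, p + 1), ]
out <- cbind(ID = c(col(ind)), ID2 = c(ind))
as.data.frame(cbind(out, cor = cor.mat[out]))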
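
Since the correlation matrix is generated for each month, it may be convenient to wrap the steps above in a function and check it once against a plain per-item loop. The following is only a sketch: the names top_cor and top_cor_loop are made up for illustration, and it assumes that each item's largest correlation is with itself (if another item were correlated exactly 1 with it, the "skip the first row" step could drop the wrong entry).

```r
# Hypothetical wrapper around the vectorized steps (name is illustrative).
top_cor <- function(cor.mat, p) {
    n <- ncol(cor.mat)
    cmat <- col(cor.mat)
    ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
    dim(ind) <- dim(cor.mat)
    ind <- ind[seq(2, p + 1), , drop = FALSE]  # skip the item itself
    out <- cbind(ID = c(col(ind)), ID2 = c(ind))
    as.data.frame(cbind(out, cor = cor.mat[out]))
}

# Straightforward reference version, one item at a time, for checking only.
top_cor_loop <- function(cor.mat, p) {
    res <- lapply(seq_len(ncol(cor.mat)), function(j) {
        r <- cor.mat[, j]
        r[j] <- -Inf  # exclude the item itself
        top <- order(r, decreasing = TRUE)[seq_len(p)]
        data.frame(ID = j, ID2 = top, cor = cor.mat[top, j])
    })
    do.call(rbind, res)
}

# Small simulated example: 8 items, top 3 for each.
set.seed(1)
cm <- cor(matrix(rnorm(200 * 8), 200, 8))
a <- top_cor(cm, 3)
b <- top_cor_loop(cm, 3)
isTRUE(all.equal(a$ID2, b$ID2)) && isTRUE(all.equal(a$cor, b$cor))
```

With a list of monthly matrices (say, a hypothetical list called monthly), something like lapply(monthly, top_cor, p = 50) would then give one data frame per month.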


I hope it helps.

Best,
Dimitris


Tan, Richard wrote:
> Hi,
>  
> I have a correlation matrix of about 3000 items, i.e., a 3000*3000
> matrix.  For each of the 3000 items, I want to get the top 50 items that
> have the highest correlation with it (excluding itself) and generate a
> data frame with 3 columns ("ID", "ID2", "cor"), where ID is each of the
> 3000 items repeated 50 times, ID2 is the top 50 items correlated
> with ID, and cor is the correlation of ID and ID2.  I know I can use two
> for loops to do it, but that is very time-consuming considering that a
> correlation matrix is generated for each month of the past 20 years.  Is
> there a better way to do it?
>  
> Regards,
>  
> Richard 

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



