[R] get top 50 correlated item from a correlation matrix for each item

Tan, Richard RTan at panagora.com
Thu Feb 12 18:26:49 CET 2009


Works like a charm, thank you! 

-----Original Message-----
From: Dimitris Rizopoulos [mailto:d.rizopoulos at erasmusmc.nl] 
Sent: Thursday, February 12, 2009 12:11 PM
To: Tan, Richard
Cc: r-help at r-project.org
Subject: Re: [R] get top 50 correlated item from a correlation matrix
for each item

a possible vectorized solution is the following:

cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100))
p <- 30 # how many top items

n <- ncol(cor.mat)
cmat <- col(cor.mat)
ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
dim(ind) <- dim(cor.mat)
ind <- ind[seq(2, p + 1), ]
out <- cbind(ID = c(col(ind)), ID2 = c(ind))
as.data.frame(cbind(out, cor = cor.mat[out]))
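
For reuse across many monthly matrices, the steps above could be wrapped in a function and checked against a plain per-column sort. This is only a sketch: the name top_correlated and the small test matrix are illustrative assumptions, not part of the original post. It also assumes a symmetric correlation matrix in which each item's largest entry is its own diagonal value.

```r
# Hypothetical wrapper around the vectorized steps above (name is an assumption).
top_correlated <- function(cor.mat, p) {
  n <- ncol(cor.mat)
  stopifnot(nrow(cor.mat) == n, p < n)
  cmat <- col(cor.mat)
  # Order all cells column by column, highest correlation first within each
  # column; subtracting the column offset n * (column - 1) converts the
  # linear indices into row indices.
  ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
  dim(ind) <- dim(cor.mat)
  # Row 1 of each column is assumed to be the item itself (correlation 1).
  ind <- ind[seq(2, p + 1), , drop = FALSE]
  out <- cbind(ID = c(col(ind)), ID2 = c(ind))
  as.data.frame(cbind(out, cor = cor.mat[out]))
}

# Small check against a straightforward per-column sort.
set.seed(1)
cor.mat <- cor(matrix(rnorm(20 * 200), 200, 20))
res <- top_correlated(cor.mat, p = 5)
brute <- unlist(lapply(seq_len(ncol(cor.mat)), function(j) {
  order(cor.mat[, j], decreasing = TRUE)[2:6]
}))
stopifnot(identical(as.integer(res$ID2), as.integer(brute)))
```

ID and ID2 are integer positions in the matrix. If the matrix has column names, colnames(cor.mat)[res$ID] and colnames(cor.mat)[res$ID2] map them back to item labels.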


I hope it helps.

Best,
Dimitris


Tan, Richard wrote:
> Hi,
>  
> I have a correlation matrix of about 3000 items, i.e., a 3000*3000 
> matrix.  For each of the 3000 items, I want to get the top 50 items 
> that have the highest correlation with it (excluding itself) and 
> generate a data frame with 3 columns like ("ID", "ID2", "cor"), where 
> ID is those 3000 items each repeat 50 times, and ID2 is the top 50 
> correlated items with ID, and cor is the correlation of ID and ID2.  I
> know I can use two for loops to do it but it is very time consuming 
> considering the correlation matrix is generated for each month of the 
> past 20 years.  Is there a better way to do it?
>  
> Regards,
>  
> Richard

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



