[R] find high correlated variables in a big matrix
David Winsemius
dwinsemius at comcast.net
Fri May 6 23:32:20 CEST 2016
> On May 6, 2016, at 2:12 PM, Lida Zeighami <lid.zigh at gmail.com> wrote:
>
> Hi there,
>
> Is there any way to find out high correlated variables among a big matrix?
> for example I have a matrix called data= 2000*5000 and I need to find the
> high correlated variables between the variables in the columns! (Need 100
> high correlated variables from 5000 variables in column)
>
> I could calculate the correlation matrix and pick the high correlated ones
> but my problem is, I just can pick pairs of variables with high correlation
> and may be we have low correlation across the pairs! Means, in my 100*100
> correlation matrix, there are some pairs with low correlation and I
> couldn't find the 100 variables which they all have high correlation
> together!!!
> Would you please let me know if there is any way?
The rcorr function in the Hmisc package returns a list whose first element is a correlation matrix:
> library(Hmisc)  # provides rcorr()
> base <- rnorm(100)
> test <- matrix(base + 0.2*rnorm(300), 100)  # 3 columns that share 'base', so they correlate highly
> rcorr(test)[[1]]
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.9631220 0.9721688
[2,] 0.9631220 1.0000000 0.9666564
[3,] 0.9721688 0.9666564 1.0000000
You can use which() to find the locations meeting a criterion (or two):
> mycorr <- .Last.value
> which(mycorr > 0.97 & mycorr != 1, arr.ind=TRUE)
     row col
[1,]   3   1
[2,]   1   3
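
Because a correlation matrix is symmetric, which() reports every pair twice, and a list of pairs does not by itself give a set of variables that are all highly correlated with one another. The following is only a sketch of one way to approach both points using base R, not a definitive method. It rests on these assumptions: the toy data (three related columns plus two noise columns), the 0.9 cutoff, and the choice of complete-linkage clustering are all illustrative. With complete linkage, cutting the tree at height h on the distance 1 - r yields groups in which every pairwise correlation is at least 1 - h.

```r
set.seed(1)                                  # illustrative seed, not from the original post
base <- rnorm(100)
# Toy data (assumption): 3 columns sharing 'base' plus 2 unrelated noise columns
test <- cbind(matrix(base + 0.2 * rnorm(300), 100),
              matrix(rnorm(200), 100))

mycorr <- cor(test)                          # base-R Pearson correlation matrix
cutoff <- 0.9                                # illustrative threshold

# 1. Report each qualifying pair once by looking only above the diagonal
pairs <- which(mycorr > cutoff & upper.tri(mycorr), arr.ind = TRUE)

# 2. Group variables so that ALL pairs inside a group exceed the cutoff.
#    Distance is 1 - r; use 1 - abs(mycorr) to treat strong negative
#    correlations as "high" as well.
d   <- as.dist(1 - mycorr)
hc  <- hclust(d, method = "complete")        # merge height = largest within-group distance
grp <- cutree(hc, h = 1 - cutoff)            # groups whose diameter is <= 1 - cutoff

# Pick the largest group and list its member columns
sizes   <- table(grp)
biggest <- as.integer(names(sizes)[which.max(sizes)])
members <- which(grp == biggest)

print(pairs)
print(members)
print(min(mycorr[members, members]))         # smallest correlation inside the group
```

The group found this way is guaranteed to be mutually correlated above the cutoff, but it is not guaranteed to be the largest such set that exists; finding that exactly is a maximum-clique problem. If a group of a specific size (such as 100) is needed, one option is to lower the cutoff until the largest group reaches that size.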
--
David Winsemius
Alameda, CA, USA