[R] find highly correlated variables in a big matrix
Lida Zeighami
lid.zigh at gmail.com
Tue May 10 18:30:22 CEST 2016
Thank you, David, for your reply, but I still couldn't get my answer.
I've already used rcorr to create the correlation matrix and found the
highly correlated variables, but only two at a time: I could find the
pairs of variables with high correlation. What I couldn't get is, for
example, 100 variables that are all highly correlated with one another.
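To make the gap concrete: a pairwise threshold can link a-b and b-c while saying nothing about a-c. Below is a minimal base-R sketch that checks whether every pair in a candidate set clears a threshold. The helper `all_pairs_high`, the threshold 0.6, and the simulated variables are hypothetical and only serve as an illustration.

```r
# Hypothetical helper: TRUE only if every pair in `vars` has |r| >= thr.
all_pairs_high <- function(cm, vars, thr) {
  sub <- abs(cm[vars, vars, drop = FALSE])
  all(sub[upper.tri(sub)] >= thr)
}

# Simulated example: b shares signal with both a and c,
# but a and c are independent of each other.
set.seed(7)
n <- 1000
u <- rnorm(n)
v <- rnorm(n)
X <- cbind(a = u, b = u + v, c = v)
cm <- cor(X)
print(round(cm, 2))  # a-b and b-c come out near 0.71, a-c near 0

all_pairs_high(cm, c("a", "b"), 0.6)       # pair a-b passes
all_pairs_high(cm, c("b", "c"), 0.6)       # pair b-c passes
all_pairs_high(cm, c("a", "b", "c"), 0.6)  # but the trio fails on a-c
```

Both pairs would appear in a pairwise list, yet {a, b, c} is not a highly correlated group. Any method that returns whole groups has to check all pairs inside the group.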
Dear Clint, I think you are right! A better way to put it is that I'm
trying to find clusters of variables according to some distance metric.
Could you please let me know how I can do that?
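One common base-R way to get such clusters is hierarchical clustering of the columns with distance d = 1 - |r| and complete linkage. This technique is swapped in here; nobody proposed it earlier in the thread. Under complete linkage, a cluster's merge height equals its largest within-cluster distance. Cutting the tree at h = 1 - thr therefore yields groups in which every pair has |r| >= thr. This is a heuristic and does not promise the largest possible such group. The simulated data, `thr = 0.8`, and the column names are assumptions for illustration. With 5000 columns, the correlation matrix alone holds 25 million doubles (about 200 MB).

```r
# Sketch: cluster the columns so that every pair inside a cluster has
# |r| >= thr. Base R only (stats::cor, hclust, cutree).
set.seed(1)
n <- 200
# Hypothetical data: 3 latent factors, 10 noisy copies of each,
# plus 20 pure-noise columns (50 variables in total).
f <- matrix(rnorm(n * 3), n, 3)
grouped <- f[, rep(1:3, each = 10)] + 0.3 * matrix(rnorm(n * 30), n, 30)
X <- cbind(grouped, matrix(rnorm(n * 20), n, 20))
colnames(X) <- paste0("V", seq_len(ncol(X)))

thr <- 0.8                       # required |r| for every pair in a cluster
cm <- cor(X)                     # Pearson correlations between columns
d <- as.dist(1 - abs(cm))        # 0 = perfectly (anti-)correlated
hc <- hclust(d, method = "complete")
# Complete linkage: a cluster's merge height is its largest internal
# distance, so cutting at 1 - thr leaves every internal pair with |r| >= thr.
cl <- cutree(hc, h = 1 - thr)

sizes <- tabulate(cl)            # cluster sizes, indexed by cluster id
biggest <- names(cl)[cl == which.max(sizes)]
print(sizes[sizes > 1])
print(biggest)
```

Lowering `thr` gives larger but looser clusters. `biggest` is a natural candidate when you want the single largest group.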
Thanks
On Fri, May 6, 2016 at 4:32 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> > On May 6, 2016, at 2:12 PM, Lida Zeighami <lid.zigh at gmail.com> wrote:
> >
> > Hi there,
> >
> > Is there any way to find highly correlated variables in a big matrix?
> > For example, I have a matrix called data (2000 x 5000), and I need to
> > find highly correlated variables among its columns (I need 100 highly
> > correlated variables out of the 5000 columns).
> >
> > I could calculate the correlation matrix and pick the highly correlated
> > ones, but my problem is that I can only pick pairs of variables with
> > high correlation, and across those pairs the correlation may be low.
> > That is, in my 100 x 100 correlation matrix there are some pairs with
> > low correlation, and I couldn't find 100 variables that all have high
> > correlation with one another.
> > Would you please let me know if there is any way?
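Finding the largest set of variables whose pairwise correlations all exceed a threshold is the maximum-clique problem on the "highly correlated" graph. That problem is NP-hard in general, so in practice people use heuristics. Below is a minimal greedy sketch in base R that grows such a set up to a size cap (100 here, matching the question). It assumes |r| >= thr as the criterion. `greedy_correlated_set` and the simulated data are hypothetical.

```r
# Hypothetical helper: greedily grow a set of columns whose pairwise
# |correlation| all reach thr, stopping at max_size members.
greedy_correlated_set <- function(cm, thr, max_size = 100) {
  adj <- abs(cm) >= thr            # TRUE where a pair counts as "high"
  diag(adj) <- FALSE
  # Seed with the variable that has the most high-correlation partners.
  chosen <- which.max(rowSums(adj))
  # Candidates must be linked to every variable chosen so far.
  candidates <- which(adj[chosen, ])
  while (length(chosen) < max_size && length(candidates) > 0) {
    # Add the candidate linked to the most other candidates.
    score <- rowSums(adj[candidates, candidates, drop = FALSE])
    pick <- candidates[which.max(score)]
    chosen <- c(chosen, pick)
    # Keep only candidates that are also linked to the new member.
    candidates <- candidates[candidates != pick & adj[pick, candidates]]
  }
  unname(chosen)                   # column indices of the selected set
}

# Hypothetical data: two groups of 8 noisy copies of a latent factor,
# plus 10 pure-noise columns.
set.seed(1)
n <- 200
f <- matrix(rnorm(n * 2), n, 2)
X <- cbind(f[, rep(1:2, each = 8)] + 0.3 * matrix(rnorm(n * 16), n, 16),
           matrix(rnorm(n * 10), n, 10))
sel <- greedy_correlated_set(cor(X), thr = 0.8)
print(sel)
```

Each variable is admitted only if it is linked to every member already chosen, so the result always satisfies the all-pairs condition. A different seed variable could, however, produce a larger set.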
>
> The rcorr function in Hmisc returns a list whose first element is the
> correlation matrix:
>
> > library(Hmisc)
> > base <- rnorm(100)
> > test <- matrix(base + 0.2*rnorm(300), 100)
> > rcorr(test)[[1]]
>           [,1]      [,2]      [,3]
> [1,] 1.0000000 0.9631220 0.9721688
> [2,] 0.9631220 1.0000000 0.9666564
> [3,] 0.9721688 0.9666564 1.0000000
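If Hmisc is not installed, base R's `cor()` computes the same Pearson correlation matrix. `rcorr()` deletes missing values pairwise, and `cor(x, use = "pairwise.complete.obs")` mirrors that behavior when there are NAs. The small sketch below rebuilds the simulated `test` matrix; the seed is an addition so the run is reproducible.

```r
# Base-R sketch of the same computation, no Hmisc needed.
set.seed(42)                                   # added for reproducibility
base <- rnorm(100)
test <- matrix(base + 0.2 * rnorm(300), 100)   # 3 noisy copies of `base`
cm <- cor(test)                                # Pearson, like rcorr()'s default
print(round(cm, 4))
# With missing values, match rcorr()'s pairwise deletion:
# cor(test, use = "pairwise.complete.obs")
```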
>
> You can use which to find the locations meeting a criterion (or two):
>
> > mycorr <- .Last.value
> > which(mycorr > 0.97 & mycorr != 1, arr.ind=TRUE)
>      row col
> [1,]   3   1
> [2,]   1   3
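`which()` scans the full symmetric matrix, so each pair shows up twice (3,1 and 1,3), and the `mycorr != 1` condition is what drops the diagonal. Restricting the search to `upper.tri()` lists each pair once and does not rely on the diagonal being exactly 1. The sketch below uses a hypothetical helper `high_pairs` and correlation values copied from the rcorr() output above. It also applies `abs()` so that strong negative correlations count. That is a design choice and not part of the original suggestion.

```r
# Hypothetical helper: list each pair with |r| > thr exactly once.
high_pairs <- function(cm, thr) {
  hit <- upper.tri(cm) & abs(cm) > thr   # upper triangle skips diagonal and mirrors
  idx <- which(hit, arr.ind = TRUE)
  data.frame(var1 = idx[, "row"], var2 = idx[, "col"], r = cm[idx])
}

# Correlation values copied from the rcorr() output above.
mycorr <- matrix(c(1.0000000, 0.9631220, 0.9721688,
                   0.9631220, 1.0000000, 0.9666564,
                   0.9721688, 0.9666564, 1.0000000), nrow = 3)
print(high_pairs(mycorr, 0.97))   # a single pair: columns 1 and 3
```

A pair list like this is still only pairwise. Grouping the pairs into sets that are all mutually correlated needs a further step, such as clustering.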
>
> --
>
> David Winsemius
> Alameda, CA, USA
More information about the R-help mailing list