[R] find high correlated variables in a big matrix

David L Carlson dcarlson at tamu.edu
Tue May 10 18:46:33 CEST 2016


Look at varclus() in package Hmisc or package ClustOfVar.
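
The idea behind varclus() is hierarchical clustering of the variables (columns) rather than the observations, using a correlation-based similarity. A rough base-R approximation follows. It is a sketch, not Hmisc's implementation: it converts squared Spearman correlations into the distance 1 - rho^2 and passes that to hclust(). The helper cluster_vars() and the simulated data are hypothetical.

```r
# Sketch of variable clustering in base R (not Hmisc's varclus code).
# cluster_vars() is a hypothetical helper.
cluster_vars <- function(x, k, method = "complete") {
  # Similarity between columns: squared Spearman correlation, in [0, 1]
  sim <- cor(x, method = "spearman")^2
  # hclust() expects dissimilarities, so use 1 - similarity
  tree <- hclust(as.dist(1 - sim), method = method)
  # Cut the tree into k groups of variables (named by colnames(x))
  cutree(tree, k = k)
}

# Simulated data: columns a1-a3 share one latent factor, b1-b3 another
set.seed(1)
n <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(a1 = f1 + 0.2 * rnorm(n), a2 = f1 + 0.2 * rnorm(n),
           a3 = f1 + 0.2 * rnorm(n), b1 = f2 + 0.2 * rnorm(n),
           b2 = f2 + 0.2 * rnorm(n), b3 = f2 + 0.2 * rnorm(n))
groups <- cluster_vars(x, k = 2)
print(groups)
```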

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Lida Zeighami
Sent: Tuesday, May 10, 2016 11:30 AM
To: David Winsemius; clint at ecy.wa.gov
Cc: r-help
Subject: Re: [R] find high correlated variables in a big matrix

Thank you, David, for your reply.

But I still couldn't get my answer.
I've already used rcorr to create the correlation matrix and found the
highly correlated variables, but only pairwise: I could find pairs of
variables with high correlation, but not, for example, 100 variables that
are all highly correlated with one another.

Dear Clint, I think you are right! A better way to put it is that I'm trying
to find clusters of variables according to some distance metric. Would you
please let me know how I can do that?

Thanks
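
For the question of finding variables that are all highly correlated with one another, one option is complete-linkage clustering on the distance 1 - |r|. This technique is swapped in here; nobody in this thread proposed it. With complete linkage, a cluster's merge height equals the largest distance between any two of its members. Cutting the tree at height 1 - r_min therefore gives groups in which every pair has |r| >= r_min. The sketch below assumes a numeric matrix whose columns are the variables. The helper correlated_block(), the threshold 0.9 and the simulated data are illustrative assumptions.

```r
# Hypothetical helper: largest group of columns whose pairwise |r| >= r_min.
correlated_block <- function(x, r_min = 0.9) {
  r <- cor(x)                        # Pearson correlations between columns
  d <- as.dist(1 - abs(r))           # 0 = perfectly (anti)correlated
  tree <- hclust(d, method = "complete")
  # Complete linkage: any two members of a cluster cut at height h are at
  # distance <= h, i.e. they have |r| >= 1 - h.
  groups <- cutree(tree, h = 1 - r_min)
  biggest <- as.integer(names(which.max(table(groups))))
  colnames(x)[groups == biggest]
}

# Simulated data: 8 columns driven by f1, 5 by f2, 10 pure-noise columns
set.seed(42)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
a <- sapply(1:8, function(i) f1 + 0.2 * rnorm(n))
b <- sapply(1:5, function(i) f2 + 0.2 * rnorm(n))
z <- matrix(rnorm(n * 10), n, 10)
x <- cbind(a, b, z)
colnames(x) <- c(paste0("a", 1:8), paste0("b", 1:5), paste0("z", 1:10))
block <- correlated_block(x, r_min = 0.9)
print(block)
```

The group returned is guaranteed to meet the threshold, but it is not necessarily the largest such set. Finding that set exactly is a maximum-clique problem. For a 2000 x 5000 matrix, the 5000 x 5000 correlation matrix alone takes about 200 MB (5000^2 doubles at 8 bytes each).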


On Fri, May 6, 2016 at 4:32 PM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> > On May 6, 2016, at 2:12 PM, Lida Zeighami <lid.zigh at gmail.com> wrote:
> >
> > Hi there,
> >
> > Is there any way to find highly correlated variables in a big matrix?
> > For example, I have a 2000 x 5000 matrix called data, and I need to find
> > the highly correlated variables among its columns (I need 100 highly
> > correlated variables out of the 5000 column variables).
> >
> > I could calculate the correlation matrix and pick the highly correlated
> > ones, but my problem is that I can only pick pairs of variables with high
> > correlation, and there may be low correlation across the pairs. That is,
> > in my 100 x 100 correlation matrix there are some pairs with low
> > correlation, and I couldn't find 100 variables that all have high
> > correlation with one another.
> > Would you please let me know if there is any way?
>
> The rcorr function in Hmisc returns a list whose first element, r, is the
> correlation matrix:
>
> > library(Hmisc)
> > base <- rnorm(100)
>
> > test <- matrix(base+0.2*rnorm(300), 100)
>
> > rcorr(test)[[1]]
>           [,1]      [,2]      [,3]
> [1,] 1.0000000 0.9631220 0.9721688
> [2,] 0.9631220 1.0000000 0.9666564
> [3,] 0.9721688 0.9666564 1.0000000
>
> You can use which() to find the locations meeting a criterion (or two):
>
> > mycorr <- .Last.value
>
> > which(mycorr > 0.97 & mycorr != 1, arr.ind=TRUE)
>      row col
> [1,]   3   1
> [2,]   1   3
>
>
>
> --
>
> David Winsemius
> Alameda, CA, USA
>
>
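
The which() call in the quoted reply reports each pair twice, as (3, 1) and (1, 3). It also excludes the diagonal by testing mycorr != 1, which would drop an off-diagonal correlation that is exactly 1. A variant follows, sketched with base R's cor() in place of rcorr() and a hypothetical helper high_pairs(). It keeps only the strict upper triangle with upper.tri(), so each pair appears once. It also compares abs(r), so strong negative correlations are listed too.

```r
# Hypothetical helper: list each highly correlated pair of columns once.
high_pairs <- function(x, r_min = 0.97) {
  if (is.null(colnames(x))) colnames(x) <- paste0("V", seq_len(ncol(x)))
  r <- cor(x)
  # Strict upper triangle: drops the diagonal and the mirrored (j, i)
  # duplicate of every (i, j) pair.
  keep <- upper.tri(r) & abs(r) > r_min
  idx <- which(keep, arr.ind = TRUE)
  out <- data.frame(var1 = colnames(r)[idx[, "row"]],
                    var2 = colnames(r)[idx[, "col"]],
                    r = r[idx])
  out[order(-abs(out$r)), , drop = FALSE]   # strongest pairs first
}

# Same construction as the quoted example, but seeded
set.seed(7)
base <- rnorm(100)
test <- matrix(base + 0.2 * rnorm(300), 100)   # 100 x 3 matrix
hp <- high_pairs(test, r_min = 0.9)
print(hp)
```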


______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


