[R] Remove highly correlated variables from a data frame or matrix

Abby Spurdle @purd|e@@ @end|ng |rom gm@||@com
Thu Nov 14 21:56:38 CET 2019


> I basically want to remove all entries for pairs which have value in
> between them (correlation calculated not in R, but it is correlation,
> r2)
> so, for example, I would not keep rs883504 because it has r2 > 0.8 for
> all those rs...

I'm still not sure what "remove all entries" means.
In your example, rs883504 has all correlation coefficients > 0.8 in
the data returned by head().
However, most of its correlation coefficients are < 0.8 if you
include the entire matrix.
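
To illustrate why a truncated view can mislead, here is a minimal Python sketch. It assumes the pairwise results are stored in long format as (name1, name2, r2) tuples; the function name share_above, the variable names and all r2 values are hypothetical, not taken from your data. Slicing the first 6 pairs mimics R's head(), which returns 6 rows by default.

```python
def share_above(pairs, var, threshold=0.8):
    """Fraction of the r2 values involving `var` that exceed `threshold`.

    `pairs` is a list of (name1, name2, r2) tuples (a hypothetical
    long-format layout of the pairwise correlation results).
    """
    values = [r2 for a, b, r2 in pairs if var in (a, b)]
    if not values:
        raise ValueError(f"{var!r} does not appear in any pair")
    return sum(v > threshold for v in values) / len(values)


# Hypothetical data: "x" is strongly correlated with the first six
# variables listed, but weakly correlated with the remaining seven.
pairs = [
    ("x", "a", 0.95), ("x", "b", 0.90), ("x", "c", 0.88),
    ("x", "d", 0.86), ("x", "e", 0.83), ("x", "f", 0.81),
    ("x", "g", 0.40), ("x", "h", 0.30), ("x", "i", 0.20),
    ("x", "j", 0.10), ("x", "k", 0.05), ("x", "l", 0.02),
    ("x", "m", 0.01),
]

print(share_above(pairs[:6], "x"))  # first 6 pairs only, like head()
print(share_above(pairs, "x"))      # all pairs
```

Here every r2 value in the first six pairs exceeds 0.8, yet over all thirteen pairs only 6 of 13 do, so most of the coefficients for "x" are below the threshold.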

If you remove every variable that has at least one correlation
coefficient > 0.8, you would remove all the variables.
However, if you remove only the variables whose correlation
coefficients are all > 0.8, you would (probably) remove no variables.
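
The two removal rules can be sketched in Python as follows, assuming the r2 values are held in a square symmetric matrix. The function name variables_to_remove, the variable names v1 to v4 and the matrix values are hypothetical. The diagonal is ignored because each variable's correlation with itself is 1.

```python
import numpy as np


def variables_to_remove(r2, names, threshold=0.8, rule="any"):
    """Return the names flagged for removal from a square r2 matrix.

    rule="any": flag a variable if at least one of its off-diagonal
                r2 values exceeds `threshold`.
    rule="all": flag a variable only if every one of its off-diagonal
                r2 values exceeds `threshold`.
    """
    r2 = np.asarray(r2, dtype=float)
    off_diag = ~np.eye(r2.shape[0], dtype=bool)  # ignore self-correlation
    above = r2 > threshold
    if rule == "any":
        flagged = (above & off_diag).any(axis=1)
    elif rule == "all":
        # Treat diagonal entries as satisfied so only off-diagonal ones count.
        flagged = (above | ~off_diag).all(axis=1)
    else:
        raise ValueError("rule must be 'any' or 'all'")
    return [name for name, flag in zip(names, flagged) if flag]


# Hypothetical matrix: v1 is strongly correlated with every other variable,
# while v2, v3 and v4 are only weakly correlated with each other.
names = ["v1", "v2", "v3", "v4"]
r2 = [
    [1.00, 0.90, 0.85, 0.95],
    [0.90, 1.00, 0.20, 0.30],
    [0.85, 0.20, 1.00, 0.10],
    [0.95, 0.30, 0.10, 1.00],
]

print(variables_to_remove(r2, names, rule="any"))
print(variables_to_remove(r2, names, rule="all"))
```

In this toy matrix the "any" rule flags all four variables, because each one is highly correlated with v1, whereas the "all" rule flags only v1.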



More information about the R-help mailing list