[R] Remove highly correlated variables from a data frame or matrix

Ana Marija @okov|c@@n@m@r|j@ @end|ng |rom gm@||@com
Thu Nov 14 21:42:19 CET 2019


It can be converted between a data frame and a matrix. I am attaching
the whole file for examination.
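In base R, converting in either direction takes one call. A minimal sketch follows. The `read.table()` line is commented out because it assumes, without having checked the attachment, that the file is whitespace-separated, with a header row of SNP ids and the SNP ids in the first column. The two-SNP matrix (`rs1`, `rs2`) is made-up toy data.

```r
# Assumption: ro246_matrix.txt is whitespace-separated, with a header row of
# SNP ids and SNP ids in the first column. Not checked against the attachment.
# calc.rho <- as.matrix(read.table("ro246_matrix.txt", header = TRUE, row.names = 1))

# Toy two-SNP matrix with made-up ids and values, used only for the round trip
m <- matrix(c(1, 0.9, 0.9, 1), nrow = 2,
            dimnames = list(c("rs1", "rs2"), c("rs1", "rs2")))

df <- as.data.frame(m)  # data frame: one column per SNP, row names kept
m2 <- as.matrix(df)     # back to a numeric matrix with the same row/column names
```

Functions such as `cor()` and `isSymmetric()` expect a matrix, so it is usually simplest to convert with `as.matrix()` before doing any filtering.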

Basically, I want to remove all entries for pairs that have a high value
between them (the correlation was not calculated in R, but it is a
correlation, r2).
For example, I would not keep rs883504, because it has r2 > 0.8 with
every rs ID in the rows below:

            rs8069610 rs883504 rs8072394 rs4280293 rs4465638 rs12602378
rs56192520      0.582    0.903     0.582     0.582     0.811      0.302
rs3764410       0.598    0.928     0.598     0.598     0.836      0.311
rs145984817     0.638    0.975     0.638     0.638     0.879      0.344
rs1807401       0.638    0.975     0.638     0.638     0.879      0.344
rs1807402       0.638    0.975     0.638     0.638     0.879      0.344
rs35350506      0.638    0.975     0.638     0.638     0.879      0.344
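The rule in the rs883504 example, which drops a column whose r2 exceeds 0.8 against every row, can be expressed with a logical matrix. This is a minimal sketch that uses only the six rows shown above, not the full attachment. The names `cutoff`, `high`, `drop_all` and `kept` are illustrative.

```r
# Values copied from the six rows shown above (not the full 246 x 246 file)
r2 <- matrix(
  c(0.582, 0.903, 0.582, 0.582, 0.811, 0.302,
    0.598, 0.928, 0.598, 0.598, 0.836, 0.311,
    0.638, 0.975, 0.638, 0.638, 0.879, 0.344,
    0.638, 0.975, 0.638, 0.638, 0.879, 0.344,
    0.638, 0.975, 0.638, 0.638, 0.879, 0.344,
    0.638, 0.975, 0.638, 0.638, 0.879, 0.344),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("rs56192520", "rs3764410", "rs145984817",
      "rs1807401", "rs1807402", "rs35350506"),
    c("rs8069610", "rs883504", "rs8072394",
      "rs4280293", "rs4465638", "rs12602378")))

cutoff <- 0.8
high <- r2 > cutoff  # logical matrix with the same shape as r2

# Columns whose r2 exceeds the cutoff against every row (the rs883504 case).
# Use colSums(high) > 0 instead for "against at least one row".
drop_all <- colnames(r2)[colSums(high) == nrow(r2)]

# Keep only the remaining columns
kept <- r2[, !colnames(r2) %in% drop_all, drop = FALSE]
```

On these six rows, rs4465638 (r2 between 0.811 and 0.879) is flagged along with rs883504, so applying the stated rule consistently removes both.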


On Thu, Nov 14, 2019 at 2:29 PM Abby Spurdle <spurdle.a using gmail.com> wrote:
>
> Sorry, but I don't understand your question.
>
> When I first looked at this, I thought it was a correlation (or
> covariance) matrix.
> e.g.
>
> > cor(quakes)
> > cov(quakes)
>
> However, your row and column variables are different, implying two
> different data sets.
> Also, some of the (correlation?) coefficients are the same, implying
> that some of the variables are the same, or very close.
>
> Also, note that a matrix is not a data.frame.
>
>
> > I have a data frame like this (a matrix):
> > head(calc.rho)
> >             rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
> > rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
> > rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
> > rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
> > rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
> > rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
> > rs35350506      0.975     0.309     0.371     0.371     0.371     0.638
> > > dim(calc.rho)
> > [1] 246 246
> > I would like to remove all highly correlated variables from this data,
> > i.e. those with a correlation greater than 0.8.
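For a square matrix such as the 246 x 246 one, a common approach is greedy pruning. Walk through the variables in order, and keep each one only if its correlation with every variable already kept is at most the cutoff. This is a sketch, not a standard R function. `prune_correlated()` is a hypothetical helper, and it assumes that rows and columns name the same variables in the same order and that there are no missing values. The `isSymmetric()` check reflects the point about row and column variables: it holds for a genuine correlation matrix such as `cor(quakes)`, but it fails when the row names differ from the column names, as they do in `head(calc.rho)`.

```r
# Hypothetical helper: greedy pruning of a square, symmetric r2 (or r) matrix.
# Assumes rows and columns name the same variables in the same order and that
# there are no missing values.
prune_correlated <- function(r2, cutoff = 0.8) {
  r2 <- as.matrix(r2)  # accept a data frame as well
  # isSymmetric() is FALSE when rownames and colnames are not identical
  stopifnot(isSymmetric(r2))
  keep <- character(0)
  for (v in colnames(r2)) {
    # keep v unless it exceeds the cutoff with a variable already kept
    if (length(keep) == 0 || all(abs(r2[v, keep]) <= cutoff)) {
      keep <- c(keep, v)
    }
  }
  r2[keep, keep, drop = FALSE]
}

# Toy example with made-up values: b and d are each above 0.8 with a
snps <- c("a", "b", "c", "d")
toy <- matrix(c(1,    0.9, 0.2, 0.85,
                0.9,  1,   0.3, 0.1,
                0.2,  0.3, 1,   0.5,
                0.85, 0.1, 0.5, 1),
              nrow = 4, dimnames = list(snps, snps))

pruned <- prune_correlated(toy, cutoff = 0.8)
colnames(pruned)  # "a" "c"
```

The result depends on column order, because the first variable of each correlated pair is kept. The caret package's `findCorrelation()` uses a different rule: for each flagged pair, it removes the variable with the larger mean absolute correlation.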

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ro246_matrix.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20191114/2577162a/attachment.txt>


More information about the R-help mailing list