[R] Regarding Principal Component Analysis result Interpretation

Sat Sep 16 01:40:32 CEST 2017

This list is about R programming, not statistics, although they do often
intersect. Nevertheless, this discussion seems to be all about the latter,
not the former, so I think you would do better bringing it to a statistics
list like stats.stackexchange.com rather than here.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Sep 15, 2017 at 5:12 AM, Ismail SEZEN <sezenismail at gmail.com> wrote:

> First, see the example at https://isezen.github.io/PCA/
>
> > On 15 Sep 2017, at 13:43, Shylashree U.R <shylashivashree at gmail.com> wrote:
> >
> > Dear Sir/Madam,
> >
> > I am trying to run PCA on the "iris" dataset and to interpret the
> > result. The dataset contains 150 observations of 5 variables:
> >
> >       Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
> >   1            5.1          3.5           1.4          0.2  setosa
> >   2            4.9          3.0           1.4          0.2  setosa
> >   .....
> > 150            5.9          3.0           5.1          1.8  virginica
> >
> > I then used the 'prcomp' function on the dataset and got the following result:
> >> print(pc)
> > Standard deviations (1, .., p=4):
> > [1] 1.7083611 0.9560494 0.3830886 0.1439265
> >
> > Rotation (n x k) = (4 x 4):
> >                    PC1         PC2        PC3        PC4
> > Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
> > Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
> > Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
> > Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971
> >
> > I'm planning to use PCA as a feature selection step in my project, to
> > remove variables that are correlated. I have interpreted the PCA result,
> > but I am not sure whether my interpretation is correct.
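
The printed output above appears consistent with a PCA of the four numeric
iris columns after scaling them to unit variance. The exact call was not
posted, so the arguments below (dropping Species, scale. = TRUE) are an
assumption; a minimal sketch:

```r
# Assumption: PCA on the four numeric columns, scaled to unit variance.
# Species is a factor and cannot be passed to prcomp(), so it is dropped.
pc <- prcomp(iris[, 1:4], scale. = TRUE)

# Standard deviations of the principal components
print(pc$sdev)

# Rotation (loadings) matrix; column signs may differ between platforms
print(pc$rotation)
```

The sign of a whole column of the rotation matrix is arbitrary, so compare
absolute values when checking against another run.
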
>
>
> You want to “remove variables which are correlated”. Correlated among
> themselves? If so, why don’t you compute a Pearson correlation matrix (see
> ?cor), define a threshold, and remove the variables whose correlation
> exceeds it? Perhaps I did not understand you correctly, excuse me.
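
The correlation-threshold idea can be sketched as follows. The cutoff of 0.9
and the greedy "keep the first variable seen" rule are illustrative
assumptions, not something prescribed in this thread:

```r
# Numeric columns of the built-in iris data (Species is a factor, so drop it)
x <- iris[, 1:4]

# Pearson correlation matrix between the variables
cm <- cor(x, method = "pearson")

# Hypothetical cutoff; pick a value that suits your project
threshold <- 0.9

# Greedy filter: keep a variable only if its absolute correlation with
# every variable already kept is below the threshold
keep <- character(0)
for (v in colnames(cm)) {
  if (all(abs(cm[v, keep]) < threshold)) keep <- c(keep, v)
}
print(keep)
```

With this cutoff, Petal.Width is dropped because its correlation with
Petal.Length is about 0.96. The result depends on the column order, since the
first variable of a correlated pair is the one that is kept.
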
>
> For the iris dataset, PC1 captures as much of the variance in the
> variables as possible, PC2 captures as much as possible of what remains,
> and so on. Hence, you can identify which variables are similar in terms of
> VARIANCE. You can understand it if you examine the example that I gave above.
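
How the variance is split across components can be checked directly from the
standard deviations that prcomp returns. This is a minimal sketch that assumes
the scaled PCA on the four numeric iris columns:

```r
# Assumption: same scaled PCA as in the original post
pc <- prcomp(iris[, 1:4], scale. = TRUE)

# Variance of each component is the square of its standard deviation
pc_var <- pc$sdev^2

# Share of the total variance captured by each component, and running total
var_explained <- pc_var / sum(pc_var)
cum_explained <- cumsum(var_explained)

print(round(var_explained, 3))
print(round(cum_explained, 3))
```

summary(pc) reports the same proportions in its "Proportion of Variance" and
"Cumulative Proportion" rows.
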
>
> In PCA, you can also calculate the correlations between the variables and
> the PCs, but this shows you how the PCs are affected by these variables. I
> don’t know how you plan to carry out the feature selection, so I hope this
> helps you. Also note the resources section at the end of the example.
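
The correlations between the original variables and the PCs can be computed
from the component scores stored in pc$x. This is a minimal sketch, again
assuming the scaled PCA of the four numeric iris columns:

```r
# Assumption: same scaled PCA as in the original post
x  <- iris[, 1:4]
pc <- prcomp(x, scale. = TRUE)

# Correlation of each original variable (rows) with each PC score (columns)
var_pc_cor <- cor(x, pc$x)
print(round(var_pc_cor, 3))

# For a PCA on standardized variables, the same matrix is obtained by
# multiplying each column of the rotation matrix by that PC's standard deviation
scaled_loadings <- sweep(pc$rotation, 2, pc$sdev, "*")
print(round(scaled_loadings, 3))
```

Variables with similar rows in this matrix relate to the components in a
similar way, which is one way to judge which of them carry overlapping
information.
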
>
> isezen
