[R] Principal Component Analysis: Ranking Animal Size Based On Combined Metrics

Bert Gunter bgunter.4567 at gmail.com
Sun Nov 13 16:25:03 CET 2016


While you may get a reply here, this list is about R programming, not
about statistics. So:

1. Do your homework and read a tutorial on PCA on the web or
elsewhere. Isn't this what a PhD student is supposed to do?

2. Post on a statistics list like stats.stackexchange.com.

3. Consult your professor or other local statistical resource.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sat, Nov 12, 2016 at 9:46 PM, Sidoti, Salvatore A.
<sidoti.23 at buckeyemail.osu.edu> wrote:
> Let's say I perform 4 measurements on an animal: three are linear measurements in millimeters and the fourth is its weight in milligrams. So, we have a data set with mixed units.
>
> Based on these four correlated measurements, I would like to obtain one "score" or value that describes an individual animal's size. I considered simply taking the geometric mean of these 4 measurements, and that would give me a "score" - larger values would be for larger animals, etc.
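
The geometric-mean score described above can be sketched as follows. This is only an illustration: `raw`, `geo_size` and `geo_rank` are names made up here, and the data are the first ten animals from the data set posted below, put back on their original scales (weight in mg, lengths in mm) so the snippet runs on its own. Because the geometric mean is multiplicative, changing the unit of one variable rescales every animal's score by the same factor, so the ranking is unaffected by the mixed units.

```r
# First ten animals from the post, on their original scales
# (weight in mg, i.e. before the log transform; lengths in mm).
raw <- data.frame(
  weight  = 1000 * c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538,
                     0.0701, 0.1138, 0.0540, 0.0629, 0.0930),
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783,
              0.868, 0.818, 0.847, 0.838, 0.799),
  cwidth  = c(3.152, 3.046, 3.139, 3.181, 3.023,
              3.452, 2.803, 3.050, 3.160, 3.186),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911,
              3.822, 3.768, 3.814, 3.721, 3.794)
)

# Geometric mean per animal = exp(mean of the logs).
# Every variable gets the same weight (1/4) in this score.
geo_size <- exp(rowMeans(log(raw)))

# Rank the animals by this score: 1 = largest.
geo_rank <- rank(-geo_size)
```

The equal weighting is the assumption questioned in the next paragraph of the post.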
>
> However, this assumes that all 4 of these measurements contribute equally to an animal's size. Of course, more than likely this is not the case. I then performed a PCA to discover how much influence each variable had on the overall data set. I was hoping to use this analysis to refine my original approach.
>
> I honestly do not know how to apply the information from the PCA to this particular problem...
>
> I do know, however, that principal components 1 and 2 capture enough of the variation to reduce the number of dimensions down to 2 (see analysis below with the original data set).
>
> Note: animal weights were ln() transformed to increase correlation with the 3 other variables.
>
> df <- data.frame(
>   weight = log(1000*c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538, 0.0701, 0.1138, 0.0540, 0.0629, 0.0930,
>              0.0443, 0.1115, 0.1157, 0.0734, 0.0616, 0.0640, 0.0480, 0.1339, 0.0547, 0.0844,
>              0.0431, 0.0472, 0.0752, 0.0604, 0.0713, 0.0658, 0.0538, 0.0585, 0.0645, 0.0529,
>              0.0448, 0.0574, 0.0577, 0.0514, 0.0758, 0.0424, 0.0997, 0.0758, 0.0649, 0.0465,
>              0.0748, 0.0540, 0.0819, 0.0732, 0.0725, 0.0730, 0.0777, 0.0630, 0.0466)),
>   interoc = c(0.853, 0.865, 0.811, 0.840, 0.783, 0.868, 0.818, 0.847, 0.838, 0.799,
>               0.737, 0.788, 0.731, 0.777, 0.863, 0.877, 0.814, 0.926, 0.767, 0.746,
>               0.700, 0.768, 0.807, 0.753, 0.809, 0.788, 0.750, 0.815, 0.757, 0.737,
>               0.759, 0.863, 0.747, 0.838, 0.790, 0.676, 0.857, 0.728, 0.743, 0.870,
>               0.787, 0.773, 0.829, 0.785, 0.746, 0.834, 0.829, 0.750, 0.842),
>   cwidth = c(3.152, 3.046, 3.139, 3.181, 3.023, 3.452, 2.803, 3.050, 3.160, 3.186,
>              2.801, 2.862, 3.183, 2.770, 3.207, 3.188, 2.969, 3.033, 2.972, 3.291,
>              2.772, 2.875, 2.978, 3.094, 2.956, 2.966, 2.896, 3.149, 2.813, 2.935,
>              2.839, 3.152, 2.984, 3.037, 2.888, 2.723, 3.342, 2.562, 2.827, 2.909,
>              3.093, 2.990, 3.097, 2.751, 2.877, 2.901, 2.895, 2.721, 2.942),
>   clength = c(3.889, 3.733, 3.762, 4.059, 3.911, 3.822, 3.768, 3.814, 3.721, 3.794,
>               3.483, 3.863, 3.856, 3.457, 3.996, 3.876, 3.642, 3.978, 3.534, 3.967,
>               3.429, 3.518, 3.766, 3.755, 3.706, 3.785, 3.607, 3.922, 3.453, 3.589,
>               3.508, 3.861, 3.706, 3.593, 3.570, 3.341, 3.916, 3.336, 3.504, 3.688,
>               3.735, 3.724, 3.860, 3.405, 3.493, 3.586, 3.545, 3.443, 3.640))
>
> pca_morpho <- princomp(df, cor = TRUE)
>
> summary(pca_morpho)
>
> Importance of components:
>                           Comp.1    Comp.2    Comp.3    Comp.4
> Standard deviation      1.604107 0.8827323 0.7061206 0.3860275
> Proportion of Variance  0.643290 0.1948041 0.1246516 0.0372543
> Cumulative Proportion   0.643290 0.8380941 0.9627457 1.0000000
>
> Loadings:
>         Comp.1 Comp.2 Comp.3 Comp.4
> weight  -0.371  0.907        -0.201
> interoc -0.486 -0.227 -0.840
> cwidth  -0.537 -0.349  0.466 -0.611
> clength -0.582         0.278  0.761
>
>                Comp.1 Comp.2 Comp.3 Comp.4
> SS loadings      1.00   1.00   1.00   1.00
> Proportion Var   0.25   0.25   0.25   0.25
> Cumulative Var   0.25   0.50   0.75   1.00
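
One common way to turn a PCA like this into a single per-animal score is to use the scores on the first component, which `princomp()` stores in `$scores`. The sketch below is a hedged illustration, not a recommendation for these data: `sub`, `pca_sub`, `pc1` and `size_rank` are names made up here, and only the first ten animals from the post are used so the snippet runs on its own (substitute the full `df` in practice). Whether Comp.1 is an adequate "size" axis is a statistical question; in the output above it carries about 64% of the variance and all four variables load on it with the same sign.

```r
# First ten animals from the post, with weight log-transformed as in df.
sub <- data.frame(
  weight  = log(1000 * c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538,
                         0.0701, 0.1138, 0.0540, 0.0629, 0.0930)),
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783,
              0.868, 0.818, 0.847, 0.838, 0.799),
  cwidth  = c(3.152, 3.046, 3.139, 3.181, 3.023,
              3.452, 2.803, 3.050, 3.160, 3.186),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911,
              3.822, 3.768, 3.814, 3.721, 3.794)
)

# PCA on the correlation matrix, as in the post.
pca_sub <- princomp(sub, cor = TRUE)

# Scores on the first component: one number per animal.
pc1 <- pca_sub$scores[, 1]

# The sign of a principal component is arbitrary. If the Comp.1 loadings
# are mostly negative (as in the output above), flip the sign so that
# larger scores correspond to larger measurements.
if (sum(pca_sub$loadings[, 1]) < 0) pc1 <- -pc1

# Rank the animals by this score: 1 = largest.
size_rank <- rank(-pc1)
```

Unlike the geometric mean, this score weights each standardized variable by its Comp.1 loading rather than equally.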
>
> Any guidance will be greatly appreciated!
>
> Salvatore A. Sidoti
> PhD Student
> The Ohio State University
> Behavioral Ecology
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


