[R] Principal Component Analysis: Ranking Animal Size Based On Combined Metrics
Michael Friendly
friendly at yorku.ca
Mon Nov 14 03:10:25 CET 2016
Salvatore,
I won't comment on whether to use log weight "to increase the
correlation" -- that depends on whether that makes sense, and whether
the relationships with the other variables are more nearly linear.
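
If it helps to check the linearity point directly, here is a minimal sketch. It uses simulated values, not the measurements from the original post: the names `clength_sim` and `weight_sim`, the power-law relation, and the noise level are all assumptions made only for illustration.

```r
# Hypothetical stand-in data: weight grows roughly as a power of length,
# with multiplicative noise. These are NOT the poster's measurements.
set.seed(2)
n <- 49
clength_sim <- runif(n, 3.3, 4.1)                              # mm
weight_sim  <- 1.2 * clength_sim^3 * exp(rnorm(n, sd = 0.15))  # mg

# Pearson correlation with length, on the raw and on the log scale
cor_raw <- cor(weight_sim, clength_sim)
cor_log <- cor(log(weight_sim), clength_sim)
print(round(c(raw = cor_raw, log = cor_log), 3))

# A scatterplot with a smoother says more about linearity than the
# correlation alone; drawn only in an interactive session
if (interactive()) {
  plot(clength_sim, log(weight_sim))
  lines(lowess(clength_sim, log(weight_sim)))
}
```

With the real data, the same two `cor()` calls and the plot would be run on the raw weights and on their logs; which scale is preferable remains a subject-matter judgment.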
Try this with your pca of the correlation matrix:
biplot(pca_morpho)
You'll see that the first component is defined largely by the
large correlations among clength, interoc, and cwidth,
while component 2 is largely determined by weight.
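
One possible way to follow up on the biplot is to look at the loadings and take the scores on component 1 as a candidate size index. The sketch below uses simulated data (`df_sim`, driven by a made-up latent `size` variable), not the data from the original post, and whether component 1 is an appropriate size measure is still a judgment call.

```r
# Simulated stand-in for the four measurements (hypothetical values)
set.seed(1)
n <- 49
size <- rnorm(n)                       # latent "size" behind all variables
df_sim <- data.frame(
  weight  = 4.20 + 0.30 * size + rnorm(n, sd = 0.25),   # log(mg)
  interoc = 0.80 + 0.04 * size + rnorm(n, sd = 0.03),   # mm
  cwidth  = 3.00 + 0.15 * size + rnorm(n, sd = 0.10),   # mm
  clength = 3.70 + 0.18 * size + rnorm(n, sd = 0.08)    # mm
)

# PCA of the correlation matrix, as in the original post
pca_sim <- princomp(df_sim, cor = TRUE)

# Loadings of each variable on the first two components
print(round(unclass(loadings(pca_sim))[, 1:2], 3))

# Scores on component 1. The sign of a component is arbitrary, so flip it
# if necessary so that larger scores correspond to larger animals.
pc1 <- pca_sim$scores[, 1]
if (cor(pc1, df_sim$clength) < 0) pc1 <- -pc1

# Rank animals from largest (rank 1) to smallest by this index
size_rank <- rank(-pc1)
print(head(data.frame(pc1 = round(pc1, 2), rank = size_rank)))
```

The sign flip matters in practice: in the output quoted below, all four loadings on Comp.1 are negative, so the raw scores would rank the largest animals lowest.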
You should probably do some reading on PCA or get some
statistical consulting at OSU to decide what to do with this.
hope this helps
-Michael
On 11/13/16 12:46 AM, Sidoti, Salvatore A. wrote:
> Let's say I perform 4 measurements on an animal: three are linear measurements in millimeters and the fourth is its weight in milligrams. So, we have a data set with mixed units.
>
> Based on these four correlated measurements, I would like to obtain one "score" or value that describes an individual animal's size. I considered simply taking the geometric mean of these 4 measurements, and that would give me a "score" - larger values would be for larger animals, etc.
>
> However, this assumes that all 4 of these measurements contribute equally to an animal's size. Of course, more than likely this is not the case. I then performed a PCA to discover how much influence each variable had on the overall data set. I was hoping to use this analysis to refine my original approach.
>
> I honestly do not know how to apply the information from the PCA to this particular problem...
>
> I do know, however, that principal components 1 and 2 capture enough of the variation to reduce the number of dimensions down to 2 (see analysis below with the original data set).
>
> Note: animal weights were ln() transformed to increase correlation with the 3 other variables.
>
> df <- data.frame(
> weight = log(1000*c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538, 0.0701, 0.1138, 0.0540, 0.0629, 0.0930,
> 0.0443, 0.1115, 0.1157, 0.0734, 0.0616, 0.0640, 0.0480, 0.1339, 0.0547, 0.0844,
> 0.0431, 0.0472, 0.0752, 0.0604, 0.0713, 0.0658, 0.0538, 0.0585, 0.0645, 0.0529,
> 0.0448, 0.0574, 0.0577, 0.0514, 0.0758, 0.0424, 0.0997, 0.0758, 0.0649, 0.0465,
> 0.0748, 0.0540, 0.0819, 0.0732, 0.0725, 0.0730, 0.0777, 0.0630, 0.0466)),
> interoc = c(0.853, 0.865, 0.811, 0.840, 0.783, 0.868, 0.818, 0.847, 0.838, 0.799,
> 0.737, 0.788, 0.731, 0.777, 0.863, 0.877, 0.814, 0.926, 0.767, 0.746,
> 0.700, 0.768, 0.807, 0.753, 0.809, 0.788, 0.750, 0.815, 0.757, 0.737,
> 0.759, 0.863, 0.747, 0.838, 0.790, 0.676, 0.857, 0.728, 0.743, 0.870,
> 0.787, 0.773, 0.829, 0.785, 0.746, 0.834, 0.829, 0.750, 0.842),
> cwidth = c(3.152, 3.046, 3.139, 3.181, 3.023, 3.452, 2.803, 3.050, 3.160, 3.186,
> 2.801, 2.862, 3.183, 2.770, 3.207, 3.188, 2.969, 3.033, 2.972, 3.291,
> 2.772, 2.875, 2.978, 3.094, 2.956, 2.966, 2.896, 3.149, 2.813, 2.935,
> 2.839, 3.152, 2.984, 3.037, 2.888, 2.723, 3.342, 2.562, 2.827, 2.909,
> 3.093, 2.990, 3.097, 2.751, 2.877, 2.901, 2.895, 2.721, 2.942),
> clength = c(3.889, 3.733, 3.762, 4.059, 3.911, 3.822, 3.768, 3.814, 3.721, 3.794,
> 3.483, 3.863, 3.856, 3.457, 3.996, 3.876, 3.642, 3.978, 3.534, 3.967,
> 3.429, 3.518, 3.766, 3.755, 3.706, 3.785, 3.607, 3.922, 3.453, 3.589,
> 3.508, 3.861, 3.706, 3.593, 3.570, 3.341, 3.916, 3.336, 3.504, 3.688,
> 3.735, 3.724, 3.860, 3.405, 3.493, 3.586, 3.545, 3.443, 3.640))
>
> pca_morpho <- princomp(df, cor = TRUE)
>
> summary(pca_morpho)
>
> Importance of components:
>                          Comp.1    Comp.2    Comp.3    Comp.4
> Standard deviation     1.604107 0.8827323 0.7061206 0.3860275
> Proportion of Variance 0.643290 0.1948041 0.1246516 0.0372543
> Cumulative Proportion  0.643290 0.8380941 0.9627457 1.0000000
>
> Loadings:
>         Comp.1 Comp.2 Comp.3 Comp.4
> weight  -0.371  0.907        -0.201
> interoc -0.486 -0.227 -0.840
> cwidth  -0.537 -0.349  0.466 -0.611
> clength -0.582         0.278  0.761
>
>                Comp.1 Comp.2 Comp.3 Comp.4
> SS loadings      1.00   1.00   1.00   1.00
> Proportion Var   0.25   0.25   0.25   0.25
> Cumulative Var   0.25   0.50   0.75   1.00
>
> Any guidance will be greatly appreciated!
>
> Salvatore A. Sidoti
> PhD Student
> The Ohio State University
> Behavioral Ecology
>