[R] Statistical analysis of olive dataset
friendly at yorku.ca
Sun Mar 13 16:24:59 CET 2016
On 3/12/2016 12:39 PM, Axel wrote:
> The main goal of my analysis is to
> determine which are the fatty acids that characterize the origin of an oil. As
> a secondary goal, I would like to insert the results of the chemical analysis
> of an oil that I analyzed (I am a Chemistry student) in order to determine its
> region of production. I do not know if this last thing is possible.
There are already plenty of tools for this; don't bother trying to
re-invent an already well-working wheel.
* PCA + a biplot will give you a good overview. With groups, I
recommend ggbiplot, with data ellipses for the groups.
This shows clear separation of the regions along PC1:
library(ggplot2)    # theme()
library(ggbiplot)   # ggbiplot()

# columns 3:10: the fatty acid measurements
olivenum <- olive[, 3:10]
olive.pca <- prcomp(olivenum, scale. = TRUE)
# region should be a factor (area has 9 levels, maybe too confusing)
olive$region <- factor(olive$region, labels = c("North", "Sardinia", "South"))
ggbiplot(olive.pca, obs.scale = 1, var.scale = 1,
         groups = olive$region, ellipse = TRUE, varname.size = 4,
         circle = TRUE) +
  theme(legend.direction = 'horizontal',
        legend.position = 'top')
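To see which variables drive a separation along PC1, it can help to look at the variance explained and the loadings directly. What follows is only a rough sketch: the olive data do not ship with base R, so it uses the built-in iris data as a stand-in (its four measurements play the role of the fatty acids). The names X, pca, var_explained and pc1_rank are made up for illustration.

```r
# Stand-in data (an assumption): iris ships with base R; its four
# measurement columns play the role of the fatty acid columns.
X <- iris[, 1:4]
pca <- prcomp(X, scale. = TRUE)

# Proportion of total variance carried by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Loadings of the original variables on the first two components
round(pca$rotation[, 1:2], 3)

# Variables ordered by the absolute size of their PC1 loading
pc1_rank <- names(sort(abs(pca$rotation[, "PC1"]), decreasing = TRUE))
pc1_rank
```

With the olive data, the same lines applied to olive.pca would rank the fatty acids by their contribution to PC1. A large loading only says that a variable contributes to that component; whether the component separates the regions is what the biplot shows.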
* Discrimination among regions by chemical composition:
A canonical discriminant analysis will show you this in
a low-rank view. The biggest difference is between the North
and the other two regions.
library(candisc)   # candisc()

# multivariate linear model: the fatty acids as responses, region as predictor
olive.mlm <- lm(as.matrix(olive[, 3:10]) ~ region, data = olive)
# Canonical discriminant analysis
# (need devel. version for ellipses)
# install.packages("candisc", repos="http://R-Forge.R-project.org")
olive.can <- candisc(olive.mlm)
* You can probably use the predict() method for MASS::lda() to predict
the class for new samples.
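A minimal sketch of that last idea, again with the built-in iris data standing in for the olive data (an assumption, since olive is not part of base R). Here Species plays the role of region, and 30 held-out rows play the role of the newly analyzed oils. The names test_idx, train, new_samples, fit and pred are hypothetical.

```r
library(MASS)  # lda() and its predict() method

# Stand-in data (an assumption): iris, with Species in the role of region
set.seed(1)
test_idx <- sample(nrow(iris), 30)
train <- iris[-test_idx, ]
new_samples <- iris[test_idx, 1:4]   # treat the class of these rows as unknown

# Fit the linear discriminant model on the samples with known class
fit <- lda(Species ~ ., data = train)

# Predict the class of the "new" samples
pred <- predict(fit, newdata = new_samples)
head(pred$class)                 # predicted class for each new sample
head(round(pred$posterior, 3))   # posterior probability of each class

# Hold-out accuracy, available here only because the true classes are known
mean(pred$class == iris$Species[test_idx])
```

For a real new oil, newdata would be a data frame with one row per sample and the same column names as the fatty acid columns used to fit the model. The posterior probabilities give some idea of how clear-cut the assignment is.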
hope this helps,