[R] Statistical analysis of olive dataset

Sun Mar 13 16:24:59 CET 2016

On 3/12/2016 12:39 PM, Axel wrote:
> The main goal of my analysis is to
> determine which fatty acids characterize the origin of an oil. As
> a secondary goal, I would like to insert the results of the chemical analysis
> of an oil that I analyzed (I am a Chemistry student) in order to determine its
> region of production. I do not know if this last thing is possible.

There are already plenty of tools for this; don't bother trying to 
re-invent an already well-working wheel.

* PCA + a biplot will give you a good overview.  With groups, I
recommend ggbiplot, with data ellipses for the groups.
This shows a clear separation of the regions along PC1.

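data(olive, package="tourr")
library(ggbiplot)
olivenum <- olive[, 3:10]   # the 8 fatty-acid columns

# PCA on standardized variables (the acids differ greatly in magnitude)
olive.pca <- prcomp(olivenum, scale.=TRUE)
summary(olive.pca)          # proportion of variance per component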

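# region should be a factor (area has 9 levels, probably too many to show).
# Check the coding with table(olive$region) first: these labels assume
# 1 = North, 2 = Sardinia, 3 = South.
olive$region <- factor(olive$region, labels=c("North", "Sardinia", "South"))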

ggbiplot(olive.pca, obs.scale = 1, var.scale = 1,
          groups = olive$region, ellipse = TRUE, varname.size=4,
          circle = TRUE) +
          theme_bw() +
          theme(legend.direction = 'horizontal',
                legend.position = 'top')
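To see which fatty acids drive the separation in the biplot, you can rank them by their loadings on a component. This is a minimal sketch, assuming the same 8 columns and the same standardization (scale.=TRUE) as above; rank_loadings is a hypothetical helper name, not part of any package. The sign of a whole component is arbitrary, so read the magnitudes and the relative signs within a component.

```r
# Hypothetical helper: PCA on standardized data, then the loadings of one
# component ordered by absolute size (signs kept to show direction).
rank_loadings <- function(X, pc = 1) {
  pca <- prcomp(X, scale. = TRUE)
  load <- pca$rotation[, pc]
  load[order(abs(load), decreasing = TRUE)]
}

# Run on the olive data only if tourr is installed
if (requireNamespace("tourr", quietly = TRUE)) {
  data(olive, package = "tourr")
  print(round(rank_loadings(olive[, 3:10], pc = 1), 2))
  print(round(rank_loadings(olive[, 3:10], pc = 2), 2))
}
```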

* Discrimination among regions by chemical composition:
A canonical discriminant analysis will show you this in
a low-rank view.  The biggest difference is between the North
and the other two regions.

# Multivariate linear model (MLM): the 8 fatty acids as responses, region as predictor
olive.mlm <- lm(as.matrix(olive[, 3:10]) ~ region, data=olive)

# Canonical discriminant analysis

# (need devel. version for ellipses)
# install.packages("candisc", repos="http://R-Forge.R-project.org")
library(candisc)
olive.can <- candisc(olive.mlm)
olive.can
plot(olive.can, ellipse=TRUE)
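Alongside the canonical view, a simple univariate screen can rank the fatty acids by how strongly each one on its own separates the regions. The sketch below is a swapped-in technique (one-way ANOVA per acid, sorted by F statistic), not something candisc does; rank_by_F is a hypothetical helper name. Large F values flag acids worth a closer look, but unlike the canonical analysis this ignores the correlations among acids.

```r
# Hypothetical helper: one-way ANOVA of each column of X on the grouping g,
# returning the F statistics sorted from largest to smallest.
rank_by_F <- function(X, g) {
  g <- factor(g)
  Fval <- vapply(names(X), function(v) {
    anova(lm(X[[v]] ~ g))[["F value"]][1]
  }, numeric(1))
  sort(Fval, decreasing = TRUE)
}

# Run on the olive data only if tourr is installed
if (requireNamespace("tourr", quietly = TRUE)) {
  data(olive, package = "tourr")
  print(round(rank_by_F(olive[, 3:10], olive$region), 1))
}
```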

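* You can probably use the predict() method for MASS::lda() to predict
the region of new samples, such as your own oil, provided they are
measured on the same 8 fatty acids in the same units as the olive data.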
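A minimal sketch of that idea follows. lda_classify is a hypothetical helper; it fits MASS::lda() on the training samples, classifies the new rows with predict(), and also reports the leave-one-out accuracy from lda(..., CV = TRUE) as a rough check of how far to trust the prediction. As a stand-in for "your" oil it simply holds out the first olive sample; replace that with a one-row data frame of your own measurements using the same column names.

```r
library(MASS)

# Hypothetical helper: LDA fit, prediction for new samples, and
# leave-one-out (CV = TRUE) accuracy on the training data.
lda_classify <- function(X, g, newdata) {
  g <- factor(g)
  fit <- lda(X, grouping = g)
  loo <- lda(X, grouping = g, CV = TRUE)
  pred <- predict(fit, newdata = newdata)
  list(class = pred$class,
       posterior = pred$posterior,
       loo_accuracy = mean(loo$class == g))
}

# Run on the olive data only if tourr is installed
if (requireNamespace("tourr", quietly = TRUE)) {
  data(olive, package = "tourr")
  acids <- olive[, 3:10]
  # hold out sample 1 as a stand-in for a new oil
  res <- lda_classify(acids[-1, ], olive$region[-1], acids[1, , drop = FALSE])
  print(res$class)
  print(round(res$posterior, 3))
  print(res$loo_accuracy)
}
```

The posterior probabilities are more informative than the class label alone: an oil whose posterior is split between two regions is one the model cannot place confidently.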

hope this helps,
-Michael