[R] Statistical analysis of olive dataset
Jim Lemon
drjimlemon at gmail.com
Sun Mar 13 08:22:01 CET 2016
Hi Axel,
It seems to me that cluster analysis could be what you are seeking.
Identify the clusters of different combinations of fatty acids in the
oils. Do they correspond to location? If so, is there a method to
predict the cluster membership of a new set of measurements? Have a
look at the cluster package, which you should have.
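
As a rough sketch of what I mean, assuming pam() from the cluster
package and scaled measurements: the helper names cluster_vs_origin
and assign_cluster are made up for illustration, and the number of
clusters k is a guess that you would need to check yourself.

```r
library(cluster)  # recommended package, shipped with standard R installs

# Hypothetical helper: cluster the scaled measurements with PAM and
# cross-tabulate cluster membership against a known grouping.
cluster_vs_origin <- function(x, origin, k) {
  x_scaled <- scale(x)                 # centre and scale each column
  fit <- pam(x_scaled, k = k)          # partitioning around medoids
  list(fit = fit,
       centre = attr(x_scaled, "scaled:center"),
       spread = attr(x_scaled, "scaled:scale"),
       table = table(cluster = fit$clustering, origin = origin))
}

# Hypothetical helper: assign new measurements to the nearest medoid,
# reusing the centring and scaling of the original data.
assign_cluster <- function(res, new_x) {
  new_scaled <- scale(as.matrix(new_x),
                      center = res$centre, scale = res$spread)
  medoids <- res$fit$medoids
  apply(new_scaled, 1, function(row) {
    # squared Euclidean distance from this row to each medoid
    which.min(colSums((t(medoids) - row)^2))
  })
}
```

With your data this might be called as
cluster_vs_origin(olive[, 3:10], olive[[2]], k = 3), using the same
columns as in your describeBy call. If the rows of the resulting table
line up with the origins, assign_cluster() gives a crude way to place
a new set of measurements.
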
Jim
On Sun, Mar 13, 2016 at 4:39 AM, Axel <axeldibert at alice.it> wrote:
> Hi to all the members of the list!
>
> I am a novice in statistical analysis and in the use of R, so I am
> experimenting with the dataset "olive" included in the package "tourr".
> This dataset contains the results of the determination of the fatty
> acids in 572 samples of olive oil from Italy (columns 3 to 10), along
> with the area and the region of origin of the oil (columns 1 and 2,
> respectively).
>
> The main goal of my analysis is to determine which fatty acids
> characterize the origin of an oil. As a secondary goal, I would like to
> enter the results of the chemical analysis of an oil that I analyzed (I
> am a Chemistry student) in order to determine its region of production.
> I do not know whether this is possible.
>
> I am using R 3.2.4 on Mac OS X El Capitan with the packages "tourr" and
> "psych" loaded.
> Here are the commands I have used so far:
>
> olivenum <- olive[,c(3:10)]
> mean <- colMeans(olivenum)
> sd <- sapply(olivenum,sd)
> describeBy(olivenum,olive[2])
> pairs(olivenum)
> R <- cor(olivenum)
> eigen(R)
> # Since the first three eigenvalues are greater than 1, the first three
> # components (columns 1, 2 and 3) are the principal components. But I
> # can also determine them using a scree plot as follows. Right?
>
> autoval <- eigen(R)$values
> autovec <- eigen(R)$vectors
> pvarsp <- autoval/ncol(olivenum)
> plot(autoval,type="b",main="Scree diagram",xlab="Number of components",ylab="Autovalues")
> abline(h=1,lwd=3,col="red")
>
> eigen(R)$vectors[,1:3]
> olive.scale <- scale(olivenum,T,T)
> points <- olive.scale%*%autovec[,1:3]
>
>
> # Since I selected three principal components (three columns), how
> # should I draw the scatter plot? I do not think that what I have done
> # is right:
> plot(points,main="Dispersion graph",xlab="Component 1",ylab="Component 2")
> princomp(olivenum,cor=T)
> # With the following command I obtain a summary of the importance of
> # the components. For example, the proportion of variance of component 1
> # is about 0.465, of component 2 is 0.220 and of component 3 is 0.127,
> # with a cumulative proportion of 0.812. This means that the values in
> # the first three columns of the matrix "olivenum" mostly characterize
> # the differences between the observations. Right?
> summary(princomp(olivenum,cor=T))
> screeplot(princomp(olivenum,cor=T))
>
> plot(princomp(olivenum,cor=T)$scores,rownames(olivenum))
> abline(h=0,v=0)
>
> I determined that three components explain a large part of the
> variability, but I don't know which these components are. How should I
> continue?
>
> Thank you for your attention,
> Axel
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.