[R] Statistical analysis of olive dataset
lists at dewey.myzen.co.uk
Sun Mar 13 10:23:51 CET 2016
Since you are using princomp (among other things) you might find the
biplot function useful on the output of princomp.
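
As a rough illustration of that suggestion, here is a minimal sketch. It assumes the "tourr" package is installed and that columns 3 to 10 of "olive" hold the fatty acids, as described in the original post. The object name pc and the choice cor = TRUE (PCA on the correlation matrix, to match the eigen(cor(...)) approach in the post) are assumptions of mine, not something taken from the poster's code.

```r
# Sketch only: assumes 'tourr' is installed and that columns 3:10 of 'olive'
# contain the fatty acid measurements, as stated in the original post.
data("olive", package = "tourr")
olivenum <- olive[, 3:10]

# cor = TRUE runs the PCA on the correlation matrix, so the squared
# standard deviations are the eigenvalues of cor(olivenum).
pc <- princomp(olivenum, cor = TRUE)

# Biplot of components 1 and 2: samples are drawn as points (labels),
# fatty acids as arrows showing how each variable loads on the components.
biplot(pc, choices = 1:2, cex = 0.6)
```

Arrows pointing in a similar direction indicate fatty acids that vary together, and long arrows along one axis indicate variables that contribute strongly to that component.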
I have not studied your code in detail but you do seem to be doing
several things in multiple ways using functions from different sources.
I wonder whether it might be better to stick to fewer functions.
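
One possible way to do that, sketched under assumptions rather than offered as the definitive analysis: let princomp alone supply the importance summary, the scree plot, the loadings and the scores. The sketch assumes "tourr" is installed, that columns 3 to 10 of "olive" are the fatty acids, and that column 1 is a grouping variable usable for colouring. The names pc, loads and grp are hypothetical.

```r
# Sketch only: assumes 'tourr' is installed; columns 3:10 are the fatty acids
# and column 1 is a grouping variable, as described in the original post.
data("olive", package = "tourr")
olivenum <- olive[, 3:10]

pc <- princomp(olivenum, cor = TRUE)   # PCA on the correlation matrix

print(summary(pc))                     # proportion and cumulative proportion of variance
screeplot(pc, type = "lines", main = "Scree diagram")

# Loadings: the weight of each fatty acid in the first three components.
loads <- loadings(pc)[, 1:3]
print(round(loads, 2))

# Scores on the first two components, coloured by column 1 of 'olive'.
grp <- as.factor(olive[[1]])
plot(pc$scores[, 1:2], col = as.integer(grp), pch = 19, cex = 0.6,
     main = "Dispersion graph", xlab = "Component 1", ylab = "Component 2")

# With three retained components, a scatterplot matrix shows every pair.
pairs(pc$scores[, 1:3], col = as.integer(grp), pch = 19, cex = 0.4)
```

Note that each component is a weighted combination of all eight fatty acids, not one of the original columns. The loadings show which acids carry the most weight in each component.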
On 12/03/2016 17:39, Axel wrote:
> Hi to all the members of the list!
> I am a novice in statistical
> analysis and in the use of R, so I am experimenting with the dataset
> "olive" included in the package "tourr".
> This dataset contains the results of
> the determination of the fatty acids in 572 samples of olive oil from Italy
> (columns from 3 to 10) along with the area and the region of origin of the oil
> (respectively, column 1 and column 2).
> The main goal of my analysis is to
> determine which are the fatty acids that characterize the origin of an oil. As
> a secondary goal, I would like to insert the results of the chemical analysis
> of an oil that I analyzed (I am a Chemistry student) in order to determine its
> region of production. I do not know if this last thing is possible.
> I am
> using R 3.2.4 on MacOS X El Capitan with the packages "tourr" and "psych"
> Here are the commands I have used up to now:
> olivenum <- olive[,c(3:
> mean <- colMeans(olivenum)
> sd <- sapply(olivenum,sd)
> R <- cor(olivenum)
> # Since the first three eigenvalues are greater than 1, these are the main
> # components (column 1, 2 and 3). But I can also determine them using a scree
> # diagram as follows. Right?
> autoval <- eigen(R)$values
> autovec <- eigen(R)$vectors
> pvarsp <- autoval/ncol
> plot(autoval,type="b",main="Scree diagram",xlab="Number of
> eigen (R)$vectors[,
> olive.scale <- scale(olivenum,T,T)
> points <- olive.scale%*%autovec[,1:3]
> # Since I selected three main components (three columns), how should I plot the
> # dispersion graph? I do not think that what I have done is right:
> main="Dispersion graph",xlab="Component 1",ylab="Component 2")
> # With the following command I obtain a summary of the
> # importance of components. For example, the proportion of variance of component 1
> # is about 0.465, of component 2 is 0.220 and of component 3 is 0.127, with a
> # cumulative proportion of 0.812. This means that the values in the first three
> # columns of the matrix "olivenum" mostly characterize the differences between
> # the observations.
> I have determined that three components can explain a great part of the variability, but I
> don't know what these components are. How should I continue?
> Thank you for