[R] Statistical analysis of olive dataset

Bert Gunter bgunter.4567 at gmail.com
Sun Mar 13 05:49:46 CET 2016


Inline.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sat, Mar 12, 2016 at 9:39 AM, Axel <axeldibert at alice.it> wrote:
> Hi to all the members of the list!
>
> I am a novice in statistical analysis and in the use of R, so I am
> experimenting with the dataset "olive" included in the package "tourr".

Stop experimenting and spend time with an R tutorial or two? There are
many good ones on the Web. See also
https://www.rstudio.com/online-learning/#R  for some recommendations.




> This dataset contains the results of
> the determination of the fatty acids in 572 samples of olive oil from Italy
> (columns from 3 to 10) along with the area and the region of origin of the oil
> (respectively, column 1 and column 2).
>
> The main goal of my analysis is to determine which fatty acids characterize
> the origin of an oil. As a secondary goal, I would like to use the results of
> the chemical analysis of an oil that I analyzed myself (I am a Chemistry
> student) to determine its region of production. I do not know if this is
> possible.
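
On the secondary goal: PCA by itself does not assign a sample to a group. A swapped-in technique that does is linear discriminant analysis, for example lda() from the recommended package MASS. The following is only a minimal sketch under stated assumptions: the region labels are in a column named `region` (check names(olive)), and `new.oil` is a hypothetical placeholder that you would replace with your own measurements, using the same eight columns and units as olivenum. The helper name classify_region() is made up for this example.

```r
library(MASS)  # recommended package shipped with R; provides lda()

# Hypothetical helper: fit an LDA model on the reference samples and
# predict the group of each row in 'newdata'.
classify_region <- function(x, groups, newdata) {
  fit <- lda(x, grouping = factor(groups))
  list(fit = fit, predicted = predict(fit, newdata)$class)
}

if (requireNamespace("tourr", quietly = TRUE)) {
  data("olive", package = "tourr")
  olivenum <- olive[, 3:10]
  # Placeholder only (assumption): replace with your own sample, one row with
  # the same eight fatty-acid columns, in the same units, as olivenum.
  new.oil <- olivenum[1, , drop = FALSE]
  # Assumes the region labels are in a column called 'region'.
  res <- classify_region(olivenum, olive$region, new.oil)
  print(res$predicted)    # predicted region of new.oil
  print(res$fit$scaling)  # discriminant coefficients for each fatty acid
}
```

The coefficients in res$fit$scaling are expressed in the units of each column, so their raw sizes are not directly comparable across fatty acids.
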
>
> I am using R 3.2.4 on Mac OS X El Capitan with the packages "tourr" and
> "psych" loaded. Here are the commands I have used so far:
>
> olivenum <- olive[, 3:10]
> # named so as not to mask the base functions mean() and sd()
> olive.mean <- colMeans(olivenum)
> olive.sd <- sapply(olivenum, sd)
> describeBy(olivenum, olive[2])
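
Since the main goal is to see which fatty acids differ by origin, the per-group means are a natural first look; describeBy() already prints them among other statistics. A base-R sketch that collects only the means in one table, assuming the grouping column is named `region`; region_means() is a hypothetical helper.

```r
# Hypothetical helper: mean of every numeric column within each group.
region_means <- function(x, groups) {
  aggregate(x, by = list(group = groups), FUN = mean)
}

if (requireNamespace("tourr", quietly = TRUE)) {
  data("olive", package = "tourr")
  # Assumes the region labels are in a column called 'region'.
  print(region_means(olive[, 3:10], olive$region))
}
```
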
> pairs(olivenum)
> R <- cor(olivenum)
> eigen(R)
> # Since the first three eigenvalues are greater than 1, I keep the first three
> # principal components (columns 1, 2 and 3). But I can also determine them
> # using a scree diagram, as follows. Right?
>
> autoval <- eigen(R)$values
> autovec <- eigen(R)$vectors
> pvarsp <- autoval / ncol(olivenum)  # proportion of variance per component
> plot(autoval, type = "b", main = "Scree diagram",
>      xlab = "Number of components", ylab = "Eigenvalues")
> abline(h = 1, lwd = 3, col = "red")
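
On whether the scree diagram gives the same answer: keeping the eigenvalues greater than 1 (Kaiser's rule) and looking for the "elbow" of the scree diagram are both heuristics, and they need not agree. A small sketch that computes the relevant quantities side by side from the correlation matrix; the helper name n_components() is hypothetical. Its `proportion` element is the same as pvarsp above, because the eigenvalues of a correlation matrix sum to the number of variables.

```r
# Hypothetical helper: eigenvalue-based summaries for choosing how many
# principal components to keep (PCA on the correlation matrix).
n_components <- function(x) {
  ev <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  prop <- ev / sum(ev)  # for a correlation matrix, sum(ev) equals ncol(x)
  list(eigenvalues = ev,
       proportion  = prop,
       cumulative  = cumsum(prop),
       kaiser      = sum(ev > 1))  # number of eigenvalues greater than 1
}

if (requireNamespace("tourr", quietly = TRUE)) {
  data("olive", package = "tourr")
  res <- n_components(olive[, 3:10])
  print(round(res$cumulative, 3))
  print(res$kaiser)
}
```
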
>
> eigen(R)$vectors[, 1:3]
> olive.scale <- scale(olivenum, center = TRUE, scale = TRUE)
> points <- olive.scale %*% autovec[, 1:3]
>
>
> # Since I selected three principal components (three columns), how should I
> # draw the scatter plot? I do not think that what I have done is right:
> plot(points, main = "Scatter plot", xlab = "Component 1", ylab = "Component 2")
> princomp(olivenum, cor = TRUE)
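
plot() applied to a matrix with three columns uses only the first two, so the call above shows components 1 and 2 and ignores component 3. One way to see all three pairs at once is a scatterplot matrix with pairs(). The sketch assumes the grouping column is named `region`; pc_scores() is a hypothetical helper that repeats the computation above (standardize, then project onto the first k eigenvectors of the correlation matrix).

```r
# Hypothetical helper: scores on the first k principal components,
# computed from standardized data and the eigenvectors of cor(x).
pc_scores <- function(x, k = 3) {
  z <- scale(x, center = TRUE, scale = TRUE)
  v <- eigen(cor(x), symmetric = TRUE)$vectors[, 1:k, drop = FALSE]
  s <- z %*% v
  colnames(s) <- paste0("PC", seq_len(k))
  s
}

if (requireNamespace("tourr", quietly = TRUE)) {
  data("olive", package = "tourr")
  s <- pc_scores(olive[, 3:10], 3)
  region <- factor(olive$region)  # assumes a column called 'region'
  # One panel per pair of components, points coloured by region.
  pairs(s, col = as.integer(region), pch = 20,
        main = "First three principal components")
}
```
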
> # With the following command I obtain a summary of the importance of the
> # components. For example, the proportion of variance of component 1 is about
> # 0.465, of component 2 0.220 and of component 3 0.127, for a cumulative
> # proportion of 0.812. This means that the values in the first three columns
> # of the matrix "olivenum" mostly characterize the differences between the
> # observations. Right?
> summary(princomp(olivenum, cor = TRUE))
> screeplot(princomp(olivenum, cor = TRUE))
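
To check the hand-made computation against a built-in function, prcomp(x, scale. = TRUE) is the closer match: princomp() uses divisor N for the covariance matrix (see ?princomp), so its scores differ from the scale()-based ones by a constant factor. Eigenvectors are only defined up to sign, so the sketch flips columns where needed before comparing. check_manual_pca() is a hypothetical name.

```r
# Hypothetical helper: compare scores computed by hand (as in the code above)
# with prcomp(), allowing each component to differ in sign.
check_manual_pca <- function(x, k = 3) {
  manual <- scale(x) %*% eigen(cor(x), symmetric = TRUE)$vectors[, 1:k, drop = FALSE]
  auto <- prcomp(x, scale. = TRUE)$x[, 1:k, drop = FALSE]
  # Flip each prcomp column so that it points the same way as the manual one.
  s <- sign(colSums(manual * auto))
  auto <- auto %*% diag(s, nrow = k)
  isTRUE(all.equal(unname(manual), unname(auto)))
}

if (requireNamespace("tourr", quietly = TRUE)) {
  data("olive", package = "tourr")
  print(check_manual_pca(olive[, 3:10]))
}
```
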
>
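> pc <- princomp(olivenum, cor = TRUE)
> plot(pc$scores[, 1:2], type = "n")
> text(pc$scores[, 1:2], labels = rownames(olivenum), cex = 0.6)  # row names as labels
> abline(h = 0, v = 0)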
>
> I determined that three components explain a large part of the variability,
> but I don't know what these components are. How should I continue?
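
Each principal component is not one of the original columns but a weighted combination of all eight fatty acids. The weights (loadings) are the columns of eigen(R)$vectors, or equivalently prcomp(...)$rotation, up to sign. A sketch that lists, for each of the first k components, the n variables with the largest absolute loadings; top_loadings() is a hypothetical helper and k = 3, n = 3 are arbitrary choices. print(loadings(princomp(olivenum, cor = TRUE)), cutoff = 0.3) gives a similar view.

```r
# Hypothetical helper: for each of the first k components, the n variables
# with the largest absolute loadings (loadings = eigenvectors of cor(x)).
top_loadings <- function(x, k = 3, n = 3) {
  rot <- prcomp(x, scale. = TRUE)$rotation[, 1:k, drop = FALSE]
  out <- lapply(seq_len(k), function(j) {
    o <- order(abs(rot[, j]), decreasing = TRUE)[seq_len(n)]
    round(rot[o, j], 2)
  })
  names(out) <- colnames(rot)
  out
}

if (requireNamespace("tourr", quietly = TRUE)) {
  data("olive", package = "tourr")
  print(top_loadings(olive[, 3:10]))
}
```

A loading close to zero means that fatty acid contributes little to that component; the overall sign of a component is arbitrary, so only the relative signs within a component carry meaning.
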
>
> Thank you for your attention,
> Axel
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


