[R] Help with PCA data file prep and R code
sastinson at ucdavis.edu
Wed Sep 21 07:28:02 CEST 2016
I'm new to R and would appreciate some expert advice on prepping files for,
and running, PCA...
My data set consists of aquatic invertebrate and zooplankton count data and
physicochemical measurements from an ecotoxicology study. Four chemical
treatments were applied to mesocosm tanks, 4 replicates per treatment (16
tanks total), then data were collected weekly over a 3 month period.
I cleaned the data in Excel by removing columns with all zero values, and
all rows with NA values.
All zooplankton values were volume normalized, then log normalized. All
other data were log normalized in Excel prior to analysis in R. All vectors
are numeric. I've attached the .csv file to this email rather than using
dput(dataframe). I hope that's acceptable.
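The cleaning steps described above can also be scripted in R rather than done by hand in Excel. The following is only a minimal sketch under stated assumptions: the data frame `raw`, its column names, and the use of log(x + 1) as the "log normalization" are all hypothetical, since the post does not give the actual column layout or transform.

```r
# Hypothetical toy data standing in for the real count table
raw <- data.frame(
  Tank         = 1:6,
  Treatment    = c(1, 1, 2, 2, 3, 3),
  Daphnia      = c(12, 0, 7, NA, 3, 20),
  Copepod      = c(0, 0, 0, 0, 0, 0),        # all-zero column
  Conductivity = c(310, 295, 402, 388, NA, 350)
)

# Drop numeric columns whose values are all zero
all_zero <- vapply(
  raw,
  function(col) is.numeric(col) && all(col == 0, na.rm = TRUE),
  logical(1)
)
cleaned <- raw[, !all_zero, drop = FALSE]

# Drop rows that contain any NA
cleaned <- cleaned[complete.cases(cleaned), , drop = FALSE]

# log(x + 1) transform of the measurement columns only (assumed transform);
# identifier columns such as Tank and Treatment are left untouched
measure_cols <- setdiff(names(cleaned), c("Tank", "Treatment"))
cleaned[measure_cols] <- lapply(cleaned[measure_cols], log1p)
```

Keeping these steps in a script makes them repeatable if rows or columns are added later, and `dput(head(cleaned))` then gives a compact way to share a sample with the list.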
My questions are:
1. Did I do the cleaning step appropriately? I know that there are ways to
run PCA on data that contain NA values (e.g. with pcaMethods), but I wasn't
able to get the code to work...
(I understand that this isn't strictly an R question, but any help would be
2. Does my code look correct for the PCA and visualization (see below)?
Thanks in advance,
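library(ggplot2)   # needed for ggplot(), geom_path(), theme(), etc.
library(ggbiplot)  # needed for ggbiplot()
mesocleaned <- read.csv("MesoCleanedforPCA.9.16.16.csv")
meso.pca <- prcomp(mesocleaned,
                   center = TRUE,
                   scale. = TRUE)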
# print method
print(meso.pca)
#compute standard deviation of each principal component
std_dev <- meso.pca$sdev
pr_var <- std_dev^2
#check variance of first 10 components
pr_var[1:10]
#proportion of variance explained
prop_varex <- pr_var/sum(pr_var)
#The first principal component explains 12.7% of the variance
#The second explains 8.1%
#for visualization, make Treatment vector a factor instead of numeric
meso.treatment <- as.factor(mesocleaned[, 3])
#ggbiplot to visualize by Treatment group
print(ggbiplot(meso.pca, obs.scale = 1, var.scale = 1,
               groups = meso.treatment, ellipse = TRUE, circle = TRUE))
g <- ggbiplot(meso.pca, obs.scale = 1, var.scale = 1,
              groups = meso.treatment, ellipse = TRUE,
              circle = TRUE)
#must change meso.treatment to a factor for this to work
g <- g + scale_color_brewer(name = deparse(substitute(Treatments)),
                            palette = 'Dark2')
g <- g + theme(legend.direction = 'horizontal',
               legend.position = 'top')
print(g)
#plot each variable's coefficients inside a unit circle to get insight on a
#possible interpretation for PCs.
theta <- seq(0,2*pi,length.out = 100)
circle <- data.frame(x = cos(theta), y = sin(theta))
p <- ggplot(circle,aes(x,y)) + geom_path()
loadings <- data.frame(meso.pca$rotation,
.names = row.names(meso.pca$rotation))
p + geom_text(data = loadings,
              mapping = aes(x = PC1, y = PC2, label = .names,
                            colour = .names)) +
  labs(x = "PC1", y = "PC2")
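
As posted, the whole data frame is passed to prcomp(), including column 3, which is later used as the treatment grouping. If that is not intended, one common arrangement is to run the PCA on the measurement columns only and keep the treatment as a separate factor. The following is a minimal sketch on simulated data; the data frame `toy` and its column names (`Tank`, `Week`, `Treatment`, `Var1` to `Var5`) are hypothetical stand-ins, not the layout of the attached file.

```r
set.seed(1)

# Hypothetical stand-in for the cleaned data:
# 16 tanks, 4 treatments, 5 measured variables
toy <- data.frame(
  Tank      = 1:16,
  Week      = 1,
  Treatment = rep(1:4, each = 4),
  matrix(rnorm(16 * 5), nrow = 16,
         dimnames = list(NULL, paste0("Var", 1:5)))
)

# Keep the grouping variable as a factor, outside the PCA input
treatment <- factor(toy$Treatment)

# Run the PCA on the measurement columns only
id_cols <- c("Tank", "Week", "Treatment")
toy.pca <- prcomp(toy[, !(names(toy) %in% id_cols)],
                  center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
pr_var     <- toy.pca$sdev^2
prop_varex <- pr_var / sum(pr_var)
round(prop_varex, 3)

# The same proportions are also reported by summary()
summary(toy.pca)$importance["Proportion of Variance", ]
```

The factor `treatment` can then be supplied as the `groups` argument to ggbiplot() in the same way as `meso.treatment` above.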