[R] prcomp - surprising structure
hnorpois at gmail.com
Thu Oct 3 11:41:16 CEST 2013
I did a pca with over 200000 snps for 340 observations (ids). If I plot the
eigenvectors (called rotation in prcomp) 2,3 and 4 (e.g. plot
(rotation[,2]) I see a strange "column" in my data (see attachment). I
suggest it is an artefact (but of what?).
I used prcomp this way: prcomp (mat), where mat is a matrix with the column
means already substracted followed by a normalisation procedure (see below
for details). Is that okay? Or does prcomp repeat substraction steps?
Originally my approach was driven by the idea to compute a covariation
matrix followed by the use of eigen, but the covariation matrix was to huge
to handle. So I switched to prcomp.
As I guess that the "columns" in my plots reflect some artefact production
I hope to get some help. For the case that my use of prcomp was not okay,
could you please give me instructions how to use it - including with the
normalisation procedure that I need to include before doing a pca.
# mat: matrix with genotypes coded as 0,1 and 2 (columns); IDs
(observations) as rows.
prcomp.snp <- function (mat)
m <- ncol (mat)
n <- nrow (mat)
snp.namen <- colnames (mat)
for (i in 1:m)
# snps in columns
ui <- mat[,i]
n <- length (which (!is.na(ui)))
# see methods Price et al. as correction
pi <- (1+ sum(ui, na.rm=TRUE))/(2+2*n)
# substract mean
ui <- ui - mean (ui, na.rm=TRUE)
# NAs set to zero
ui[is.na(ui)] <- 0
# normalisation of the genotype for each ID
important normalisation step
ui <- ui/ (sqrt (pi*(1-pi)))
# fill matrix with ui
mat[,i] <- ui
mat <- prcomp (mat)
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 17486 bytes
Desc: not available
More information about the R-help