[R] PCA with NA

Thibaut Jombart jombart at biomserv.univ-lyon1.fr
Fri Nov 23 17:26:38 CET 2007


Birgit Lemcke wrote:

>Dear all,
>(Mac OS X 10.4.11, R 2.6.0)
>I have a quantitative dataset with a lot of NAs in it. So many that  
>it is not possible to delete all rows with NAs, and also not  
>possible to delete all variables with NAs.
>Is there a function for principal component analysis that can deal  
>with so many NAs?
>
>Thanks in advance
>
>Birgit
>
>
>Birgit Lemcke
>Institut für Systematische Botanik
>Zollikerstrasse 107
>CH-8008 Zürich
>Switzerland
>Ph: +41 (0)44 634 8351
>birgit.lemcke at systbot.uzh.ch
>
>
Hi,

In centred PCA, missing values should be replaced by the mean of the 
available data for the corresponding variable, so that they become zero 
once the data are centred.
Let X be the analysed matrix (variables in columns).

##
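X = matrix(runif(300),ncol=10)   # 30 observations, 10 variables
idx = sample(1:nrow(X),5)        # pick 5 rows at random
X[idx,] = NA                     # set these rows entirely to NA
sum(is.na(X))                    # 5 rows x 10 columns
[1] 50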

library(ade4)
dudi.pca(X,center=TRUE,scale=FALSE)
Error in dudi.pca(X, center = TRUE, scale = FALSE) : na entries in table
##

Now we replace the missing values:

##
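# replace the NAs of a vector by the mean of its available values
f1 <- function(vec) {
    m <- mean(vec, na.rm = TRUE)
    vec[is.na(vec)] <- m
    return(vec)
}

Y = apply(X,2,f1)   # apply f1 to each column (variable)

pcaY = dudi.pca(Y,center=TRUE,scale=FALSE,nf=2,scannf=FALSE)

s.label(pcaY$li)                            # plot the row scores
sunflowerplot(pcaY$li[idx,1:2], add=TRUE)   # mark the rows that were missing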
##
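
In the example above whole rows are missing. When the NAs are scattered over 
the table, the same replacement can be written without apply(). This is only 
a sketch in base R: impute_col_means is a hypothetical helper name (it is not 
part of ade4), and it assumes that every column has at least one observed 
value.

```r
# Hypothetical helper (not an ade4 function): replace each NA by the mean
# of the observed values of its column.
impute_col_means <- function(X) {
  mu <- colMeans(X, na.rm = TRUE)             # column means of available data
  na_pos <- which(is.na(X), arr.ind = TRUE)   # (row, col) position of each NA
  X[na_pos] <- mu[na_pos[, "col"]]            # fill each NA with its column mean
  X
}

set.seed(1)
X2 <- matrix(runif(300), ncol = 10)
X2[sample(length(X2), 40)] <- NA              # 40 NAs scattered over the table

Y2 <- impute_col_means(X2)

# The replacement leaves the column means unchanged ...
all.equal(colMeans(Y2), colMeans(X2, na.rm = TRUE))

# ... so the replaced cells are zero (up to rounding error) after centring.
Yc <- scale(Y2, center = TRUE, scale = FALSE)
max(abs(Yc[is.na(X2)]))
```

Y2 can then be passed to dudi.pca in the same way as Y.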

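All the rows that were entirely missing are placed at the non-informative 
point, i.e. at the origin: after the replacement they are equal to the 
column means, so they are zero once the data are centred.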
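
This can be checked numerically. The sketch below uses prcomp from the stats 
package instead of dudi.pca, only so that it runs without ade4; this is an 
assumption made for the illustration, and the scores may differ in sign from 
pcaY$li, which does not change the position of the origin. The names Xo, 
idxo, Yo and pcao are chosen for this example.

```r
set.seed(42)
Xo <- matrix(runif(300), ncol = 10)
idxo <- sample(nrow(Xo), 5)
Xo[idxo, ] <- NA                       # 5 rows entirely missing

# replace the NAs of each column by the mean of its available values
Yo <- apply(Xo, 2, function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
})

# centred, unscaled PCA with base R
pcao <- prcomp(Yo, center = TRUE, scale. = FALSE)

# scores of the replaced rows on the first two axes
round(pcao$x[idxo, 1:2], 10)

# largest absolute score of these rows over all axes
max(abs(pcao$x[idxo, ]))
```

The scores of the rows in idxo should be zero up to rounding error on every 
axis, not only on the first two.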

Regards,

Thibaut.

-- 
######################################
Thibaut JOMBART
CNRS UMR 5558 - Laboratoire de Biométrie et Biologie Evolutive
Universite Lyon 1
43 bd du 11 novembre 1918
69622 Villeurbanne Cedex
Tél. : 04.72.43.29.35
Fax : 04.72.43.13.88
jombart at biomserv.univ-lyon1.fr
http://lbbe.univ-lyon1.fr/-Jombart-Thibaut-.html?lang=en
http://pbil.univ-lyon1.fr/software/adegenet/


