[R] NA microarray for kmeans clustering

Gaurav Dilip Moghe moghegau at msu.edu
Fri Aug 29 18:24:12 CEST 2008


Hello, 

I'm a graduate student in Genetics who has just started working with R. I 
have been trying to do a k-means clustering of an expression data 
compilation that has lots of NA values in it. As suggested in a couple of 
earlier posts, I tried using na.omit() and the MICE imputation algorithm to 
take care of the NAs, but they don't seem to work that well. na.omit() 
deletes every gene (row) that contains an NA, which affects the final 
results considerably, so I am wary of using it. 
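
To make the na.omit() concern concrete, here is a minimal sketch on a made-up 
toy matrix (the gene names and values are invented for illustration, they are 
not my real data). It contrasts na.omit(), which drops any row with at least 
one NA, with a filter that drops only rows that are missing entirely. 

```r
# Toy expression matrix: 4 hypothetical genes x 3 arrays (made-up values)
toy <- matrix(c( 0.1,  0.2,  0.3,
                  NA,   NA,   NA,
                 0.5,   NA,  0.7,
                -0.4, -0.6, -0.8),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("g", 1:4), paste0("a", 1:3)))

# na.omit() removes every row that contains at least one NA
complete_only <- na.omit(toy)

# Alternative: remove only the rows in which every value is NA
not_all_na <- toy[rowSums(is.na(toy)) < ncol(toy), , drop = FALSE]

nrow(complete_only)  # number of rows with no NA at all
nrow(not_all_na)     # number of rows with at least one observed value
```

The second filter keeps partially observed genes, which then still need some 
form of imputation before kmeans() will accept them. 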

I am not sure whether I have been using MICE properly. Here is an example of 
the data and my commands: 

> y<-read.table("test.txt",header=FALSE,skip=1,row.names=1)

        V2    V3    V4    V5    V6    V7    V8    V9   V10  V11   V12   V13
gene1  0.14  0.07 -0.58 -0.56 -0.25 -0.17  1.02  0.98  0.18 0.28  0.23  0.37
gene2    NA    NA    NA    NA    NA    NA    NA    NA    NA   NA    NA    NA
gene3  0.00  0.28 -0.01  0.29  0.14    NA  0.23    NA  0.08 0.00 -0.47 -0.57
gene4 -0.58 -1.22 -0.43 -0.23    NA -0.36  0.30  0.28  0.30 0.41  0.33 -0.08
gene5 -1.51 -1.36 -1.64 -1.89 -1.32 -0.38 -0.14 -0.32  0.39 0.58  0.19 -0.40
gene6 -0.50 -0.60 -0.42  0.41  0.32    NA    NA    NA -0.69 0.29  0.12  0.11 

> md.pattern(y)

 V2 V3 V4 V5 V10 V11 V12 V13 V14 V15 V16 V6 V8 V7 V9
2  1  1  1  1   1   1   1   1   1   1   1  1  1  1  1  0
1  1  1  1  1   1   1   1   1   1   1   1  0  1  1  1  1
1  1  1  1  1   1   1   1   1   1   1   1  1  1  0  0  2
1  1  1  1  1   1   1   1   1   1   1   1  1  0  0  0  3
1  0  0  0  0   0   0   0   0   0   0   0  0  0  0  0 15
  1  1  1  1   1   1   1   1   1   1   1  2  2  3  3 21 

> imp <-mice(y)

The message I get is: 

iter imp variable
 1   1  V2Error in solve.default(t(xobs) %*% xobs) :
       system is computationally singular: reciprocal condition number = 1.0438e-19
> imp
Error: object "imp" not found 
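
A possible reading of the error, offered as a sketch rather than a diagnosis: 
the md.pattern() output above counts 6 rows and 15 variables, so a regression 
of one column on the remaining 14 would have more predictors than 
observations. In that situation t(X) %*% X cannot have full rank, and solve() 
is expected to fail. The toy matrix below is random data with assumed 
dimensions (5 rows, 14 columns); it is not the real file and it does not call 
mice at all. 

```r
set.seed(1)
# Toy predictor matrix: 5 observations (rows), 14 predictors (columns)
X <- matrix(rnorm(5 * 14), nrow = 5, ncol = 14)

XtX <- t(X) %*% X     # 14 x 14 cross-product, but its rank is at most 5
qr(XtX)$rank          # numerical rank, well below ncol(XtX)

# Try to invert the rank-deficient matrix and keep the error message, if any
msg <- tryCatch(solve(XtX), error = function(e) conditionMessage(e))
msg
```

If this is what happens inside mice(), more rows (genes) than columns, or 
fewer predictors per imputed column, would be needed before the inversion can 
succeed. 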

I also tried using different methods as mentioned in the manual, but I get 
the same error every time. Any suggestions on what could be wrong and what 
needs to be done? I'd prefer to use MICE, but if there are any better 
methods, please let me know. 
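
For comparison, a simple base-R baseline could be sketched as follows. This 
is not MICE: it swaps in plain per-gene mean imputation, and everything in it 
(the random toy matrix, the positions of the NAs, k = 3) is an assumption 
made for illustration only. 

```r
set.seed(42)
# Toy matrix: 20 hypothetical genes x 6 arrays (random values, not real data)
expr <- matrix(rnorm(120), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("V", 2:7)))
expr[2, ] <- NA                             # one gene with no data at all
expr[cbind(c(3, 3, 6), c(1, 4, 2))] <- NA   # a few scattered missing values

# 1. Drop genes that have no observed values
keep <- rowSums(!is.na(expr)) > 0
expr <- expr[keep, , drop = FALSE]

# 2. Replace each remaining NA with the mean of that gene's observed values
gene_means <- rowMeans(expr, na.rm = TRUE)
na_idx <- which(is.na(expr), arr.ind = TRUE)
expr[na_idx] <- gene_means[na_idx[, "row"]]

# 3. k-means on the completed matrix (k = 3 is an arbitrary choice)
km <- kmeans(expr, centers = 3, nstart = 10)
table(km$cluster)
```

Mean imputation shrinks the variance of the imputed genes, so the resulting 
clusters are best treated as a rough baseline to compare other imputation 
methods against. 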


Thanks,
Gaurav


