[R] How to perform clustering without removing rows where NA is present in R

Gundala Viswanath gundalav at gmail.com
Sat Dec 7 15:28:46 CET 2013


I have data that contain some NA values in their elements.
What I want to do is to **perform clustering without removing rows**
where the NA is present.

I understand that the `gower` distance measure in `daisy` allows for this situation.
But why doesn't my code below work?
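The claim about `daisy()` can be checked on a small example. This is a minimal sketch, assuming the `cluster` package (a recommended package shipped with R). For numeric columns, `daisy(..., metric = "gower")` averages the range-scaled absolute differences over only those variables that are observed in both rows, so a genuine `NA` does not force the row out. The data frame `toy` is hypothetical and exists only for illustration.

```r
library(cluster)  # provides daisy()

# Hypothetical toy data; every pair of rows shares at least column c
toy <- data.frame(a = c(1, 2, NA, 4),
                  b = c(10, NA, 30, 40),
                  c = c(5, 6, 7, 8))

d <- daisy(toy, metric = "gower")
m <- as.matrix(d)

# Rows 1 and 2: b is missing in row 2, so only a and c are compared:
# (|1 - 2| / 3 + |5 - 6| / 3) / 2 = 1/3
m[1, 2]
```

If two rows had no observed variable in common, their dissimilarity would itself be `NA`, and `hclust()` would then refuse it.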

__BEGIN__
    # Plot a heat map together with the dendrogram.

    library("gplots")
    library("cluster")


    # Arbitrarily assigning NA to some elements
    mtcars[2,2] <- "NA"
    mtcars[6,7]  <- "NA"

    mydata <- mtcars

    hclustfunc <- function(x) hclust(x, method="complete")

    # Initially I wanted to use this, but it didn't accept NA
    #distfunc <- function(x) dist(x,method="euclidean")

    # Try daisy() with the Gower metric,
    # which is supposed to work with NA values
    distfunc <- function(x) daisy(x,metric="gower")

    d <- distfunc(mydata)
    fit <- hclustfunc(d)

    # Perform clustering heatmap
    heatmap.2(as.matrix(mydata), dendrogram="row", trace="none",
              margin=c(8,9), hclust=hclustfunc, distfun=distfunc);
__END__

The error message I got is this:

    Error in which(is.na) : argument to 'which' is not logical
    Calls: distfunc.g -> daisy
    In addition: Warning messages:
    1: In data.matrix(x) : NAs introduced by coercion
    2: In data.matrix(x) : NAs introduced by coercion
    3: In daisy(x, metric = "gower") :
      binary variable(s) 8, 9 treated as interval scaled
    Execution halted
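The "NAs introduced by coercion" warnings point to one likely cause: `"NA"` in quotes is a character string, not R's missing value, so assigning it into a numeric column turns the whole column into character. A quick check on copies of `mtcars` (the names `bad` and `good` are hypothetical, for illustration only):

```r
# Assigning the string "NA" coerces the entire cyl column to character
bad <- mtcars
bad[2, 2] <- "NA"
class(bad$cyl)     # "character"

# Assigning the missing value NA (no quotes) keeps the column numeric
good <- mtcars
good[2, 2] <- NA
class(good$cyl)    # "numeric"
is.na(good[2, 2])  # TRUE
```

With unquoted `NA`, there is no text left for `data.matrix()` to coerce, and `daisy()` sees the entries as missing values.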


Ultimately, I'd like to perform hierarchical clustering on the data
while keeping the rows that contain NA.
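A minimal sketch of the same pipeline, assuming the fix is simply to mark entries as missing with unquoted `NA` rather than the string `"NA"`. It clusters with `daisy()` and `hclust()` from `cluster` and `stats`, and draws the heat map only if `gplots` is installed, since `gplots` is not bundled with R. It also spells out the `heatmap.2()` argument names `hclustfun` and `margins` in full.

```r
library(cluster)  # daisy()

# Work on a copy and mark entries as genuinely missing (NA, not "NA")
mydata <- mtcars
mydata[2, 2] <- NA
mydata[6, 7] <- NA

distfunc   <- function(x) daisy(x, metric = "gower")
hclustfunc <- function(x) hclust(x, method = "complete")

# Gower dissimilarities skip missing entries pairwise, so no rows are dropped
d   <- distfunc(mydata)
fit <- hclustfunc(d)

# Optional heat map, only run when gplots is available
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(as.matrix(mydata), dendrogram = "row", trace = "none",
                    margins = c(8, 9), distfun = distfunc,
                    hclustfun = hclustfunc)
}
```

Expect `daisy()` to keep warning that the 0/1 columns `vs` and `am` are treated as interval scaled. That warning is informational and does not stop the clustering.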

G.V.
