[R] How to perform clustering without removing rows where NA is present in R

Sarah Goslee sarah.goslee at gmail.com
Mon Dec 9 21:49:46 CET 2013


Although your second post, which restated this question, has already
been answered, it's worth taking another look at your code in this one
as well.

In particular note that NA and "NA" are NOT the same thing.

data(mtcars)
str(mtcars)

# from your code
mtcars[2,2] <- "NA"
mtcars[6,7]  <- "NA"

str(mtcars)

I'm pretty sure that's not what you want.

Thanks for providing a reproducible example: otherwise it would have
been impossible to catch this. If you run into unexpected errors, it's
always a good plan to start by using str() and similar functions to
check whether your data are as you intend.
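
For what it's worth, here is a minimal sketch of the difference and of
one possible fix, assuming the cluster package is installed. The names
`bad` and `good` are just for illustration and are not from your code;
the idea is simply to assign the real missing value NA (no quotes), so
the columns stay numeric and daisy() can treat those cells as missing.

```r
library(cluster)  # provides daisy()

data(mtcars)

# Assigning the string "NA" silently turns the whole column into character
bad <- mtcars               # 'bad' is an illustrative name
bad[2, 2] <- "NA"
class(bad$cyl)              # "character"
sum(is.na(bad$cyl))         # 0: the string "NA" is not a missing value

# Assigning the real missing value keeps the column numeric
good <- mtcars              # 'good' is an illustrative name
good[2, 2] <- NA
good[6, 7] <- NA
class(good$cyl)             # "numeric"
sum(is.na(good))            # 2

# Gower dissimilarities tolerate NA; the rows with NA are kept
d <- daisy(good, metric = "gower")
fit <- hclust(d, method = "complete")
```

daisy() may still warn that the binary variables are treated as
interval scaled; that is only a warning, and plot(fit) should then show
a dendrogram containing all 32 rows.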

Sarah


On Sat, Dec 7, 2013 at 9:28 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
> I have data that contain some NA values.
> What I want to do is to **perform clustering without removing rows**
> where NA is present.
>
> I understand that the `gower` distance measure in `daisy` allows for this.
> But why doesn't my code below work?
>
> __BEGIN__
>     # plot heat map with dendrogram together.
>
>     library("gplots")
>     library("cluster")
>
>
>     # Arbitrarily assigning NA to some elements
>     mtcars[2,2] <- "NA"
>     mtcars[6,7]  <- "NA"
>
>      mydata <- mtcars
>
>     hclustfunc <- function(x) hclust(x, method="complete")
>
>     # Initially I wanted to use this but it didn't take NA
>     #distfunc <- function(x) dist(x,method="euclidean")
>
>     # Try using daisy with the Gower metric,
>     # which is supposed to work with NA values
>     distfunc <- function(x) daisy(x,metric="gower")
>
>     d <- distfunc(mydata)
>     fit <- hclustfunc(d)
>
>     # Perform clustering heatmap
>     heatmap.2(as.matrix(mydata),dendrogram="row",trace="none",
>               margin=c(8,9), hclust=hclustfunc,distfun=distfunc);
> __END__
>
>    The error message I got is this:
>
>         Error in which(is.na) : argument to 'which' is not logical
>     Calls: distfunc.g -> daisy
>     In addition: Warning messages:
>     1: In data.matrix(x) : NAs introduced by coercion
>     2: In data.matrix(x) : NAs introduced by coercion
>     3: In daisy(x, metric = "gower") :
>       binary variable(s) 8, 9 treated as interval scaled
>     Execution halted
>
>
> At the end of the day, I'd like to perform hierarchical clustering
> on data that contain NAs.
>
> G.V.
>


-- 
Sarah Goslee
http://www.functionaldiversity.org


