[R] agnes clustering and NAs

Dario Strbenac D.Strbenac at garvan.org.au
Fri Jan 28 00:00:16 CET 2011


Hello,

Yes, that's right: it is a matrix of values, not a dissimilarity matrix.

i.e.

> str(iMatrix)
 num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:56] "-8100" "-7900" "-7700" "-7500" ...
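
The distinction between a raw values matrix and a dissimilarity object can also be checked programmatically. The following is only a minimal sketch: `vals` is a hypothetical stand-in for iMatrix with made-up values, and base R's dist() is used just to produce an object of class "dist" for comparison.

```r
# Hypothetical stand-in for iMatrix: numeric values, column names only.
vals <- matrix(c(-0.407, 0.198, NA, -0.133, NA, 0.25), nrow = 3,
               dimnames = list(NULL, c("-8100", "-7900")))

# A raw values matrix is a numeric matrix that does not inherit from "dist".
is_values_matrix <- is.matrix(vals) && is.numeric(vals)
is_dist_object   <- inherits(vals, "dist")

# For comparison, a dissimilarity object built from the same values.
d <- dist(vals)
inherits(d, "dist")
```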

The NA-checking snippet returns TRUE for every column, so each column contains at least one NA.
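
A per-column TRUE/FALSE answer does not say how the NAs are spread over the rows, so counts may be more informative. This is a rough sketch on a small hypothetical matrix `m` (made-up values, not the real data); all variable names are illustrative.

```r
# Hypothetical small matrix standing in for iMatrix (values are made up).
m <- matrix(c( 1, NA,  3,
              NA,  2,  1,
               4,  5, NA,
               2,  2,  2), nrow = 4, byrow = TRUE)

na_mask <- is.na(m)

# Same per-column check as the apply() call quoted further down.
col_has_na <- apply(m, 2, function(x) any(is.na(x)))

col_na_count <- colSums(na_mask)                    # NAs in each column
row_na_count <- rowSums(na_mask)                    # NAs in each row
n_complete   <- sum(complete.cases(m))              # rows without any NA
all_na_rows  <- which(row_na_count == ncol(m))      # rows that are entirely NA
```

Rows with a high `row_na_count` are the ones most likely to share no observed column with some other row.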

The part of the agnes documentation I was referring to is:

"In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric.  Missing values (NAs) are allowed."

So I'm under the impression that it handles NAs on its own?
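
One possible way to reconcile "NAs are allowed" with the error is that missing values are skipped pairwise when dissimilarities are computed, so two rows with no column observed in both end up with an NA dissimilarity. The sketch below illustrates this with base R's dist(), whose documentation states that the value is NA when all pairs are excluded. Whether agnes() behaves the same way internally is an assumption here, and the matrix `m` is hypothetical.

```r
# Hypothetical data: rows 1 and 2 have no column where both are observed.
m <- matrix(c( 1, NA,  2, NA,
              NA,  3, NA,  1,
               1,  2,  2,  2), nrow = 3, byrow = TRUE)

# Base R Euclidean distances; NAs are excluded pair by pair.
d <- dist(m)
na_in_dist <- any(is.na(d))

# shared[i, j] = number of columns observed in both row i and row j.
observed <- (!is.na(m)) * 1
shared   <- tcrossprod(observed)

# Row pairs with zero jointly observed columns (upper triangle only).
bad_pairs <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
bad_pairs
```

Note that `shared` is an n-by-n matrix, so for tens of thousands of rows it needs several gigabytes of memory; the check may have to be done in blocks of rows.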

- Dario.

---- Original message ----
>Date: Thu, 27 Jan 2011 12:53:27 +0000
>From: Gavin Simpson <gavin.simpson at ucl.ac.uk>  
>Subject: Re: [R] agnes clustering and NAs  
>To: Uwe Ligges <ligges at statistik.tu-dortmund.de>
>Cc: D.Strbenac at garvan.org.au, r-help at r-project.org
>
>On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
>> 
>> On 27.01.2011 05:00, Dario Strbenac wrote:
>> > Hello,
>> >
>> > In the documentation for agnes in the package 'cluster', it says that NAs are allowed, and sure enough it works for a small example like :
>> >
>> >> m<- matrix(c(
>> > 1, 1, 1, 2,
>> > 1, NA, 1, 1,
>> > 1, 2, 2, 2), nrow = 3, byrow = TRUE)
>> >> agnes(m)
>> > Call:    agnes(x = m)
>> > Agglomerative coefficient:  0.1614168
>> > Order of objects:
>> > [1] 1 2 3
>> > Height (summary):
>> >     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> >    1.155   1.247   1.339   1.339   1.431   1.524
>> >
>> > Available components:
>> > [1] "order"  "height" "ac"     "merge"  "diss"   "call"   "method" "data"
>> >
>> > But I have a large matrix (23371 rows, 56 columns) with some NAs in it, and it runs for about a minute, then gives an error:
>> >
>> >> agnes(iMatrix)
>> > Error in agnes(iMatrix) :
>> >    No clustering performed, NA-values in the dissimilarity matrix.
>> >
>> > I've also tried getting rid of rows with all NAs in them, and it still gave me the same error. Is this a bug in agnes() ? It doesn't seem to fulfil the claim made by its documentation.
>> 
>> 
>> I haven't looked in the file, but you need to get rid of all NA, or in 
>> other words, all rows that contain *any* NA values.
>
>If one believes the documentation, then that only applies to the case
>where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw
>data matrix or data frame.
>
>The only way the OP could have gotten that error with the call shown is
>if iMatrix were not a dissimilarity matrix inheriting from class "dist",
>so `NA`s should be allowed.
>
>My guess would be that the OP didn't get rid of all the `NA`s.
>
>Dario: what does:
>
>sapply(iMatrix, function(x) any(is.na(x)))
>
>or if iMatrix is a matrix:
>
>apply(iMatrix, 2, function(x) any(is.na(x)))
>
>say?
>
>G
>
>> Uwe Ligges
>> 
>> 
>> 
>> > The matrix I'm using can be obtained here :
>> > http://129.94.136.7/file_dump/dario/iMatrix.obj
>> >
>> > --------------------------------------
>> > Dario Strbenac
>> > Research Assistant
>> > Cancer Epigenetics
>> > Garvan Institute of Medical Research
>> > Darlinghurst NSW 2010
>> > Australia
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> 
>
>-- 
>%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
>%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>


--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia


