[R] Using kmeans given cluster centroids and data with NAs

Ales Ziberna ales.ziberna at guest.arnes.si
Wed Apr 6 11:52:09 CEST 2005


Hello!



I would suggest using some form of imputation to fill in the NAs, for
example the MICE package
(http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm) or something
similar (I have heard that this can also be done with the aregImpute
function in the Hmisc package, although I have not tried it). You can then
use k-means or any other technique you wish, since you now have a complete
data frame. However, for more reliable results, it is best to repeat the
imputation and the analysis several times, so that you can see how much the
clustering depends on the imputed values.
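
As a rough illustration of the "impute, cluster, repeat" idea, here is a
minimal base-R sketch. Everything in it is an assumption made for the
example: the toy matrix `dat` stands in for the real 4977 x 22 data,
`centres` stands in for the group means from agnes, and `impute_once` is a
hypothetical helper that fills each NA with a randomly drawn observed value
from the same column. That helper is only a crude stand-in, not what MICE
does; in a real analysis the imputation step would come from a proper
imputation package.

```r
# Sketch only: toy data standing in for the real matrix with NAs.
set.seed(1)
dat <- matrix(rnorm(200 * 4), ncol = 4)
dat[1:100, ] <- dat[1:100, ] + 3        # two rough groups
dat[sample(length(dat), 80)] <- NA      # knock out some values at random

# Assumed starting centres (in practice: group means from agnes/cutree).
centres <- rbind(rep(3, 4), rep(0, 4))

# Hypothetical helper: one crude imputation draw. Each NA is replaced by a
# randomly sampled observed value from the same column. This is a simple
# stand-in for a real imputation method, not MICE itself.
impute_once <- function(x) {
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    x[miss, j] <- sample(x[!miss, j], sum(miss), replace = TRUE)
  }
  x
}

# Repeat imputation + k-means m times, always starting from the same centres.
m <- 5
labels <- sapply(seq_len(m), function(i) {
  kmeans(impute_once(dat), centres)$cluster
})

# Most frequent cluster label per observation across the m runs.
consensus <- apply(labels, 1, function(r) {
  as.integer(names(which.max(table(r))))
})

# Fraction of runs agreeing with the consensus label, per observation.
stability <- rowMeans(labels == consensus)
table(consensus)
summary(stability)
```

Because every run starts from the same centres, the cluster numbers are
assumed to be comparable between runs; observations with low `stability`
are the ones whose assignment depends most on the imputed values.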



I hope this helps!



Ales Ziberna


----- Original Message ----- 
From: <Sophie.Bestley at csiro.au>
To: <Tom.Mulholland at dpi.wa.gov.au>; <r-help at stat.math.ethz.ch>
Sent: Monday, April 04, 2005 5:23 AM
Subject: [R] Using kmeans given cluster centroids and data with NAs


> Hello Tom,
>
> Thanks for the reply.
>
> Unfortunately I do have many NAs in my data as not all vertical
> temperature profiles penetrated to the same depth level. In fact if I
> simply use na.omit my data matrix is reduced from 4977 to 480
> observations, so such a simple solution is not very helpful I'm afraid.
> Any other ideas?
>
> Cheers,
> SB
>
> -----Original Message-----
> From: Mulholland, Tom [mailto:Tom.Mulholland at dpi.wa.gov.au]
> Sent: Thursday, 31 March 2005 2:15 PM
> To: Bestley, Sophie (Marine, Hobart); r-help at stat.math.ethz.ch
> Subject: RE: [R] Using kmeans given cluster centroids and data with NAs
>
>
> Does ?na.omit help
>
> x <- kmeans(na.omit(data),centres)
>
> of course if you have too many NAs you need to be sure that their
> removal does not unduly influence the results.
>
> Although I am a bit confused as I thought that agnes did not allow NAs.
> I assume that you are running an alternative clustering method using the
> results of the first process as the starting point for the partitioning
> process and are thus using the same initial data.
>
> Tom
>
>> -----Original Message-----
>> From: Sophie.Bestley at csiro.au [mailto:Sophie.Bestley at csiro.au]
>> Sent: Thursday, 31 March 2005 11:33 AM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] Using kmeans given cluster centroids and data with NAs
>>
>>
>> Hello,
>>
>> I have used the functions agnes and cutree to cluster my data (4977
>> objects x 22 variables) into 8 clusters. I would like to refine the
>> solution using a k-means or similar algorithm, setting the initial
>> cluster centres as the group means from agnes. However my data matrix
>> has NA's in it and the function kmeans does not appear to accept this?
>>
>> > dim(centres)
>> [1]  8 22
>>
>> > dim(data)
>> [1] 4977   22
>>
>> > x <- kmeans(data,centres)
>> Error in kmeans(data, centres) : NA/NaN/Inf in foreign function call
>> (arg 1)
>>
>> I have looked extensively through the mail archives but cannot find
>> if/where someone has provided the answer.
>>
>> Thanks in advance,
>> SB
>>
>> Sophie Bestley
>> Pelagic Fisheries and Ecosystems
>> CSIRO Marine Research
>> GPO Box 1538
>> Hobart, Tasmania 7001
>> AUSTRALIA
>>
>> Phone: +61 3 6232 5048
>> Fax: +61 3 6232 5053
>> Email: sophie.bestley at csiro.au
>> Website: http://www.marine.csiro.au
>>
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>>
>



