[R] Ward's Clustering Doubts

Rodrigo Aluizio r.aluizio at gmail.com
Mon Sep 15 15:43:21 CEST 2008


Well, once again, thank you so much, Mark.
My original Ward's clustering on the untransformed matrix (which was not
Euclidean) is simply identical to the one "euclidefied" with the lingoes
function (ade4 package).

Regards, Rodrigo.

--------------------------------------------------
From: "Mark Difford" <mark_difford at yahoo.co.uk>
Sent: Monday, September 15, 2008 8:09 AM
To: <r-help at r-project.org>
Subject: Re: [R] Ward's Clustering Doubts

>
> Hi Rodrigo,
>
> [apropos of Ward's method]
>
>>> ... we saw something like "You must use it with Euclidean Distance..."
>
> Strictly speaking this is probably correct, as Ward's method does an
> analysis-of-variance type of decomposition and so doesn't really make much
> sense (I think) unless Euclidean distance (i.e. least-squares) is used.
>
> However, there may be ways around this. First, the fact that a distance
> measure is not Euclidean in general does not mean that every matrix
> computed with it is non-Euclidean. You can test this using ?is.euclid in
> package ade4. You can also test your matrix by doing a principal
> co-ordinate analysis and then looking for negative eigenvalues. If none
> are found, the matrix is Euclidean and it should be OK to use Ward's
> method on that data set.
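
The eigenvalue test described above can be sketched numerically. This is a
minimal illustration in Python/NumPy rather than R, and it is not the ade4
source: the function names, the relative tolerance and the two toy matrices
are assumptions made for the example.

```python
import numpy as np

def pcoa_eigenvalues(d):
    """Eigenvalues of the Gower-centred matrix -0.5 * J d^2 J, largest first."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # centring matrix J
    b = -0.5 * j @ (d ** 2) @ j           # matrix decomposed by a PCoA
    return np.linalg.eigvalsh(b)[::-1]    # eigvalsh sorts ascending; reverse

def is_euclidean(d, tol=1e-7):
    """True if no eigenvalue is negative beyond a relative tolerance (assumed)."""
    ev = pcoa_eigenvalues(d)
    return bool(ev[-1] > -tol * ev[0])

# Three points on a line at 0, 1 and 3: an exactly Euclidean configuration.
on_a_line = [[0, 1, 3],
             [1, 0, 2],
             [3, 2, 0]]

# The triangle inequality fails (5 > 1 + 1), so no Euclidean embedding exists.
broken = [[0, 1, 5],
          [1, 0, 1],
          [5, 1, 0]]

print(is_euclidean(on_a_line), is_euclidean(broken))
```

A matrix that fails this check has at least one clearly negative eigenvalue,
which is the situation the correction functions in ade4 are meant to repair.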
>
> Probably a better approach is to make your distance matrix Euclidean.
> There are several functions in ade4 that will do this. The idea then is to
> present/compare the two solutions: the first using the uncorrected,
> non-Euclidean distance matrix, the second using the corrected version. You
> could use procrustes/co-inertia analysis to compare the two in an
> intermediate step.
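
One of the corrections mentioned in this thread is the Lingoes
transformation, which adds a constant to the squared distances so that the
most negative PCoA eigenvalue becomes zero. The following is a hedged sketch
in Python/NumPy, not the ade4 implementation; the function name
`lingoes_correction` and the toy matrix are assumptions for illustration.

```python
import numpy as np

def lingoes_correction(d):
    """Return (corrected distances, constant c) for a symmetric distance matrix.

    c is the absolute value of the most negative eigenvalue of the
    Gower-centred matrix; corrected d_ij = sqrt(d_ij^2 + 2c) for i != j.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    lam_min = np.linalg.eigvalsh(b)[0]     # smallest eigenvalue
    c = max(0.0, -lam_min)                 # nothing to add if already Euclidean
    corrected = np.sqrt(d ** 2 + 2.0 * c)
    np.fill_diagonal(corrected, 0.0)       # self-distances stay zero
    return corrected, c

# Toy matrix that violates the triangle inequality (5 > 1 + 1).
broken = np.array([[0.0, 1.0, 5.0],
                   [1.0, 0.0, 1.0],
                   [5.0, 1.0, 0.0]])

fixed, c = lingoes_correction(broken)
print(c)
print(fixed)
```

Because the same constant is added to every squared off-diagonal distance,
the rank order of the distances is preserved. Whether a Ward dendrogram built
on the corrected matrix matches the uncorrected one still has to be checked
case by case.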
>
> Regards, Mark.
>
>
> Rodrigo Aluizio wrote:
>>
>> Hi Everybody,
>> Now I have a question that is more statistical than technical about R.
>> I’m working with the ecology of recent Foraminifera.
>>
>> At the lab we used to perform cluster analysis using 1-Pearson’s R and
>> Ward's method (we had already seen it in the literature of the area),
>> which renders good results with our biological data. Recently, using the
>> “R” software (vegan and cluster packages), which allows the combination
>> of any kind of distance matrix with any clustering method, we tried to
>> use Bray Curtis + Ward's (which seems to be more appropriate for a matrix
>> with a lot of zeros) and it renders a better result. Furthermore, the
>> results agree with our hypothesis and with the results we got with
>> Distance-based Redundancy Analysis (dbRDA or CAP). That is, the analysis
>> (Q-mode) clusters the stations according to the main physical,
>> sedimentary and biological characteristics of the study area.
>>
>> We received some critical comments noting that Ward's method accepts
>> Euclidean distance only. So we ran the analysis again using Euclidean
>> distance, but we did not get the better results we had using 1-Pearson’s
>> R + Ward's or Bray Curtis + Ward's (actually any other distance + method
>> combination rendered better results). Trying to find answers in the
>> specialized literature, we just got a little more confused because in any
>> moment we saw something like "You must use it with Euclidean Distance"
>> and, like I said above, we had already seen other kinds of distance
>> associated with Ward's clustering method in some articles from respected
>> journals.
>>
>> Is it wrong, or is it “nonsense”, to do the analysis the way we were
>> doing it?
>>
>> The results with Ward's combined with 1-Pearson’s R or Bray Curtis fit
>> better with our hypothesis and have excellent agglomerative coefficients,
>> but we don’t want to use inappropriate statistical procedures. I'm
>> starting to realize how powerful R is, but it doesn't justify doing
[[elided Yahoo spam]]
>>
>> Thank you in advance.
>>
>> Rodrigo.
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>


