[R] correlation as distance/dissimilarity

Martin Maechler maechler at stat.math.ethz.ch
Wed Sep 14 17:49:10 CEST 2005


I've been asked (privately)

>>>>> "CarlosJ" == jaramilloc  <jaramilloc at si.edu>
>>>>>     on Wed, 14 Sep 2005 09:40:22 -0400 writes:

     ..........

    CarlosJ> In Kaufman & Rousseeuw 2000 book on Cluster Analysis, it says that 
    CarlosJ> Daisy can compute Pearson correlation between variables and then 
    CarlosJ> transform these to dissimilarities.  

I don't think it says that.  But it does talk about doing it
"yourself", e.g., on pages 17--19.

    CarlosJ> Has this capability been 
    CarlosJ> implemented in the Cluster package for R?  It seems that it is not 
    CarlosJ> there.  How could I do that using R?
    CarlosJ> I would appreciate your help.

It has never been explicitly in R, because in the past 'everyone'
has thought this was obvious and trivial.  The "past" here was
when S was used by statisticians, mathematicians or engineers...

Anyway, here is an example of how to do this.  The transformation
(1 - r)/2 maps a correlation r in [-1, 1] to a dissimilarity in [0, 1].

> dd <- as.dist((1 - cor(USJudgeRatings))/2)
> plot(hclust(dd))
> round(1000 * dd)
     CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS
INTG  567                                                  
DMNR  577   18                                             
DILG  494   64   82                                        
CFMG  432   93   93   21                                   
DECI  457   99   98   22    9                              
PREP  494   61   72   11   21   21                         
FAMI  513   66   79   21   32   29    5                    
ORAL  506   44   47   23   25   26    8    9               
WRIT  522   46   53   20   29   27    7    5    3          
PHYS  473  129  106   94   60   64   76   78   54   72     
RTEN  517   31   28   35   36   38   25   29    9   16   47
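
The same idea can be sketched a little more generally.  The block below is
only an illustration, not something from the book or from any package:
the names d_signed, d_abs, hc, groups, ag and pm are made up here, and the
choices of "average" linkage and k = 2 are arbitrary.  It shows the (1 - r)/2
transformation used above next to the 1 - |r| variant, which also treats a
strongly negative correlation as "close".  It assumes that the result of
as.dist() can be passed to anything that accepts a "dist" object, including
agnes() and pam() from the cluster package when that package is installed.

```r
# Pearson correlations between the variables (columns) of a built-in data set
r <- cor(USJudgeRatings)

# Variant 1 (as in the example above): r = 1 -> 0, r = 0 -> 0.5, r = -1 -> 1
d_signed <- as.dist((1 - r) / 2)

# Variant 2 (an alternative, chosen here for illustration): the sign of the
# correlation is ignored, so r = +1 or -1 -> 0 and r = 0 -> 1
d_abs <- as.dist(1 - abs(r))

# Both are ordinary "dist" objects, so hclust() and cutree() work directly
hc <- hclust(d_signed, method = "average")
groups <- cutree(hc, k = 2)   # k = 2 is an arbitrary choice

# Optional: hand the same object to the cluster package, if it is installed
if (requireNamespace("cluster", quietly = TRUE)) {
  ag <- cluster::agnes(d_signed, method = "average")
  pm <- cluster::pam(d_signed, k = 2)
}
```

Which variant is appropriate depends on whether negatively correlated
variables should count as similar (use 1 - |r|) or as maximally
dissimilar (use (1 - r)/2).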

I'm going to add the example to the help page for 'dist' in R-2.2.0.

Martin Maechler



