[R] data(eurodist) and PCA ??

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 13 08:51:40 CEST 2004


On Wed, 13 Oct 2004, Dan Bolser wrote:

> If I perform PCA on the 'eurodist' data, should I get an accurate
> geographic layout of the cities with biplot?

No, but a good approximation.

> (barring inversions, i.e. there is no way to define north, but you get
> the idea...)
> 
> I have a complex distance matrix, and I am thinking about how to cluster
> it and how to visualize the quality of the resulting clusters. 

Using PCA and plotting the first two components is classical
multi-dimensional scaling, as implemented by cmdscale().  Look up MDS
somewhere (e.g. in MASS).  It is exact if the distances are Euclidean in
2D.  However, eurodist gives road distances on the surface of a sphere.
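To illustrate the point above, here is a minimal R sketch using only base R: cmdscale(), dist() and the eurodist dataset. The ten random planar points are hypothetical test data and do not come from the original post. For truly planar Euclidean distances, classical MDS reproduces the distances up to rounding error. For eurodist it gives only an approximation.

```r
## Classical MDS via cmdscale(): exact for planar Euclidean distances,
## only approximate for eurodist (road distances between European cities).

set.seed(1)
## Hypothetical test data: 10 random points in the plane
pts <- matrix(rnorm(20), ncol = 2)
d <- dist(pts)                    # Euclidean distances in 2D
fit <- cmdscale(d, k = 2)         # recover a 2D configuration
## The configuration is determined only up to rotation/reflection,
## but the inter-point distances are reproduced (up to rounding)
err_planar <- max(abs(dist(fit) - d))

## The same procedure applied to the road distances
loc <- cmdscale(eurodist, k = 2)
err_euro <- max(abs(dist(loc) - eurodist))

## Plot the map, flipping the second axis so that north is (roughly) up
x <- loc[, 1]
y <- -loc[, 2]
plot(x, y, type = "n", xlab = "", ylab = "", asp = 1,
     main = "cmdscale(eurodist)")
text(x, y, labels = rownames(loc), cex = 0.6)
```

The sign flip on the second axis is arbitrary. As noted in the question, MDS cannot tell which way is north.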

Classic examples for the illustration of MDS are departements of France 
based on proximity data and cities in the UK based on road distances.

There is a minor point as to what you mean by `with biplot', covered in
MASS4: it depends on the exact definition of biplot (biplot.princomp
has a parameter for this, and by default S-PLUS does not do it in a way
that makes your statement correct).

> If I could 'see' the clusters in space I could understand how / what the
> cluster algorithms were doing. 

A standard topic for MDS: see e.g. two of my books (MASS and my Pattern 
Recognition and Neural Networks) for extensive examples.
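As a rough illustration of that workflow, here is a minimal R sketch, not taken from MASS. It clusters a distance matrix with hclust() and then colours the points of a two-dimensional cmdscale() configuration of the same distances by cluster. The choices of eurodist as a stand-in for your own "dist" object, average linkage, and k = 4 clusters are all hypothetical.

```r
## Hypothetical workflow: cluster a distance matrix, then look at the
## clusters in a 2D classical-MDS configuration of the same distances.
d <- eurodist                          # stand-in for your own "dist" object
hc <- hclust(d, method = "average")    # agglomerative clustering
grp <- cutree(hc, k = 4)               # k = 4 is an arbitrary choice

fit <- cmdscale(d, k = 2, eig = TRUE)  # also return the eigenvalues
loc <- fit$points
## Share of the positive eigenvalue mass captured by two dimensions:
## a rough guide to how faithful the 2D picture of the clusters is
ev <- fit$eig
gof2 <- sum(ev[1:2]) / sum(ev[ev > 0])

## Label each object, coloured by its cluster membership
plot(loc[, 1], -loc[, 2], type = "n", asp = 1, xlab = "", ylab = "")
text(loc[, 1], -loc[, 2], labels = rownames(loc), col = grp, cex = 0.6)
```

If gof2 is small, clusters that look merged in the plot may still be well separated in the full distance matrix.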

> Can I use PCA over the distance matrix to do that?
> 
> Sorry for the dumb questions.

Please do some homework: suggestions above and in the posting guide.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
