weighted clustering (was: Re: [R] problems with a large data set)

mlennert@ulb.ac.be mlennert at ulb.ac.be
Fri Apr 27 11:18:40 CEST 2001


kmeans and clara work great. Thank you for the tip.

I have another question: 

Is it possible to weight the observations in a cluster analysis? I haven't
found any mention of this in the kmeans or clara help texts.


Moritz Lennert
Chargé de recherche
IGEAT - ULB

tél: 32-2-650.65.16
fax: 32-2-650.50.92
email: mlennert at ulb.ac.be
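
To make the question above concrete, here is a minimal sketch of one possible
reading of "weighting the observations": each row carries a positive weight
w[i], and a cluster center is the weighted mean of its members rather than the
plain mean. The function name weighted_kmeans and its arguments (x, w, k, init,
iter.max) are hypothetical and not part of kmeans, clara, or any package; this
is a plain Lloyd-style iteration in base R, not the algorithm those functions
use.

```r
# Hypothetical helper: k-means-style clustering with observation weights.
# x: numeric matrix or data frame, w: positive weights (one per row),
# k: number of clusters, init: row indices used as starting centers.
weighted_kmeans <- function(x, w, k, init = sample(nrow(x), k), iter.max = 20) {
  x <- as.matrix(x)
  stopifnot(length(w) == nrow(x), all(w > 0), length(init) == k)
  centers <- x[init, , drop = FALSE]
  cl <- rep(0L, nrow(x))
  for (it in seq_len(iter.max)) {
    # Squared Euclidean distance from every row to every center (n x k)
    d2 <- vapply(seq_len(k),
                 function(j) rowSums(sweep(x, 2, centers[j, ])^2),
                 numeric(nrow(x)))
    # Assign each row to its nearest center
    new_cl <- max.col(-d2, ties.method = "first")
    if (all(new_cl == cl)) break
    cl <- new_cl
    for (j in seq_len(k)) {
      idx <- cl == j
      # Weighted mean of the members; an empty cluster keeps its old center
      if (any(idx)) {
        centers[j, ] <- colSums(x[idx, , drop = FALSE] * w[idx]) / sum(w[idx])
      }
    }
  }
  list(cluster = cl, centers = centers)
}

# Toy usage: six points on a line, the third one counted twice
toy <- matrix(c(0, 1, 2, 10, 11, 12), ncol = 1)
res <- weighted_kmeans(toy, w = c(1, 1, 2, 1, 1, 1), k = 2, init = c(1, 4))
```

In this sketch the weights only affect where the centers land, not how rows
are assigned to them. Other definitions of weighted clustering are possible.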


> On Wed, 25 Apr 2001, Moritz Lennert wrote:
> 
> > Hello,
> >
> > I have trouble with a data set that comprises 2136 lines of 20 columns.
> > I would like to do a hierarchical clustering and I tried the following:
> >
> > ages.hclust <- hclust(dist(ages, method="euclidean"), "ward")
> >
> > but I get the following error message:
> >
> > Error: cannot allocate vector of size 17797 Kb
> >
> > When I try to do the dist() alone first without the hclust(), I get the
> > same type of message.
> >
> > Then I tried with the RPgSQL packages by typing
> >
> > > db.connect(dbname="space")
> > Connected to database "space" on "localhost"
> > > bind.db.proxy("ages")
> > > ages.hclust <- hclust(dist(ages, method="euclidean"), "ward")
> 
> That does not help. You need to retrieve the data to use it!
> 
> > This time I get:
> >
> > Error in dist(ages, method = "euclidean") :
> >         NA/NaN/Inf in foreign function call (arg 1)
> > In addition: Warning message:
> > NAs introduced by coercion
> >
> >
> > I've checked, and I can't find any missing values or anything similar.
> > Could someone tell me if I'm doing something wrong, or whether this is
> > just too much data for R?
> 
> This may be too much data for your computer, but not for R: I've
> just done this in a few seconds.  I suggest that you need more memory
> (real or virtual): on my simulation it used about 80Mb.
> 
> I should say that doing agglomerative hierarchical clustering on thousands of
> points makes little sense: it is not a good way to find large clusters:
> try a partitioning method like kmeans or clara (in package cluster).
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272860 (secr)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
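
The memory point in the quoted reply can be checked with a rough calculation,
sketched here in R. It assumes dist() stores one 8-byte double per unordered
pair of rows, and it takes the row count of 2136 from the original message.
The simulated matrix ages_sim and the choice of 4 clusters are placeholders
for illustration only, not the poster's data or settings.

```r
# Size of the distance object alone: one double per unordered pair of rows
n <- 2136                       # row count quoted in the original message
n_pairs <- n * (n - 1) / 2      # 2280180 pairwise distances
dist_kb <- n_pairs * 8 / 1024   # about 17814 Kb, close to the failed allocation

# A partitioning method works on the n x p data matrix and never builds
# the full n x n distance structure.
set.seed(1)
ages_sim <- matrix(rnorm(n * 20), nrow = n, ncol = 20)  # stand-in data
km <- kmeans(ages_sim, centers = 4)                     # 4 is arbitrary

# clara lives in package cluster; only run it if that package is installed
if (requireNamespace("cluster", quietly = TRUE)) {
  cl <- cluster::clara(ages_sim, k = 4)
}
```

The estimate covers only the dist() result. Any further copies made while
clustering come on top of it, so actual usage is higher.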

