[R] cluster.stats

Laura Poggio laura.poggio at gmail.com
Sat Jun 14 23:04:43 CEST 2008


Thank you very much for all the information and support.
I have now managed to get it working on a small subset of the original data set.
I think that the first error message I got (Error in as.dist(dmat[clustering == i, clustering == i]) :  (subscript) logical subscript too long)
is generated when the two objects required by cluster.stats do not describe the same number of observations.
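
A minimal sketch of that mismatch, assuming made-up toy data in place of the image values. The names x, sub and n_from_dist_length are illustrative assumptions and not part of fpc or of the original code. It uses the fact that a dist object for n observations holds n * (n - 1) / 2 values, so its length can be converted back to n and compared with the length of the clustering vector.

```r
# A dist object for n observations stores n * (n - 1) / 2 values, so the
# number of observations can be recovered from its length.
# (n_from_dist_length is a hypothetical helper, not part of fpc.)
n_from_dist_length <- function(len) (1 + sqrt(1 + 8 * len)) / 2

n_ddata   <- n_from_dist_length(130816)  # length reported by str(ddata)
n_cluster <- 262144                      # length of kl$cluster in the thread
mismatch  <- n_ddata != n_cluster        # do the two inputs disagree?

# Toy data (an assumption standing in for the image values).
set.seed(1)
x   <- data.frame(value = c(rnorm(250, mean = 0), rnorm(250, mean = 10)))
sub <- x[sample(nrow(x), 100), , drop = FALSE]  # small random subset

kl <- kmeans(sub, centers = 2)
d  <- dist(sub)

# Check before calling cluster.stats: one cluster label per observation in d.
sizes_agree <- attr(d, "Size") == length(kl$cluster)

# Only call cluster.stats if the sizes agree and fpc is installed.
if (sizes_agree && requireNamespace("fpc", quietly = TRUE)) {
  cs <- fpc::cluster.stats(d, kl$cluster)
  print(cs$avg.silwidth)
}
```

By this arithmetic the dist object of length 130816 describes only 512 observations, while the clustering vector has 262144 entries, which would be consistent with the "logical subscript too long" error.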

Thanks! 
Laura

------- Original message -------
From: Christian Hennig  <chrish at stats.ucl.ac.uk>
Sent: 14.6.'08,  20:46

> Dear Laura,
> 
> I have R 2.6.0. I tried dist on a vector of length 200,000 and it told me 
> that it is too long. Theoretically, if you have 260,000 observations, the 
> length of the dist object should be 260,000*259,999/2, which is too large 
> for our computers, I guess. Which means that unfortunately cluster.stats 
> won't work for such a large data set, because it needs the full casewise 
> dissimilarity information.
> 
> I don't understand how you managed to produce a dist object of length
> only 130,000 from your data, but it certainly doesn't give all
> pairwise distance information for 260,000 points and therefore cannot be 
> used in cluster.stats with a clustering vector of length 260,000 or so.
> 
> Sorry,
> Christian
> 
> On Sat, 14 Jun 2008, Laura Poggio wrote:
> 
> > Thanks. See below.
> >
> > Laura
> >
> > 2008/6/14 Christian Hennig <chrish at stats.ucl.ac.uk>:
> >
> >> What does str(ddata) give?
> >
> >
> > Class 'dist'  atomic [1:130816]   69.2 117.1 145.6 179.9 195.6 ...
> >
> >
> >>
> >> dcent doesn't make sense as input for cluster.stats, because you need a
> >> dissimilarity matrix between all objects.
> >>
> >
> > Yes, I know ... I simply tried to see whether anything changed with a
> > different structure of the data
> >
> >
> >
> >>
> >> Christian
> >>
> >> On Sat, 14 Jun 2008, Laura Poggio wrote:
> >>
> >>> I am sorry I did not provide enough information.
> >>> I am not using img later, but data, which is a data.frame.
> >>> I wrote that img is an "image" just to explain what kind of data it comes
> >>> from, but the object I am using is data and it is a data.frame (checked
> >>> many times).
> >>>
> >>> I am not calling as.dist myself, but dist, to calculate the distance matrix
> >>> for the data I have. The whole code I am using is:
> >>>
> >>> data <- as(img, "data.frame")[1:1]    # img is an image of 256 x 256 px
> >>> kl <- kmeans(data, 5)
> >>> library(fpc)
> >>> ddata <- dist(data)
> >>> dcent <- dist(kl$centers)
> >>>
> >>> cluster.stats(ddata, kl$cluster)
> >>> cluster.stats(dcent, kl$cluster)
> >>>
> >>> In both cases I got the same error:
> >>> Error in as.dist(dmat[clustering == i, clustering == i]) :  (subscript)
> >>> logical subscript too long
> >>>
> >>> The structure of the different objects is detailed below:
> >>> data is "'data.frame':   262144 obs. of  1 variable"
> >>> kl$centers is "num [1:5, 1]"
> >>> kl$cluster is "Named int [1:262144]"
> >>>
> >>> I hope this is more informative. I am sorry, but I did not find any
> >>> explanation for the error message I am getting.
> >>>
> >>> Thank you very much in advance
> >>>
> >>> Laura
> >>>
> >>>
> >>>
> >>> 2008/6/14 Christian Hennig <chrish at stats.ucl.ac.uk>:
> >>>
> >>>> The given information is not enough to tell you what's going on. as.dist
> >>>> doesn't appear in the given code and it's not clear to me what kind of
> >>>> object img is ("a small image" doesn't tell me what R makes of it).
> >>>> Also, try to read the help pages first and find out whether img is of the
> >>>> format that is required by the functions. And check (using str, for
> >>>> example) whether "data" is what you expect it to be.
> >>>>
> >>>> Christian
> >>>>
> >>>>
> >>>> On Sat, 14 Jun 2008, Laura Poggio wrote:
> >>>>
> >>>>> Thank you very much for your answer.
> >>>>>
> >>>>> I tried to run the function on my data and now I am getting this error
> >>>>> message:
> >>>>> Error in as.dist(dmat[clustering == i, clustering == i]) :  (subscript)
> >>>>> logical subscript too long
> >>>>>
> >>>>> Below is the code I am using (R version 2.7.0 with all packages updated):
> >>>>>
> >>>>> data <- as(img, "data.frame")[1:1]    # img is a small image, 256 px x 256 px
> >>>>> kl <- kmeans(data, 5)
> >>>>> library(fpc)
> >>>>> cluster.stats(data, kl$cluster)
> >>>>>
> >>>>> Thank you for any hints on the reasons and meaning of the error!
> >>>>>
> >>>>> Laura
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> 2008/6/13 Christian Hennig <chrish at stats.ucl.ac.uk>:
> >>>>>
> >>>>>> Dear Laura,
> >>>>>>
> >>>>>>> Dear list,
> >>>>>>>
> >>>>>>> I just tried to use the function cluster.stats in the package fpc.
> >>>>>>> I just have a couple of questions about the syntax:
> >>>>>>>
> >>>>>>> cluster.stats(d,clustering,alt.clustering=NULL,
> >>>>>>> silhouette=TRUE,G2=FALSE,G3=FALSE)
> >>>>>>>
> >>>>>>> 1) Is the distance object (d) an object obtained by applying the
> >>>>>>> function dist() to my own original matrix?
> >>>>>>>
> >>>>>>>
> >>>>>> d is allowed to be an object of class dist or a dissimilarity matrix.
> >>>>>> The answer to your question depends on what your "original matrix" is.
> >>>>>> If it is something on which you can compute a distance by dist(), you're
> >>>>>> right, at least if dist() delivers the distance you are interested in.
> >>>>>>
> >>>>>>> 2) Is clustering the vector of cluster labels produced by one of the
> >>>>>>> many clustering methods?
> >>>>>>>
> >>>>>>>
> >>>>>> The help page tells you what clustering can be. So it could be the
> >>>>>> clustering/partition vector of a clustering method or it could be
> >>>>>> something else. Note that cluster.stats doesn't depend on any particular
> >>>>>> clustering method. It computes the statistics regardless of where the
> >>>>>> clustering vector comes from.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Christian
> >>>>>>
> >>>>>>
> >>>>>>> Thank you very much in advance, and sorry for such a basic question,
> >>>>>>> but I did not manage to get this clear in my mind.
> >>>>>>>
> >>>>>>> Laura
> >>>>>>>
> >>>>>>>      [[alternative HTML version deleted]]
> >>>>>>>
> >>>>>>> ______________________________________________
> >>>>>>> R-help at r-project.org mailing list
> >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>> PLEASE do read the posting guide
> >>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>>
> >>>>>>>
> >>>>>> *** --- ***
> >>>>>> Christian Hennig
> >>>>>> University College London, Department of Statistical Science
> >>>>>> Gower St., London WC1E 6BT, phone +44 207 679 1698
> >>>>>> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
> >
> >
> 


