[R] clustering

Christian Hennig fm3a004 at math.uni-hamburg.de
Fri Jan 28 11:51:44 CET 2005


Hi,

EMclust in the mclust package fits normal mixtures.
Note that if you split your data values into intervals, the resulting
distributions conditional on the intervals are not normal, but truncated
normal!
This matters if you try to check within-group normality, unless you
have strongly separated clusters (which does not seem to be the case here).
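
To illustrate the truncation point, here is a minimal base R sketch. It
reuses the group1/group2 parameters from the original question, but the
larger sample sizes, the seed, and the cut point (cut_point) are assumptions
chosen only so the effect is easy to see. It splits the pooled data at a
fixed boundary and runs shapiro.test on each piece.

```r
# Two-component normal mixture with the parameters from the original question.
# Sample sizes are larger than in the thread (an assumption for illustration).
set.seed(1)
group1 <- rnorm(n = 1000, mean = 0, sd = 1)
group2 <- rnorm(n = 400, mean = 1, sd = 1.5)
group3 <- c(group1, group2)

# Hypothetical cut point; any fixed boundary truncates the pieces the same way.
cut_point <- 0.5
lower <- group3[group3 <= cut_point]
upper <- group3[group3 > cut_point]

# Each piece is bounded on one side by the cut, so it is a truncated
# distribution and a normality test is expected to reject it.
p_lower <- shapiro.test(lower)$p.value
p_upper <- shapiro.test(upper)$p.value

# For comparison: the true first component, which was drawn from a normal.
p_group1 <- shapiro.test(group1)$p.value

print(c(lower = p_lower, upper = p_upper, group1 = p_group1))
```

A hard split therefore cannot be judged by within-interval normality; a
mixture fit assigns points by component membership probability instead of
by a fixed boundary.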

Christian


On Fri, 28 Jan 2005, WeiWei Shi wrote:

> Actually the problem I am trying to solve is to discretize a
> continuous variable (which is my response variable (dependent
> variable) in my project) so that I can turn a regression problem into a
> classification one. (There are many reasons for doing this.)
> 
> Since there is no class label for this variable (because this variable
> is my class variable :), the unsupervised approach can be applied
> here. However, checking the related papers shows there is little
> research (to my knowledge, and I haven't checked the MCC yet) in this
> field. Checking normality with qqnorm and a histogram indicates
> there might be two normal distributions.
> 
> My approach is to split the values of this variable into 2 or 3
> intervals and check each interval's normality again. If some approach
> like clustering or the one Andy suggests works well, then I should get
> much better normality. I will try that tomorrow.
> 
> I am not sure whether my idea works here; please advise!
> 
> Thanks,
> 
> Ed
> 
> 
> On Thu, 27 Jan 2005 18:58:28 -0500, Liaw, Andy <andy_liaw at merck.com> wrote:
> > It depends a lot on what you know or don't know about the data, and what
> > problem you're trying to solve.
> > 
> > If you know for sure it's a mixture of gaussians, likelihood based
> > approaches might be better.  MASS (the book) has an example of fitting
> > univariate mixture of gaussians using various optimizers.  The code is even
> > in $R_HOME/library/MASS/scripts/ch16.R.
> > 
> > Andy
> > 
> > > From: WeiWei Shi
> > >
> > > Hi,
> > > > thanks for the reply. In fact, I tried both of them and I also tried the
> > > > other method, and I found all of them gave me different boundaries (on
> > > > my real datasets). I am thinking about k-median but hoping to get more
> > > > suggestions from all of you in this forum.
> > >
> > > Cheers,
> > >
> > > Ed
> > >
> > >
> > > On Thu, 27 Jan 2005 15:37:16 -0600, msck9 at mizzou.edu
> > > <msck9 at mizzou.edu> wrote:
> > > > The cluster analysis should be able to handle that. I think if you
> > > > know how many clusters you have, "kmeans" is ok, or the EM algorithm
> > > > can also do that.
> > > > On Thu, Jan 27, 2005 at 03:44:42PM -0500, WeiWei Shi wrote:
> > > > > Hi,
> > > > > > I just have a question (sorry if it is a dumb one) and I "phrase" my
> > > > > > question in the following R code:
> > > > >
> > > > > group1<-rnorm(n=50, mean=0, sd=1)
> > > > > group2<-rnorm(n=20, mean=1, sd=1.5)
> > > > > group3<-c(group1,group2)
> > > > >
> > > > >
> > > > > > Now, if I am given a dataset from group3, what method (discriminant
> > > > > > analysis, clustering, maybe) is the best to cluster them by using R.
> > > > > The known info includes: 2 clusters, normal distribution (but the
> > > > > parameters are unknown).
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Ed
> > > > >
> > > > > ______________________________________________
> > > > > R-help at stat.math.ethz.ch mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> > > >
> > >
> > > 
> > >
> > >
> > 
> > 
> 
> 

***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de



