[R] EM unsupervised clustering
Bettina Gruen
gruen at ci.tuwien.ac.at
Thu Jul 19 08:14:37 CEST 2007
Federico,
you might also want to have a look at the packages "flexclust" or "flexmix",
which let you take into account that your data are binary. The "mclust"
package estimates mixtures of Gaussian distributions.
"flexclust" implements kmeans-like algorithms, but lets you specify a
distance measure appropriate for binary data. "flexmix" allows latent
class analysis of binary data, using FLXMCmvbinary() as the
component-specific model.
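For intuition about the latent class model: within each cluster every column is treated as an independent Bernoulli variable. EM alternates between computing each individual's cluster membership probabilities (E-step) and re-estimating the mixing weights and per-cluster success probabilities (M-step). Below is a minimal base-R sketch of that loop. It is not flexmix's implementation. The function name `bernoulli_em` and its arguments are hypothetical, and the probabilities are clamped away from 0 and 1 so that log(0) never occurs. With flexmix itself, the fit should look something like `flexmix(x ~ 1, k = 2, model = FLXMCmvbinary())` for a 0/1 matrix `x`; check ?FLXMCmvbinary for the exact interface.

```r
# Hypothetical helper (not from any package): EM for a mixture of
# independent Bernoulli distributions, i.e. a latent class model.
bernoulli_em <- function(x, k, n_iter = 200, tol = 1e-8, seed = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  set.seed(seed)
  # Random soft assignment of individuals to components as a starting point
  z <- matrix(runif(n * k), n, k)
  z <- z / rowSums(z)
  eps <- 1e-6            # clamp: keeps probabilities away from 0 and 1
  loglik_old <- -Inf
  for (it in seq_len(n_iter)) {
    # M-step: mixing weights and per-component success probabilities (k x m)
    w <- colMeans(z)
    p <- t(z) %*% x / colSums(z)
    p <- pmin(pmax(p, eps), 1 - eps)
    # E-step: log of weight * Bernoulli likelihood of each row, per component
    logdens <- x %*% t(log(p)) + (1 - x) %*% t(log(1 - p))   # n x k
    logdens <- sweep(logdens, 2, log(w), "+")
    # log-sum-exp over components for numerical stability
    mx <- apply(logdens, 1, max)
    ll_rows <- mx + log(rowSums(exp(logdens - mx)))
    z <- exp(logdens - ll_rows)        # posterior membership probabilities
    loglik <- sum(ll_rows)
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(weights = w, prob = p, posterior = z,
       cluster = max.col(z, ties.method = "first"), loglik = loglik)
}

# Toy 0/1 matrix retyped from Federico's example below
data <- matrix(c(1, 0, 1, 1, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 0, 1, 0, 0, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 0, 1, 0, 0, 1, 0), nrow = 6, byrow = TRUE)

fit <- bernoulli_em(data, k = 2)
fit$cluster   # rows 1, 2, 3 and 5 are identical, so they always share a cluster
fit$prob      # per-cluster probability of observing each variable
```

EM only finds a local maximum, so in practice you would run it from several starting values (different `seed`s) and keep the fit with the highest `loglik`.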
Best,
Bettina
Federico Calboli wrote:
> Hi All,
>
> I have a n x m matrix. The n rows are individuals, the m columns are variables.
>
> The matrix is a collection of 1s (if a variable is observed for an
> individual) and 0s (if there is no observation).
>
> Something like:
>
>      [,1] [,2] [,3] [,4] [,5] [,6]
> [1,]    1    0    1    1    0    0
> [2,]    1    0    1    1    0    0
> [3,]    1    0    1    1    0    0
> [4,]    0    1    0    0    0    0
> [5,]    1    0    1    1    0    0
> [6,]    0    1    0    0    1    0
>
>
> I use kmeans to find 2 or 3 clusters in this matrix:
>
> k2 = kmeans(data, 2, 10000000)
> k3 = kmeans(data, 3, 10000000)
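The same kmeans calls, written with named arguments, make explicit what the third positional argument does. In kmeans() it is iter.max, so 10000000 only raises the iteration cap. The toy matrix is retyped from the printout above. The values `nstart = 25`, `iter.max = 100` and the seed are illustrative choices, not values from the post.

```r
# Toy 0/1 matrix retyped from the printout in the post
# (rows = individuals, columns = variables)
data <- matrix(c(1, 0, 1, 1, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 0, 1, 0, 0, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 0, 1, 0, 0, 1, 0), nrow = 6, byrow = TRUE)

set.seed(42)  # kmeans() picks its starting centers at random

# iter.max caps the iterations of a single run; nstart = 25 instead tries
# 25 random starts and keeps the solution with the smallest total
# within-cluster sum of squares.
k2 <- kmeans(data, centers = 2, iter.max = 100, nstart = 25)
k3 <- kmeans(data, centers = 3, iter.max = 100, nstart = 25)

k2$cluster
k3$cluster
```

On 0/1 rows, the squared Euclidean distance that kmeans() minimizes is simply the number of columns in which two individuals differ. The cluster centers become column-wise proportions rather than 0/1 profiles.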
>
> but I would like to use something a bit more refined, so I thought about
> EM-based clustering. I am using the Mclust() function from the mclust package,
> but I get the following (to me incomprehensible) error message:
>
> plot(Mclust(as.data.frame(data)), as.data.frame(data))
> Hit <Return> to see next plot:
> Hit <Return> to see next plot:
> Hit <Return> to see next plot:
> Error in 1:L : NA/NaN argument
> In addition: Warning messages:
> 1: best model occurs at the min or max # of components considered in:
> summary.mclustBIC(Bic, data, G = G, modelNames = modelNames)
> 2: optimal number of clusters occurs at min choice in:
> Mclust(as.data.frame(anc.st.mat))
> 3: insufficient input for specified plot in: coordProj(data = data, parameters =
> x$parameters, z = x$z, what = "classification",
>
> That's puzzling because the example given by ?Mclust is something like
>
> plot(Mclust(iris[,-5]), iris[,-5])
>
> which is simple and foolproof, and works flawlessly...
>
> best,
>
> Federico
>