[R] non-hierarchical non-exclusive clustering of large data sets

Mon May 24 15:58:57 CEST 2004

Hi,

I'm trying to use R to cluster words with related meanings. Does anyone
know of a non-hierarchical clustering method in R that produces
non-exclusive clusters? By non-exclusive, I mean that a word should be
allowed to belong to more than one cluster. My data matrix (words in
rows, texts in columns, occurrence counts in the cells) would look
something like this:

          T1   T2   T3
CLOWN_N    0    1    0
BANK_N     3    0    2
RIVER_N    0    0    2
FLOW_V     0    0    3
MONEY_N    2    0    0
PAY_V      2    0    0
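
To make the layout concrete, the toy matrix can be written down in R
roughly as follows. This is only a sketch: the object names `m` and
`words_by_text` are illustrative, and the last step merely lists the
words that occur in each text to show the overlap (BANK_N turns up
twice). It is not a clustering method.

```r
# Toy word-by-text count matrix (rows = words, columns = texts).
# The names `m` and `words_by_text` are illustrative assumptions.
m <- matrix(
  c(0, 1, 0,
    3, 0, 2,
    0, 0, 2,
    0, 0, 3,
    2, 0, 0,
    2, 0, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("CLOWN_N", "BANK_N", "RIVER_N", "FLOW_V", "MONEY_N", "PAY_V"),
    c("T1", "T2", "T3")
  )
)

# For each text, list the words with a non-zero count.
# A word may be listed under several texts, which is the kind of
# overlap a non-exclusive clustering would have to allow.
words_by_text <- lapply(
  setNames(colnames(m), colnames(m)),
  function(txt) rownames(m)[m[, txt] > 0]
)
```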

The first row indicates that the noun "clown" occurred only once in my text
collection, namely in text 2. Ideally, the clustering method would
produce the clusters [bank_n,river_n,flow_v], [bank_n,money_n,pay_v] and
[clown_n].
The real data matrix would be much bigger than the one above: its
dimensions would be on the order of (100000,100000). Does anyone know
whether this would cause practical problems, such as very slow clustering?
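
A rough back-of-the-envelope estimate of the storage involved, as a
sketch only. It assumes 8-byte doubles and 4-byte integer indices; the
0.1% share of non-zero cells is a made-up figure for illustration, not
something measured on the real data, and all variable names are
hypothetical.

```r
n <- 1e5                      # assumed number of rows and of columns

# Dense storage: every cell held as an 8-byte double.
dense_gb <- n * n * 8 / 1e9
dense_gb                      # 80 (GB)

# A full pairwise dissimilarity object between n words, such as the
# one dist() returns, holds n * (n - 1) / 2 doubles.
dist_gb <- n * (n - 1) / 2 * 8 / 1e9

# Sparse (row, column, value) storage: two 4-byte indices plus one
# 8-byte value per non-zero cell. The density below is an assumption.
density   <- 0.001
sparse_gb <- n * n * density * (4 + 4 + 8) / 1e9
```

Under these assumptions the dense matrix alone would need about 80 GB
and a full dissimilarity object about 40 GB, so memory rather than
speed looks like the first obstacle unless the data are stored sparsely.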

Best wishes,

Murk Wuite, MA student
Department of Language and Speech
Katholieke Universiteit Nijmegen, The Netherlands