[R] Fanny Clustering

Philippe Grosjean phgrosjean at sciviews.org
Thu Mar 29 13:06:08 CEST 2007



Sergio Della Franca wrote:
> Ok,
> 
> How can I increase the memory of your computer available to R?

Well, if you would like to increase the memory of MY computer... you are 
welcome to do so... but I doubt it would be of any use to you ;-)

You don't tell us how much RAM you currently have, which platform you 
use, etc. The general approach is to use a computer with more RAM, up 
to the limit that a 32-bit build of R can address, and then to switch 
to a 64-bit version of R under Linux if you need even more RAM.
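
As a rough illustration of why memory matters here, the following R 
sketch estimates the size of the pairwise dissimilarities for 70,000 
cases. It assumes (the thread does not say so) that fanny() is called 
on the raw data and therefore has to hold all n(n-1)/2 dissimilarities 
as 8-byte doubles.

```r
# Back-of-the-envelope size of the pairwise dissimilarities for n cases.
# Assumption: one 8-byte double per distinct pair, as in a 'dist' object.
n <- 70000
n_pairs <- n * (n - 1) / 2        # number of distinct pairs of cases
gib <- n_pairs * 8 / 1024^3       # storage needed, in GiB

# The number of pairs is larger than the largest R integer (2^31 - 1)
too_many <- n_pairs > .Machine$integer.max

cat("pairs:", format(n_pairs, big.mark = ","), "\n")
cat("size :", round(gib, 1), "GiB\n")
cat("exceeds integer.max:", too_many, "\n")
```

Under that assumption the dissimilarities alone need roughly 18 GiB, 
far more than a 32-bit process can address, and the number of pairs 
exceeds the largest R integer, which may be what the error message is 
reporting.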

The other proposed solution is not stupid. With 70,000 cases, you have a 
fairly large dataset. You don't tell us how many groups you expect from 
your clustering, but it is often better to use a few tens or a few 
hundred representative cases for each group, no more. In supervised 
classification, it is easier to build such a training set with a 
relatively balanced number of items in each group, because the target 
classes are known a priori from the manual classification provided.

With unsupervised classification, you could either try pure random 
subsampling, or select your subsample based on similarity according to a 
given distance measure. I did something like that using a 
Mahalanobis distance, MDS, and then stratified subsampling inside a 
regular grid placed on top of the MDS plot.
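
A minimal sketch of the two subsampling strategies just described, not 
the code actually used for the analysis mentioned above. The toy data, 
the sample sizes, the 10 x 10 grid, k = 3 and the helper name 
grid_subsample() are all assumptions made for illustration; fanny() 
comes from the cluster package.

```r
library(cluster)   # provides fanny(); a recommended package shipped with R

set.seed(42)
# Toy stand-in for the real data (hypothetical): 3 groups of 300 cases,
# 3 numeric variables each
centers <- rbind(c(0, 0, 0), c(4, 4, 0), c(0, 4, 4))
x <- centers[rep(1:3, each = 300), ] + matrix(rnorm(900 * 3), ncol = 3)

## 1) Pure random subsampling
idx_random <- sample(nrow(x), 300)

## 2) Stratified subsampling on a regular grid over an MDS plot
# Whitening with the Cholesky factor of the covariance matrix makes
# Euclidean distances on z equal to Mahalanobis distances on x.
z <- x %*% solve(chol(cov(x)))
mds <- cmdscale(dist(z), k = 2)   # classical MDS in 2 dimensions

# Hypothetical helper: keep at most 'per_cell' cases from each grid cell
grid_subsample <- function(coords, bins = 10, per_cell = 5) {
  cell <- interaction(cut(coords[, 1], bins), cut(coords[, 2], bins),
                      drop = TRUE)
  picked <- lapply(split(seq_len(nrow(coords)), cell), function(i) {
    if (length(i) > per_cell) i[sample.int(length(i), per_cell)] else i
  })
  unlist(picked, use.names = FALSE)
}
idx_grid <- grid_subsample(mds)

# Fuzzy clustering on the subsample only (k = 3 is an arbitrary choice)
fit <- fanny(x[idx_grid, ], k = 3)
head(fit$membership)
```

Note that the MDS step itself needs a full distance matrix, so on a 
dataset of 70,000 cases it would have to be run on a preliminary random 
subsample rather than on all the data.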

Otherwise, I am not a specialist in unsupervised classification, and 
other people here may have better suggestions.

Best,

Philippe Grosjean

> 
> 2007/3/29, Philippe Grosjean <phgrosjean at sciviews.org>:
>> 1) Reduce the size of your sample (random or stratified subsampling),
>>
>> 2) Increase the memory of your computer available to R.
>>
>> Best,
>>
>> Philippe Grosjean
>>
>> ..............................................<°}))><........
>> ) ) ) ) )
>> ( ( ( ( (    Prof. Philippe Grosjean
>> ) ) ) ) )
>> ( ( ( ( (    Numerical Ecology of Aquatic Systems
>> ) ) ) ) )   Mons-Hainaut University, Belgium
>> ( ( ( ( (
>> ..............................................................
>>
>> Sergio Della Franca wrote:
>>> Dear R-Helpers,
>>>
>>>
>>> I'd like to run a fanny clustering on my data set (70,000 rows), but when
>>> I run the procedure I obtain this error:
>>>
>>> error in vector("double", length): too big dimension for
>>> the selected vector.
>>>
>>>
>>> How can I solve this problem?
>>>
>>>
>>> Thank you in advance.
>>>
>>>
>>> Sergio Della Franca.
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>