[R] Identifying clusters of size n

Mose mose.andre at gmail.com
Mon Jun 15 07:34:30 CEST 2009


Hey Nathan,

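You might like the DBSCAN algorithm. It is density-based, so you don't
have to say in advance how many clusters there are, and points in sparse
regions are left out of every cluster as noise.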

http://en.wikipedia.org/wiki/DBSCAN

There's an implementation in the 'fpc' package.

http://cran.r-project.org/web/packages/fpc/index.html
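
In case it helps, here is a minimal sketch of how that might look. The
coordinates are made up for illustration, and the eps and MinPts values are
guesses that would need tuning for real data. DBSCAN does not fix the size
of a cluster, so keeping only clusters with exactly n members is an extra
filtering step I am assuming you would want, not something the algorithm
does for you.

```r
# Assumes the 'fpc' package is installed: install.packages("fpc")
library(fpc)

# Hypothetical XY data: two tight groups plus two isolated points
a <- expand.grid(x = c(0, 0.1, 0.2), y = c(0, 0.1, 0.2))        # 9 points
b <- expand.grid(x = c(5, 5.1, 5.2, 5.3), y = c(5, 5.1, 5.2))   # 12 points
stray <- data.frame(x = c(10, -4), y = c(-3, 9))                # 2 points
xy <- as.matrix(rbind(a, b, stray))

# eps is the neighbourhood radius, MinPts the number of points needed
# within that radius for a region to count as dense (both are guesses here).
# fpc::dbscan is spelled out in case another package also exports dbscan().
fit <- fpc::dbscan(xy, eps = 0.3, MinPts = 4)

# Cluster 0 holds the points DBSCAN treats as noise
print(table(fit$cluster))

# Assumed post-processing step: keep only clusters with exactly n members
n <- 12
sizes <- table(fit$cluster[fit$cluster > 0])
keep <- as.integer(names(sizes)[sizes == n])
xy_n <- xy[fit$cluster %in% keep, , drop = FALSE]
```

Here eps plays the role of the "how tight" measure: two points end up in
the same cluster only if they are linked by a chain of dense
neighbourhoods of that radius.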

-Mose

On Sun, Jun 14, 2009 at 7:36 PM, Dylan
Beaudette<dylan.beaudette at gmail.com> wrote:
> On Sun, Jun 14, 2009 at 7:26 PM, Nathan S.
> Watson-Haigh<nathan.watson-haigh at csiro.au> wrote:
>> Dylan Beaudette wrote:
>>> On Sun, Jun 14, 2009 at 4:39 PM, Nathan S.
>>> Watson-Haigh<nathan.watson-haigh at csiro.au> wrote:
>>>> Is there a library which is capable of identifying distinct clusters of size n
>>>> from a series of XY coordinates?
>>>>
>>>> Failing this, I'd like to be able to do something like:
>>>> Using a sliding window of size n along the x-axis I'd like to determine the
>>>> distance between the center of the points in the window and the closest point
>>>> outside the window. I could then use a distance cutoff to help define my
>>>> clusters of size n. However, how can I calculate this distance?
>>>>
>>>> Cheers,
>>>> Nathan
>>>>
>>>
>>> Here is a start, using PAM clustering:
>>>
>>> http://casoilresource.lawr.ucdavis.edu/drupal/node/340
>>>
>>> cheers,
>>> Dylan
>>
>
> Hi,
>
>>
>> Thanks, that looks interesting. However I need a clustering algorithm which has
>> the following properties:
>>
>> 1) The ability to define clusters of size n
>> 2) No need to specify a priori how many clusters there will be
>> 3) The ability to omit data from any cluster. I don't think this package can do
>> this.
>
> Time to do some reading on the various clustering algorithms, their
> assumptions, and their overall behaviour. Although I am not an expert,
> many of the constraints you are trying to impose on the clustering
> will require some kind of programming / decision on your end. It may
> help to re-formulate the problem into some kind of raster-operation,
> in which case GRASS GIS might be of interest to you.
>
>> I suspect for something like this I'll have to define, a priori, how tight
>> points within a cluster should be, using some measure.
>>
>
> Hmm... In this case you may need to use a model-based or
> density-based approach. See the mclust and spatstat packages. (???)
>
> Cheers,
>
> Dylan
>
>> Any thoughts?
>> Nathan
>>
>> - --
>> - --------------------------------------------------------
>> Dr. Nathan S. Watson-Haigh
>> OCE Post Doctoral Fellow
>> CSIRO Livestock Industries
>> Queensland Bioscience Precinct
>> St Lucia, QLD 4067
>> Australia
>>
>> Tel: +61 (0)7 3214 2922
>> Fax: +61 (0)7 3214 2900
>> Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
>> - --------------------------------------------------------
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



