[R] slightly OT: (un)supervised clustering?

viktoras didziulis viktoras at ekoinf.net
Tue Oct 28 20:32:58 CET 2008


Hi,

My question is not strictly about R. I am looking for hints and pointers
to suitable methods (available in R or elsewhere) for a grouping (or
pattern recognition) problem: finding groups of environmental features
along an environmental gradient, as described below.

Consider the following environmental sampling data set, with columns
Depth (m), Presence of sand, Presence of boulders and Presence of clay
(1 = present, 0 = absent):
Depth Sand Boulders Clay
1 1 1 0
1 1 0 0
1 1 1 0
2 1 1 0
3 1 1 0
3 1 1 0
4 1 1 0
5 1 0 0
5 1 0 0
5 1 1 0
5 1 0 0
6 1 0 0
6 1 0 0
6 1 1 0
7 1 0 1
7 1 0 0
8 1 0 1
9 1 1 1
9 1 0 1
9 1 0 1

With the samples ordered by depth, my own "expert" judgement
distinguishes three groups: A (1-4 m depth range), where both sand and
boulders are present; B (5-6 m), where sand dominates and boulders are
recorded only occasionally; and C (7-9 m), where the substrate is
dominated by sand and clay.
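To make the expert grouping explicit, the occurrence frequency of each feature within each depth group can be tabulated. This is only an illustrative sketch: it hard-codes the toy data and the depth ranges given above, and the names ROWS, GROUPS and occurrence_by_group are hypothetical.

```python
# Toy data from the post: (depth_m, sand, boulders, clay), 1 = present, 0 = absent.
ROWS = [
    (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0), (2, 1, 1, 0), (3, 1, 1, 0),
    (3, 1, 1, 0), (4, 1, 1, 0), (5, 1, 0, 0), (5, 1, 0, 0), (5, 1, 1, 0),
    (5, 1, 0, 0), (6, 1, 0, 0), (6, 1, 0, 0), (6, 1, 1, 0), (7, 1, 0, 1),
    (7, 1, 0, 0), (8, 1, 0, 1), (9, 1, 1, 1), (9, 1, 0, 1), (9, 1, 0, 1),
]
FEATURES = ("sand", "boulders", "clay")
# Expert depth ranges (m) from the post, inclusive at both ends.
GROUPS = {"A": (1, 4), "B": (5, 6), "C": (7, 9)}


def occurrence_by_group(rows, groups):
    """Fraction of samples in each depth group in which each feature is present."""
    result = {}
    for name, (lo, hi) in groups.items():
        members = [r[1:] for r in rows if lo <= r[0] <= hi]
        result[name] = {
            feat: sum(m[k] for m in members) / len(members)
            for k, feat in enumerate(FEATURES)
        }
    return result


for name, freqs in occurrence_by_group(ROWS, GROUPS).items():
    print(name, {f: round(p, 2) for f, p in freqs.items()})
```

For the toy data, boulders occur in 6 of 7 samples in A, 2 of 7 in B and 1 of 6 in C, while clay appears only in C (5 of 6 samples); this is the pattern the expert grouping describes.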

Now the question: is there a formal method that can do the same, i.e.
separate groups A, B and C by analyzing how feature occurrence patterns
change in samples along an environmental gradient (depth in this case)?
The sample data set here is simplified; in reality I have to deal with
a dozen features such as salinity, exposure and related species lists.
I "see" these groups as an expert, but it would be nice to have a helper
algorithm that finds them for me, so that I could describe it in the
Methods section of my writings :-)

Cluster analysis or MDS on a similarity matrix does not perform as
expected: it groups stations from group A with stations from other
groups that have the most similar substrate observations, i.e. it
ignores the environmental gradient.
Discriminant analysis requires me to define the groups first and only
then "decides" the rest, so it is not suitable.
Significance tests can help decide whether differences between groups
are statistically significant, but again I have to supply my own
groups, so they are not suitable either.
Other unsupervised learning algorithms (neural networks and the like):
how can I instruct them to carry out the analysis along the depth
gradient?
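One family of methods that builds the gradient into the grouping is contiguity-constrained clustering or segmentation: the samples are kept in depth order and only groups of adjacent depths are allowed. The sketch below is a swapped-in technique, not something from the original post: an exact dynamic-programming partition that minimizes the pooled within-group sum of squares, in the spirit of Fisher's optimal grouping of ordered data. Assumptions: all features are weighted equally, samples at the same depth are never split, and the number of groups k is chosen by the user (here k = 3). The names ROWS, depth_blocks, segment_cost and optimal_segments are hypothetical.

```python
from itertools import groupby

# Toy data from the post: (depth_m, sand, boulders, clay), already sorted by depth.
ROWS = [
    (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0), (2, 1, 1, 0), (3, 1, 1, 0),
    (3, 1, 1, 0), (4, 1, 1, 0), (5, 1, 0, 0), (5, 1, 0, 0), (5, 1, 1, 0),
    (5, 1, 0, 0), (6, 1, 0, 0), (6, 1, 0, 0), (6, 1, 1, 0), (7, 1, 0, 1),
    (7, 1, 0, 0), (8, 1, 0, 1), (9, 1, 1, 1), (9, 1, 0, 1), (9, 1, 0, 1),
]


def depth_blocks(rows):
    """Pool rows sharing a depth into (depth, n, feature sums, feature sums of squares).

    Keeping same-depth samples together means boundaries can only fall
    between two different depths. Assumes rows are sorted by depth.
    """
    blocks = []
    for depth, group in groupby(rows, key=lambda r: r[0]):
        feats = [r[1:] for r in group]
        cols = list(zip(*feats))
        blocks.append((depth, len(feats),
                       [sum(c) for c in cols],
                       [sum(v * v for v in c) for c in cols]))
    return blocks


def segment_cost(blocks, i, j):
    """Pooled within-segment sum of squares of blocks[i:j], summed over features."""
    seg = blocks[i:j]
    n = sum(b[1] for b in seg)
    cost = 0.0
    for f in range(len(seg[0][2])):
        s = sum(b[2][f] for b in seg)
        ss = sum(b[3][f] for b in seg)
        cost += ss - s * s / n
    return cost


def optimal_segments(rows, k):
    """Exact split of depth-ordered samples into k contiguous groups by dynamic programming."""
    blocks = depth_blocks(rows)
    m = len(blocks)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of distinct depths")
    inf = float("inf")
    # best[g][j]: lowest cost of splitting the first j depth levels into g segments
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, m + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + segment_cost(blocks, i, j)
                if c < best[g][j]:
                    best[g][j], back[g][j] = c, i
    # Follow the back-pointers to recover (first_depth, last_depth) of each segment
    bounds, j = [], m
    for g in range(k, 0, -1):
        i = back[g][j]
        bounds.append((blocks[i][0], blocks[j - 1][0]))
        j = i
    return best[k][m], bounds[::-1]


cost, segments = optimal_segments(ROWS, k=3)
print(segments, round(cost, 3))
```

On the toy data this returns the segments (1, 4), (5, 6) and (7, 9), i.e. the expert groups A, B and C, with a total within-group sum of squares of 83/21 (about 3.952). The number of groups still has to be chosen, for example by looking at how much the cost drops as k increases.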

If anyone on this list has dealt with similar problems before, I would
greatly appreciate it if you could briefly describe your approach or
point me to the right sources.

More generally, I am interested in approaches for locating
discontinuities in data patterns sampled along environmental gradients.
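A classical tool for locating discontinuities along a gradient is the split moving-window (SMW) technique: a window of 2w consecutive depth levels slides along the gradient, is split in half, and the dissimilarity between the two halves is recorded at the split point; peaks suggest boundaries. The sketch below is an illustration under stated assumptions, not a reference implementation: it pools the samples within each half, uses the Euclidean distance between the two vectors of occurrence frequencies, and the names ROWS, depth_profiles, pooled_mean and split_moving_window are hypothetical.

```python
import math
from itertools import groupby

# Toy data from the post: (depth_m, sand, boulders, clay), already sorted by depth.
ROWS = [
    (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0), (2, 1, 1, 0), (3, 1, 1, 0),
    (3, 1, 1, 0), (4, 1, 1, 0), (5, 1, 0, 0), (5, 1, 0, 0), (5, 1, 1, 0),
    (5, 1, 0, 0), (6, 1, 0, 0), (6, 1, 0, 0), (6, 1, 1, 0), (7, 1, 0, 1),
    (7, 1, 0, 0), (8, 1, 0, 1), (9, 1, 1, 1), (9, 1, 0, 1), (9, 1, 0, 1),
]


def depth_profiles(rows):
    """Per depth level: (depth, number of samples, per-feature presence counts)."""
    out = []
    for depth, group in groupby(rows, key=lambda r: r[0]):
        feats = [r[1:] for r in group]
        out.append((depth, len(feats), [sum(col) for col in zip(*feats)]))
    return out


def pooled_mean(blocks):
    """Occurrence frequency of each feature over all samples in the given depth levels."""
    n = sum(b[1] for b in blocks)
    return [sum(b[2][f] for b in blocks) / n for f in range(len(blocks[0][2]))]


def split_moving_window(rows, w=2):
    """Dissimilarity between the w depth levels above and below each candidate boundary.

    Returns (upper_depth, lower_depth, euclidean_distance) for every split point
    that has w full depth levels on both sides.
    """
    blocks = depth_profiles(rows)
    scores = []
    for cut in range(w, len(blocks) - w + 1):
        above = pooled_mean(blocks[cut - w:cut])
        below = pooled_mean(blocks[cut:cut + w])
        scores.append((blocks[cut - 1][0], blocks[cut][0], math.dist(above, below)))
    return scores


for upper, lower, d in split_moving_window(ROWS, w=2):
    print(f"{upper}|{lower} m: {d:.3f}")
```

On the toy data with w = 2 the dissimilarity jumps across the 4|5 m boundary (5/7, about 0.71, against 0.6 and 0.28 on either side) and is high again across 6|7 and 7|8 m (about 0.73 and 0.80). The result is sensitive to the window width, so trying several widths is advisable.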

Best wishes!
Viktoras Didziulis
P.S. I have just subscribed to this list, so apologies if I am missing
something.


