[R] Clustering

Sat Oct 30 14:31:42 CEST 2010

On Oct 30, 2010, at 7:49 AM, dpender wrote:

> David Winsemius wrote:
>>
>> On Oct 29, 2010, at 12:08 PM, David Winsemius wrote:
>>
>>>
>>> On Oct 29, 2010, at 11:37 AM, dpender wrote:
>>>
>>>> Apologies for being vague,
>>>>
>>>> The structure of the output is as follows:
>>>
>>> Still no code?
>>
>>
> I am using the Clusters function from the evd package
>>
>>>>
>>>> $ cluster1  : Named num [1:131] 3.05 2.71 3.26 2.91 2.88 3.11  
>>>> 3.21  -1 2.97 3.39 ...
>>>> ..- attr(*, "names")= chr [1:131] "6667" "6668" "6669" "6670" ...
>>>>
>>>> With 613 clusters.  What I require is abstracting the first and  
>>>> last value of
>>>>
>>>> - attr(*, "names")= chr [1:131] "6667" "6668" "6669" "6670"
>>>
>>> Those values are in an attribute:
>>
>> Corrections:
>>>
>>> ? attribute
>> ?attributes
>>> ? attr
>>> Your specific request may (perhaps) be addressed by something like:
>>> attrnames <- attr(objname["cluster1"], "names")
>>                            ^          ^   should be doubled square-
>>
>>
> This works to abstract the part that I am looking for but in order  
> to loop
> this over every cluster I need an output object of the same form as
> clusters to write the names to.

That is rather difficult to implement since the phrase "same form as
the clusters" is still undetermined in the absence of full output from
str() or an actual data object. The help page for the clusters
function (not Clusters, BTW) could be used for a concrete example:

require(evd)
data(portpirie)
clusobj <- clusters(portpirie, 4.2, 3)
lapply(clusobj, attr, "names")
nclusters <- length(clusobj)
# This gives the locations (in the names) and values at the beginning
# and end of the 6 clusters

 > lapply(clusobj, function(x) c(head(x,1), tail(x,1)))
$cluster1
    9   12
4.36 4.69

$cluster2
   20   26
4.25 4.37

$cluster3
   31   31
4.55 4.55

$cluster4
   38   43
4.21 4.21

$cluster5
   58   59
4.33 4.55

$cluster6
   65   65
4.33 4.33

# If you used sapply you could get the values as a matrix:

 > sapply(clusobj, function(x) c(head(x,1), tail(x,1)))
    cluster1 cluster2 cluster3 cluster4 cluster5 cluster6
9      4.36     4.25     4.55     4.21     4.33     4.33
12     4.69     4.37     4.55     4.21     4.55     4.33

# (I don't know what the 9 and 12 represent.)

# You can also get the sequence boundary locations in a (character)
# matrix:
 > sapply(clusobj, function(x) names(c(head(x,1), tail(x,1))))
      cluster1 cluster2 cluster3 cluster4 cluster5 cluster6
[1,] "9"      "20"     "31"     "38"     "58"     "65"
[2,] "12"     "26"     "31"     "43"     "59"     "65"
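
If what you want is an object of the same form as the clusters list but
holding only the first and last names, something like the sketch below
might serve. It does not need evd: clusobj here is a hypothetical
stand-in list with made-up values, and the names ends and bounds are
mine, not anything from the package.

```r
# Hypothetical stand-in for the list returned by clusters(): each element
# is a named numeric vector whose names are positions in the series.
# The values are made up for illustration.
clusobj <- list(
  cluster1 = c("9" = 4.36, "10" = 4.40, "12" = 4.69),
  cluster2 = c("20" = 4.25, "26" = 4.37),
  cluster3 = c("31" = 4.55)
)

# Same list structure as the input, keeping only the first and last names.
# A one-element cluster returns the same name twice.
ends <- lapply(clusobj, function(x) names(x)[c(1, length(x))])

# One row per cluster, with the positions converted to numbers.
bounds <- data.frame(
  cluster = names(clusobj),
  first   = as.numeric(sapply(ends, `[`, 1)),
  last    = as.numeric(sapply(ends, `[`, 2)),
  row.names = NULL
)
print(bounds)
```

The list version (ends) keeps one element per cluster, so it can be
indexed the same way as the original; the data frame is only a
convenience if you want the boundaries side by side.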

>>
>> brackets
>>> attrnames[c(1, length(attrnames)]
>>                                  ^  missing right-paren
>>
>> Might work:
>> attrnames <- attr(clusobj[["cluster1"]], "names")
>> attrnames[c(1, length(attrnames))]
>> --
>>
>> David Winsemius, MD
>> West Hartford, CT
>

> Additionally I can get the output as a matrix in form
>
> atomic [1:613] 3.01 4.1 3.04 3.81 3.55 3.37 3.09 4.1 3.61 6.36 ...
> - attr(*, "acs")= num 47.6
>
> where "acs" is the average size.  Each height value in the vector  
> has a
> corresponding number relating to the location in the dataset.

Better would be to tell us _how_ "each height value" has a  
"corresponding number relating to the location". It is not apparent  
from the above. Some other object you are not naming or describing for  
us?

>  When I change
> the vector to matrix this looks like the row name but it isn't as
> rownames(clusters) yields NULL.
>
> Do you have any idea how to abstract these values?

First you need to figure out what these are. Rather than guessing and
applying extractor functions, it would be better to use str() and
class() on the result ... and for Pete's sake, include the full console
output rather than your guess at what is needed.
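
A sketch of the kind of inspection I mean, on a made-up object. The
vector x only resembles your posted output: the "acs" attribute is
copied from it, but the names are my assumption about where the
locations might be stored, so check your real object before relying on
this.

```r
# Hypothetical object resembling the posted str() output: a numeric
# vector with an "acs" attribute. The names are assumed for illustration.
x <- c("6667" = 3.01, "6801" = 4.10, "6950" = 3.04)
attr(x, "acs") <- 47.6

str(x)                    # compact view of the values and attributes
class(x)                  # "numeric"
info <- attributes(x)     # list holding names and acs
rownames(x)               # NULL, because a plain vector has no dim

# If the locations are kept as names, names() retrieves them.
pos <- as.numeric(names(x))
```

If names(x) comes back NULL on your object, then the locations live
somewhere else, and the str() output will say where.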

>
> Doug
> --

David Winsemius, MD
West Hartford, CT