[R] Sampling the Distance Matrix

Fri Sep 25 21:15:36 CEST 2015

Absolutely right!
Thanks to both Davids for their help.
Cheers

Lorenzo

On Fri, Sep 25, 2015 at 01:54:54PM +0000, David L Carlson wrote:
>You defined x and y in your original email as:
>
>> x<-rnorm(20)
>> y<-rnorm(20)
>>
>> mm<-as.matrix(cbind(x,y))
>>
>> dst<-(dist(mm))
>
>-------------------------------------
>David L Carlson
>Department of Anthropology
>Texas A&M University
>College Station, TX 77840-4352
>
>
>-----Original Message-----
>From: David Winsemius [mailto:dwinsemius at comcast.net]
>Sent: Thursday, September 24, 2015 6:30 PM
>To: Lorenzo Isella
>Cc: David L Carlson; r-help at r-project.org
>Subject: Re: [R] Sampling the Distance Matrix
>
>
>On Sep 24, 2015, at 1:54 PM, Lorenzo Isella wrote:
>
>> On Thu, Sep 24, 2015 at 01:30:02PM -0700, David Winsemius wrote:
>>>
>>> On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:
>>>
>>>> Hi,
>>>> And thanks for your reply.
>>>> Essentially, your script gets the job done.
>>>> For instance, if I run
>>>>
>>>> mm <- cbind(5/(1:5), -2*sqrt(1:5))
>>>> dst <- dist(mm)
>>>> dst2 <- as.matrix(dst)
>>>> diag(dst2) <- NA
>>>> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
>>>>
>>>> then it correctly detects the first two rows, where all the values are
>>>> larger than 0.9.
>>>> In other words, it detects the points that are at least 0.9 units away
>>>> from *all* the other points.
>>>> My other question (I did not realize this until I got your answer) is
>>>> the following: I have the distance matrix of a set of N points.
>>>> You gave me an algorithm to find all the points that are at least 0.9
>>>> units away from every other point.
>>>> However, in some cases a weaker condition is enough for me: find
>>>> a subset of k points (with k tunable) whose distances *from each other*
>>>> are all greater than 0.9 units (even if their distance from some other
>>>> points may be smaller than 0.9).
>>>
>>> If I understand ..... Make a matrix of unique combinations (one combination per column), then apply over its columns to find the combinations that satisfy the distance criterion:
>>>
>>> mtxcomb <- combn(1:20, 5)
>>> goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx], y[idx]) ) > 0.9))
>>> mtxcomb [ , goodcls]
>>>
>>> In my sample it was around 9% of the total number of 5-item combinations.
>>>
>>> snipped a lot of output:
>>> .....
>>>      [,1440] [,1441]
>>> [1,]      12      13
>>> [2,]      13      16
>>> [3,]      16      17
>>> [4,]      19      19
>>> [5,]      20      20
>>> > dim( mtxcomb)
>>> [1]     5 15504
>>>
>>
>> Hi,
>> Thanks for your reply.
>> I think I am getting there, but when I run your commands, I get this
>> error message
>>
>> Error in cbind(x[idx], y[idx]) : object 'x' not found
>>
>> Any idea why? Should I combine those 3 lines with something else?
>
>No idea. I was running the setup that you asked for in your original message which you have now omitted from the mail chain.
>
>
>
>> Cheers
>>
>> Lorenzo
>
>David Winsemius
>Alameda, CA, USA
>
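
For anyone reading the thread later: the "object 'x' not found" error goes away once x and y from the original message are defined in the same session before the combn() lines are run. A minimal self-contained sketch follows. The set.seed() call, the names dmat, k, thresh and good, and the reuse of a precomputed distance matrix are additions made here for illustration, not part of the code posted in the thread.

```r
# Setup from the original message; set.seed() is added only for reproducibility.
set.seed(1)
x <- rnorm(20)
y <- rnorm(20)
mm <- cbind(x, y)

# Full distance matrix, computed once and reused for every candidate subset.
dmat <- as.matrix(dist(mm))

k <- 5          # subset size (tunable)
thresh <- 0.9   # minimum pairwise distance within a subset

# Each column of mtxcomb is one candidate set of k point indices.
mtxcomb <- combn(nrow(mm), k)

# Keep the columns whose points are all more than thresh apart from each other.
goodcls <- apply(mtxcomb, 2, function(idx) {
  sub <- dmat[idx, idx]
  all(sub[upper.tri(sub)] > thresh)
})

good <- mtxcomb[, goodcls, drop = FALSE]
dim(mtxcomb)   # 5 rows, choose(20, 5) = 15504 columns
ncol(good)     # number of qualifying subsets; depends on the random draw
```

Note that the number of combinations grows quickly with the number of points and with k, so enumerating all of them is only practical for small problems like this one.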