[R] Sampling the Distance Matrix

Fri Sep 25 22:56:56 CEST 2015

On Sep 25, 2015, at 12:54 PM, Lorenzo Isella wrote:

> Apologies for not letting this thread rest in peace.
> The small script
> 
> #########################################################
> set.seed(1234)
> 
> x <- rnorm(20)
> y <- rnorm(20)
> mtxcomb <- combn(20, 5)  # every 5-point index subset of the 20 points
> 
> goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx],
> y[idx]) ) > 0.9))
> 
> mycomb <- mtxcomb [ , goodcls]
> #########################################################
> 
> 
> is perfect for detecting groups of 5 points whose distances to each
> other are always above 0.9.
> However, in my practical case I have about 500 points and I am looking
> for subsets of several tens of points whose pairwise distances are all
> above a given threshold.
> Unfortunately, the approach above does not scale, so I wonder if
> anybody is aware of an alternative approach.

Find the center of the distribution, eliminate all the points within some reasonable radius, perhaps sqrt( sd(x)^2 + sd(y)^2 ), and then work on the reduced set. If you needed to reduce it even further, I could imagine sampling in sectors defined by the angle atan2(y, x).
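
A minimal sketch of that idea in R, under stated assumptions: the 500 simulated points, the choice of 12 sectors (n_sectors), drawing one point per sector, and the variable names are all illustrative, not part of the suggestion itself. Nothing here guarantees that the sampled points clear the threshold, so the last line checks it explicitly.

```r
# Sketch only: n_sectors and the one-point-per-sector sampling are
# illustrative assumptions, chosen to show the shape of the approach.
set.seed(1234)
x <- rnorm(500)
y <- rnorm(500)

# Centre of the cloud and distance of every point from it
cx <- mean(x)
cy <- mean(y)
r  <- sqrt((x - cx)^2 + (y - cy)^2)

# Drop the points inside the radius sqrt(sd(x)^2 + sd(y)^2)
keep_radius <- sqrt(sd(x)^2 + sd(y)^2)
outer_idx   <- which(r > keep_radius)

# Optional further reduction: cut the plane into angular sectors
# around the centre and sample one point from each non-empty sector
n_sectors <- 12
theta  <- atan2(y[outer_idx] - cy, x[outer_idx] - cx)  # angle in [-pi, pi]
sector <- cut(theta, breaks = seq(-pi, pi, length.out = n_sectors + 1),
              include.lowest = TRUE)
picked <- unlist(lapply(split(outer_idx, sector, drop = TRUE),
                        function(i) i[sample.int(length(i), 1)]))

# Verify the candidates: are all pairwise distances above the threshold?
d_ok <- all(dist(cbind(x[picked], y[picked])) > 0.9)
```

If d_ok is FALSE, one option is to redraw, or to drop one point from each offending pair. Either way the check runs on a few dozen candidates rather than on every combination of the 500 points.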

-- 

David Winsemius
Alameda, CA, USA