[R] replacing elements of distance matrix

Michael Ralph M. Abrigo mmabrigo at gmail.com
Tue Jul 20 02:12:44 CEST 2010


Thank you very much for your help, Nikhil! The code I'm using now is:

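#bdiag() is assumed here to come from the Matrix package
library(Matrix)

#generate data
set.seed(2)
x <- as.matrix(runif(5))
id1 <- sample(1:2,5,replace=TRUE)
id2 <- c(1:5)
rownames(x) <- paste(id1, id2)

#create distance matrix if same id1
x.L <- split(x,id1)
n.L <- split(rownames(x), id1)
for(i in seq_along(x.L)){
  names(x.L[[i]]) <- n.L[[i]]
 }
#distances from one site (i) to all sites of its region (j),
#using the variance within that region only
m2 <- function(i,j) {
  mahalanobis(j,i,var(j))
 }
m3 <- function(k) {
  apply(as.matrix(k),1,m2,as.matrix(k))
 }
dd <- lapply(x.L, m3)
df <- bdiag(dd)

#label rows and columns in block order (region by region);
#sort(rownames(x)) only matches this order while the ids sort as text
ids <- unlist(n.L, use.names=FALSE)
rownames(df) <- ids
colnames(df) <- ids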

x.L
df
dd
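
One thing to watch with the block-diagonal result: bdiag() fills the cells outside the blocks with zeros, whereas the aim stated at the start of the thread was a large value such as 999999 for cross-region pairs. A minimal sketch of one way to get that directly, using base R only. The names `big` and `d` are made up for the illustration, the region ids are fixed by hand rather than sampled, and every region is assumed to have at least two sites so that var() is defined:

```r
# Sketch: within-region Mahalanobis distances in one labelled matrix,
# with cross-region pairs set to a large sentinel value.
set.seed(2)
x   <- as.matrix(runif(6))
id1 <- c(1, 2, 1, 2, 1, 2)            # region of each site (fixed by hand)
id2 <- seq_len(nrow(x))               # site number
rownames(x) <- paste(id1, id2)

big <- 999999                         # hypothetical "do not match" value
d <- matrix(big, nrow(x), nrow(x),
            dimnames = list(rownames(x), rownames(x)))

for (r in unique(id1)) {
  idx <- which(id1 == r)              # rows belonging to region r
  xr  <- x[idx, , drop = FALSE]
  Sr  <- var(xr)                      # variance within region r only
  # each column holds the distances from one site to all sites in region r
  d[idx, idx] <- apply(xr, 1, function(ctr) mahalanobis(xr, ctr, Sr))
}

d
```

Because the matrix is filled in place, rows and columns stay in the original order of x and keep the "region site" labels, so a lookup such as d["1 1", "1 3"] still works for later processing.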


Cheers,
Michael

On Tue, Jul 20, 2010 at 2:27 AM, Nikhil Kaza <nikhil.list at gmail.com> wrote:
> My mistake: instead of colnames(d1), use substr(colnames(d1),1,1) or
> similar.
>
> On Jul 19, 2010, at 2:15 PM, Nikhil Kaza wrote:
>
>> Michael,
>>
>> You can modify the following code to suit. Also, avoid using dist as a
>> variable name, since it is already a function in the stats package.
>> However, are you sure you want to do this? Sx is the variance computed
>> from the sites in all the regions!
>>
>> d1 <- apply(x,1, function(i){mahalanobis(x,i,Sx)})
>> is.na(d1) <- !sapply(id1, grepl, colnames(d1), fixed=T)
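
The masking step quoted just above can also be written with outer() on the region ids, which avoids pattern matching altogether. This is only a sketch under assumptions: the region is taken to be the text before the first space in the row names (so it would also cope with two-digit regions, given the 10 regions mentioned in the original question), the ids are fixed by hand, and `region` and `cross` are names invented for the example. The variance here is still the pooled Sx over all regions:

```r
# Sketch: pooled-variance Mahalanobis distances, with cross-region
# pairs overwritten by a large value.
set.seed(1)
x   <- as.matrix(runif(5))
id1 <- c(1, 1, 2, 2, 1)                    # region ids (fixed by hand)
rownames(x) <- paste(id1, 1:5)
Sx  <- var(x)                              # variance over all regions

d1 <- apply(x, 1, function(i) mahalanobis(x, i, Sx))
dimnames(d1) <- list(rownames(x), rownames(x))

region <- sub(" .*", "", rownames(x))      # text before the first space
cross  <- outer(region, region, "!=")      # TRUE where regions differ
d1[cross] <- 999999

d1
```

Assigning NA instead of 999999 to d1[cross] would give the same effect as the is.na() line.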
>>
>> If, on the other hand, you want to use only the variance within a
>> region, modify it like this (I am sure more efficient code can be
>> written):
>>
>> #not tested
>> x.L <- split(x,id1)
>> n.L <- split(rownames(x), id1)
>> for (i in 1:length(x.L)){names(x.L[[i]]) <- n.L[[i]]}
>> m2 <- function(i,j){mahalanobis(j, i, var(j))}
>> m3 <- function(k){apply(as.matrix(k),1,m2,as.matrix(k))}
>> d2 <- lapply(x.L, m3)
>>
>>
>>
>> Nikhil Kaza
>> Asst. Professor,
>> City and Regional Planning
>> University of North Carolina
>>
>> nikhil.list at gmail.com
>>
>> On Jul 19, 2010, at 11:37 AM, Michael Ralph M. Abrigo wrote:
>>
>>> Thanks for the tip, Nikhil. However, I need a single matrix as input
>>> to another routine that computes a non-bipartite matching minimizing
>>> the pairwise distances between observations. I also need the
>>> georeference (id) of the observations for subsequent processing. Below
>>> is an illustration.
>>>
>>>
>>> > #generate data
>>> > x <- as.matrix(runif(5))
>>> > Sx <- var(x)
>>> >
>>> > #generate id
>>> > set.seed(1)
>>> > id1 <- sample(1:2,5, replace=T)
>>> > id2 <- c(1:5)
>>> > rownames(x) <- paste(id1, id2)
>>> >
>>> > #generate distance
>>> > dist <- as.matrix(
>>> +   apply(x,1,function(i){
>>> +     mahalanobis(x,i,Sx)
>>> +    }
>>> +  )
>>> + )
>>> >
>>> > #print matrices
>>> > x
>>>          [,1]
>>> 1 1 0.2059746
>>> 1 2 0.1765568
>>> 2 3 0.6870228
>>> 2 4 0.3841037
>>> 1 5 0.7698414
>>> >
>>> > dist
>>>            1 1        1 2        2 3       2 4        1 5
>>> 1 1 0.00000000 0.01165534 3.11660015 0.4273402 4.28210082
>>> 1 2 0.01165534 0.00000000 3.50943798 0.5801450 4.74056406
>>> 2 3 3.11660015 3.50943798 0.00000000 1.2358255 0.09237602
>>> 2 4 0.42734018 0.58014499 1.23582554 0.0000000 2.00395492
>>> 1 5 4.28210082 4.74056406 0.09237602 2.0039549 0.00000000
>>>
>>>
>>> The geo-id is composed of two references: the first digit for the
>>> region and the next for the observation itself. What I have in mind
>>> is for the pairwise distance between observations of different regions,
>>> say site-11 and site-23 or site-24, to be replaced by a large number,
>>> say 999999. I need the id for future processing, though.
>>> Maybe I can stack the matrices generated using your tip to form a
>>> block diagonal matrix, but then I would lose my ids? I'm really sorry,
>>> I'm a bit lost.
>>> Cheers,
>>> Michael
>>>
>>> On Mon, Jul 19, 2010 at 10:10 PM, Nikhil Kaza <nikhil.list at gmail.com>
>>> wrote:
>>>>
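>>>> Replace dist with the Mahalanobis distance in the following example.
>>>>
>>>> a <- cbind(runif(10), sample(1:3, 10, replace=TRUE))
>>>> # split only the data column; splitting the whole matrix would also
>>>> # put the region codes into each group
>>>> a.L <- split(a[,1], a[,2])
>>>> dist.L <- lapply(a.L, dist)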
>>>>
>>>>
>>>> On Jul 19, 2010, at 9:24 AM, Michael Ralph M. Abrigo wrote:
>>>>
>>>>> Hi! I am trying to implement non-bipartite matching. I have around 500
>>>>> sites, which can be clustered into 10 regions. I am able to calculate
>>>>> pairwise Mahalanobis distances between sites (thanks to another post in
>>>>> the forum). However, I want to constrain my matches to sites within the
>>>>> same region. Thus I want to replace elements of the distance matrix with
>>>>> a high value, say 999999, for sites not in the same region so that the
>>>>> pair will not be matched.
>>>>> In the original data file I have information on which sites belong to
>>>>> which region. However, when I compute the pairwise Mahalanobis
>>>>> distances, I use only a subset of the file, which, naturally, does not
>>>>> include the georeference of the sites. How should I do this? Any hint
>>>>> will be most appreciated.
>>>>> Btw, I am relatively new to R. I could export the matrix to another
>>>>> program and replace the elements there, but that is a very dirty and
>>>>> rough trick that I would rather avoid given better options.
>>>>> Many thanks in advance.
>>>>>
>>>>> Cheers,
>>>>> Michael
>>>>>
>>>>> --
>>>>> "I am most anxious for liberties for our country... but I place as a
>>>>> prior condition the education of the people so that our country may
>>>>> have an individuality of its own and make itself worthy of
>>>>> liberties... " Jose Rizal,1896
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>
>





