[R] choosing best 'match' for given factor

Nick Sabbe nick.sabbe at ugent.be
Thu Mar 31 17:30:51 CEST 2011


Hi Murali.
I haven't compared, but this is what I would do:

bestMatch<-function(searchVector, matchMat)
{
	searchRow<-unique(sort(match(searchVector, colnames(matchMat)))) #if you're sure, you could drop unique
	cat("Original row indices:")
	print(searchRow)
	matchMat<-matchMat[, -searchRow, drop=FALSE] #avoid duplicates altogether
	cat("Corrected Matrix:\n")
	print(matchMat)
	#number of remaining columns left of each search variable, plus one
	correctedRows<-searchRow - seq_along(searchRow) + 1 #works because of the sort above
	cat("Corrected row indices:")
	print(correctedRows)
	sapply(seq_along(searchRow), function(i){
			cr<-correctedRows[i]
			if(cr == 1) return(NA_character_) #no candidate to the left
			#only columns were dropped, so the row keeps its original index
			lookWhere<-matchMat[searchRow[i], seq_len(cr-1)]
			cat("Will now look into:\n")
			print(lookWhere)
			cc<-which.max(lookWhere)
			cat("Max at position", cc, "\n")
			colnames(matchMat)[cc]
		})
}
I don't think there's that much difference from your version. Depending on
the specific sizes, it may be more or less costly to first shrink the search
matrix as I do. Likewise, it may be better still to also remove the rows that
you're not interested in (that requires some more, but similar, index
trickery).
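The suggestion of also removing the rows you are not interested in can be pushed one step further so that no sapply is needed at all. This is a minimal sketch, not code from the thread: bestMatchSubset is a hypothetical name, and it assumes every search name occurs in the matrix and that rows and columns list the variables in the same order. It keeps only the search variables' rows and the non-search columns, masks candidates to the right with -Inf, and lets base R's max.col pick the best column in each row (ties.method = "first" mimics which.max).

```r
# Matrix from the question (same values as the structure() call in the original message)
a <- matrix(c(1, 0.41, 0.58, 0.75,
              0.41, 1, 0.6, 0.86,
              0.58, 0.6, 1, 0.83,
              0.75, 0.86, 0.83, 1),
            nrow = 4,
            dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

# Hypothetical helper: rows of the search variables x columns of the others,
# then the best allowed column per row.
# Assumes all search names occur in matchMat and rows/columns share one order.
bestMatchSubset <- function(searchVector, matchMat) {
  pos <- match(searchVector, colnames(matchMat))   # positions of search variables
  stopifnot(!anyNA(pos))
  keep <- setdiff(seq_len(ncol(matchMat)), pos)     # columns that may be returned
  if (length(keep) == 0) {
    return(setNames(rep(NA_character_, length(pos)), searchVector))
  }
  sub <- matchMat[pos, keep, drop = FALSE]          # rows of interest x candidate columns
  allowed <- outer(pos, keep, ">")                  # TRUE where the candidate lies to the left
  sub[!allowed] <- -Inf                             # rule out candidates to the right
  best <- max.col(sub, ties.method = "first")       # first maximum, as which.max does
  res <- colnames(sub)[best]
  res[rowSums(allowed) == 0] <- NA                  # nothing allowed to the left
  setNames(res, searchVector)
}

bestMatchSubset(c("X", "O"), a)
```

For c("X", "O") this returns c(X = "A", O = "L"), the answer expected in the question. Whether it is actually faster than the sapply versions depends on the matrix size and has not been timed here.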

HTH,


Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Murali.Menon at avivainvestors.com
Sent: Thursday, 31 March 2011 16:46
To: r-help at r-project.org
Subject: [R] choosing best 'match' for given factor

Folks,

I have a 'matching' matrix between variables A, X, L, O:

> a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58, 
0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(
    c("A", "X", "L", "O"), c("A", "X", "L", "O")))

> a
     A    X    L    O
A 1.00 0.41 0.58 0.75
X 0.41 1.00 0.60 0.86
L 0.58 0.60 1.00 0.83
O 0.75 0.86 0.83 1.00

And I have a search vector of variables

> v <- c("X", "O")

I want to write a function bestMatch(searchvector, matchMat) such that, for
each variable in searchvector, I get the variable it has the highest match
to, searching only among the variables to its left in the 'matching' matrix
and never matching a variable that is itself in searchvector.

So in the above example, although "X" has the highest match (0.86) with "O",
I can't choose "O" as it's to the right of X (and also because "O" is in the
searchvector v already); I'll have to choose "A".

For "O", I will choose "L", its best remaining match, since it can't match
"X", which is already in the search vector.

My function bestMatch(v, a) will then return c("A", "L")
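The two rules can be spelled out as a plain loop. It is slow, but handy as a reference when checking faster versions. This is a sketch: bestMatchLoop is a hypothetical name, and it assumes the rows and columns of matchMat list the variables in the same order.

```r
# Matrix from the question (same values as the structure() call above)
a <- matrix(c(1, 0.41, 0.58, 0.75,
              0.41, 1, 0.6, 0.86,
              0.58, 0.6, 1, 0.83,
              0.75, 0.86, 0.83, 1),
            nrow = 4,
            dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

# Hypothetical reference implementation of the two rules
bestMatchLoop <- function(searchVector, matchMat) {
  cn <- colnames(matchMat)
  searchPos <- match(searchVector, cn)
  res <- character(length(searchVector))
  for (i in seq_along(searchVector)) {
    p <- searchPos[i]
    # rule 1: only variables to the left of the search variable
    # rule 2: never a variable that is itself in the search vector
    cand <- setdiff(seq_len(p - 1), searchPos)
    res[i] <- if (length(cand) == 0) NA_character_
              else cn[cand[which.max(matchMat[p, cand])]]
  }
  res
}

bestMatchLoop(c("X", "O"), a)
```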

My matrix a is quite large, and I have a long list of search vectors v, so I
need an efficient method.
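When the matrix stays fixed and only the search vectors change, part of the work can be done once. One possible approach, not taken from the thread, is to sort each row's columns by decreasing match once, then for each search variable walk down its list and stop at the first candidate that lies to its left and is not in the search vector. precomputeOrders and bestMatchPre are hypothetical names; ties between equal match values are assumed not to matter, and whether this pays off depends on the matrix size and the number of search vectors (it has not been timed here).

```r
# Matrix from the question (same values as the structure() call above)
a <- matrix(c(1, 0.41, 0.58, 0.75,
              0.41, 1, 0.6, 0.86,
              0.58, 0.6, 1, 0.83,
              0.75, 0.86, 0.83, 1),
            nrow = 4,
            dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

# Hypothetical helper: for every row, column positions sorted by
# decreasing match value. Computed once per matrix.
precomputeOrders <- function(matchMat) {
  lapply(seq_len(nrow(matchMat)),
         function(r) order(matchMat[r, ], decreasing = TRUE))
}

# Hypothetical helper: walk down the precomputed order of each search
# variable and return the first candidate to its left that is not itself
# in the search vector.
bestMatchPre <- function(searchVector, matchMat, ords) {
  pos <- match(searchVector, colnames(matchMat))
  res <- vapply(pos, function(p) {
    for (cand in ords[[p]]) {
      if (cand < p && !(cand %in% pos)) return(colnames(matchMat)[cand])
    }
    NA_character_                       # nothing allowed to the left
  }, character(1))
  setNames(res, searchVector)
}

ords <- precomputeOrders(a)
vs <- list(c("X", "O"), c("L"))
lapply(vs, bestMatchPre, matchMat = a, ords = ords)
```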

I wrote this:

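bestMatch <- function(searchvector, matchMat) {
        sapply(searchvector, function(cc) {
                # rows above cc that are not themselves in searchvector;
                # seq_len(nrow()) gives row positions in base R (index() needs a package such as zoo)
                y <- matchMat[!(rownames(matchMat) %in% searchvector) &
                              (seq_len(nrow(matchMat)) < match(cc, rownames(matchMat))),
                              cc, drop = FALSE]
                rownames(y)[which.max(y)]
        })
}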
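This code looks down column cc at the rows above it, whereas the question is phrased in terms of the row to the left; the two agree only because the matrix is symmetric. A quick check, assuming the matrix built by the structure() call above:

```r
# Matrix from the question (same values as the structure() call above)
a <- matrix(c(1, 0.41, 0.58, 0.75,
              0.41, 1, 0.6, 0.86,
              0.58, 0.6, 1, 0.83,
              0.75, 0.86, 0.83, 1),
            nrow = 4,
            dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

# TRUE when a equals its transpose (within a small numerical tolerance)
isSymmetric(a)

# Row "O" read left of O holds the same values as column "O" read above O
identical(a["O", 1:3], a[1:3, "O"])
```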

Any advice?

Thanks,

Murali

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


