[R] Faster way for weighted matching?
Frank E Harrell Jr
fharrell at virginia.edu
Wed Jan 15 22:19:03 CET 2003
For each element in w I want to find a good match (a subscript of an element in x). Both x and w can be long. Instead of simply taking the single closest match, I want to use weighted multinomial sampling (which I've already figured out once I have the probabilities), where the probabilities come from the tricube function (1 - |u|^3)^3 applied to the absolute differences between donor and target values, using the maximum absolute difference in each row as the scaling factor and then normalizing each row to sum to one. This is similar to the loess weighting function with f=1. Here's code that works, producing the probability matrix to use for sampling:
z <- abs(outer(w, x, "-"))              # |w[i] - x[j]| for every donor/target pair
s <- apply(z, 1, max)                   # scaling factor: maximum distance in each row
z <- (1 - sweep(z, 1, s, FUN='/')^3)^3  # tricube weights
sums <- apply(z, 1, sum)
z <- sweep(z, 1, sums, FUN='/')         # normalize each row to sum to 1
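For concreteness, the sampling step (omitted above) could look like the following minimal sketch, assuming one independent draw per row of z; this is an illustration, not necessarily the code I'm actually using:

j <- apply(z, 1, function(p) sample(length(p), 1, prob = p))
# j[i] is the subscript of the element of x matched to w[i]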
Example:
w <- c(1,2,3,7)
x <- c(0,1.5,3)
z <- abs(outer(w,x,"-"))
> z
[,1] [,2] [,3]
[1,] 1 0.5 2
[2,] 2 0.5 1
[3,] 3 1.5 0
[4,] 7 5.5 4
s <- apply(z, 1, max)
z <- (1 - sweep(z, 1, s, FUN='/')^3)^3
> z
     [,1]      [,2]      [,3]
[1,] 0.6699219 0.9538536 0.0000000
[2,] 0.0000000 0.9538536 0.6699219
[3,] 0.0000000 0.6699219 1.0000000
[4,] 0.0000000 0.1365445 0.5381833
sums <- apply(z, 1, sum)
z <- sweep(z, 1, sums, FUN='/')
> z  # each row represents multinomial probabilities summing to 1
     [,1]      [,2]      [,3]
[1,] 0.4125705 0.5874295 0.0000000
[2,] 0.0000000 0.5874295 0.4125705
[3,] 0.0000000 0.4011696 0.5988304
[4,] 0.0000000 0.2023697 0.7976303
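As a usage example with this z (draws vary with the random seed, so no output is shown), the matched subscripts could be sampled with:

apply(z, 1, function(p) sample(length(p), 1, prob = p))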
The code is moderately fast. Does anyone know of a significantly faster method or have any comments on the choice of weighting function for such sampling? This will be used in the context of predictive mean matching for multiple imputation. Thanks - Frank
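One direction that might help (a sketch using only base R, not benchmarked): under R's column-major recycling, dividing an n-row matrix by a length-n vector divides each row i by the i-th element, so the sweep() calls can be dropped, and the apply() calls can be replaced by the C-level max.col() and rowSums():

z <- abs(outer(w, x, "-"))
s <- z[cbind(1:length(w), max.col(z))]  # row maxima; random tie-breaking is
                                        # harmless since only the max value is used
z <- (1 - (z / s)^3)^3                  # recycling divides each row by its maximum
z <- z / rowSums(z)                     # normalize each row to sum to 1

Whether this is significantly faster would need timing at realistic sizes of w and x.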
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat