[R] vectorized approach to cumulative sampling

Rich FitzJohn rich.fitzjohn at gmail.com
Thu Apr 7 23:47:44 CEST 2005


Hi,

sample() takes a "replace" argument, so you can draw one large sample,
with replacement, and then trim it, like this. (In the sample() call,
50*target/mean(old) is roughly 50 times the number of draws likely to
be needed, so the while loop will probably run only once. The factor
is easy to tune, and there may be better ways of guessing how many
draws to take.)

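old <- 1:2000
p <- runif(length(old))   # one sampling weight per element of old
target <- 4000
new <- 0

# Draw a big batch at once; redraw only if it falls short of target
while ( sum(new) < target )
  new <- sample(old, ceiling(50*target/mean(old)), TRUE, p)

# Keep draws up to the first point where the running total reaches
# target, then shorten the last one so the total matches exactly
i <- which(cumsum(new) >= target)[1]
new <- new[1:i]
new[i] <- new[i] - (sum(new)-target)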
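
If this is needed more than once, the steps above could be wrapped in a
function. What follows is only a rough sketch: the name sample_to_target
and the "oversample" argument are made up for illustration, and it
assumes every value in old is positive and that old has more than one
element. It differs from the code above in two ways: the batch size is
guessed from the weighted mean of old (one possible "better guess"), and
a batch that falls short is extended rather than redrawn.

```r
# Hypothetical helper; name and arguments are illustrative only.
# Assumes all(old > 0), length(old) > 1 and target > 0.
sample_to_target <- function(old, p, target, oversample = 2) {
  # Expected value of a single draw under the weights p
  mean_draw <- sum(old * p) / sum(p)
  # Guess the number of draws needed, with some headroom
  n <- ceiling(oversample * target / mean_draw)
  new <- sample(old, n, replace = TRUE, prob = p)
  # If the guess fell short, append further batches
  while (sum(new) < target)
    new <- c(new, sample(old, n, replace = TRUE, prob = p))
  # First position where the running total reaches the target
  i <- which(cumsum(new) >= target)[1]
  new <- new[seq_len(i)]
  # Shorten the last draw so the total matches the target exactly
  new[i] <- new[i] - (sum(new) - target)
  new
}

set.seed(1)
old <- 1:2000
p <- runif(length(old))
new <- sample_to_target(old, p, target = 4000)
sum(new) == 4000
```

Appending instead of redrawing means the early draws are never thrown
away, which should match the one-at-a-time loop more closely.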

Cheers,
Rich

On Apr 8, 2005 9:19 AM, Daniel E. Bunker <deb37 at columbia.edu> wrote:
> Hi All,
> 
> I need to sample a vector ("old"), with replacement, up to the point
> where my vector of samples ("new") sums to a predefined value
> ("target"), shortening the last sample if necessary so that the total
> sum ("newsum") of the samples matches the predefined value.
> 
> While I can easily do this with a "while" loop (see below for example
> code), because the length of both "old" and "new" may be > 20,000, a
> vectorized approach will save me lots of CPU time.
> 
> Any suggestions would be greatly appreciated.
> 
> Thanks, Dan
> 
> # loop approach
> old=c(1:10)
> p=runif(1:10)
> target=20
> 
> newsum=0
> new=NULL
> while (newsum<target) {
>    i=sample(old, size=1, prob=p);
>    new[length(new)+1]=i;
>    newsum=sum(new)
>    }
> new
> newsum
> target
> if(newsum>target){new[length(new)]=target-sum(new[-length(new)])}
> new
> newsum=sum(new); newsum
> target
> 

-- 
Rich FitzJohn
rich.fitzjohn <at> gmail.com   |    http://homepages.paradise.net.nz/richa183
                      You are in a maze of twisty little functions, all alike



