[R] A question about sampling

Greg Snow Greg.Snow at imail.org
Wed Feb 2 23:38:34 CET 2011


The apply functions are really just hidden loops, and loops have been made efficient enough that they are usually not much slower (and sometimes a bit faster) than the apply functions.

If you really want to use an apply-style function, then look at mapply (you might need to convert the matrix to a list of rows first), or you could use sapply on the vector of row indices 1:50000 and write a function that indexes into the matrix and the vector of sample sizes.  But if you understand the loop, then I would suggest using the loop.
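
A minimal sketch of the three options, assuming small made-up stand-ins for things and size.things (5 rows rather than 50000). The helper count.row and the random test data are hypothetical, and tabulate is used here in place of the table(factor(...)) call from the original message because it returns a plain vector of counts:

```r
# Hypothetical stand-ins for the data in the question:
# 'things' is an n x k matrix of probability weights (one row per sample),
# 'size.things' holds the n sample sizes.
set.seed(1)
n <- 5
k <- 17
things <- matrix(runif(n * k), nrow = n, ncol = k)
size.things <- c(47, 10, 1, 25, 3)

# Draw size.things[i] categories using row i as weights and count
# how many times each of the k categories was selected.
count.row <- function(i) {
  draws <- sample(seq_len(k), size.things[i], replace = TRUE,
                  prob = things[i, ])
  tabulate(draws, nbins = k)
}

# Option 1: explicit loop filling a preallocated matrix.
set.seed(42)
counts.loop <- matrix(0L, nrow = n, ncol = k)
for (i in seq_len(n)) {
  counts.loop[i, ] <- count.row(i)
}

# Option 2: sapply over the row indices; the result has one column
# per index, so it is transposed to match the layout of 'things'.
set.seed(42)
counts.sapply <- t(sapply(seq_len(n), count.row))

# Option 3: mapply over a list of rows and the vector of sizes.
set.seed(42)
rows <- split(things, row(things))   # list of n weight vectors
counts.mapply <- unname(t(mapply(
  function(p, s) tabulate(sample(seq_len(k), s, replace = TRUE, prob = p),
                          nbins = k),
  rows, size.things)))
```

With the same seed all three give the same n x 17 matrix of counts, and each row sums to the corresponding entry of size.things. Timing them on the full 50000-row data with system.time would show whether the apply versions buy anything over the loop.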

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Patrick Boily
> Sent: Wednesday, February 02, 2011 1:03 PM
> To: 'r-help at r-project.org'
> Subject: [R] A question about sampling
> 
> Greetings,
> 
> I am attempting to do something with R that I think should be
> efficiently do-able, but I haven't yet found success.
> 
> I have a vector of probability weights (for 17 categories), let's call
> it things (it could look like the one below, for instance).
> 
> > things
> 0.026 0 0.233 0 0.131 0 0.415 0 0 0 0 0 0.192 0 0.067 0 0
> 
> I'd like a sample of size size.things (say, 47) of the 17 categories
> (with replacement). And I'd like to produce a vector of length 17 which
> enumerates the number of times each category has been selected. This is
> fairly straightforward to do; for instance:
> 
> > things2<-table(factor(sample(1:17,size.things[1],replace=TRUE,prob=things),
> +                       levels=1:17))
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
>  1  0  9  0  4  0 18  0  0  0  0  0  5  0  4  0  0
> 
> What would I need to do if I had a matrix things (50000 x 17) of
> probability weight vectors and a vector of sample sizes size.things (of
> length 50000), and I wanted to simultaneously sample size.things[1] of
> the 17 categories with probability weight vector things[1,],
> size.things[2] of the 17 categories with probability weight vector
> things[2,], etc. A loop will do the trick, but it takes a while and it
> seems to me that I could more efficiently use tapply somehow. Or
> something that behaves like rowSums. I'm not familiar enough with R to
> see an easy way out. Perhaps there isn't? Does anybody have an idea?
> 
> Regards,
> 
> Patrick
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


