[R] how to draw random numbers from many categorical distributions quickly?

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Thu Dec 15 09:34:10 CET 2011

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Sean Zhang
> Sent: Wednesday, December 14, 2011 10:07 PM
> To: r-help at r-project.org
> Subject: [R] how to draw random numbers from many categorical
> distributions quickly?
> Dear R helpers,
> I have a question about drawing random numbers from many categorical
> distributions.
> Consider n individuals, each following a categorical distribution
> defined over k categories.
> Consider a simple case in which n=4, k=3, as below:
> catDisMat <-
> rbind(c(0.1,0.2,0.7),c(0.2,0.2,0.6),c(0.1,0.2,0.7),c(0.1,0.2,0.7))
> outVec <- rep(NA,nrow(catDisMat))
> for (i in 1:nrow(catDisMat)){
> outVec[i] <- sample(1:3,1, prob=catDisMat[i,], replace = TRUE)
> }
> I can think of one way to potentially speed it up (in reality, my n is
> very large, so speed matters). The approach above samples only 1 value
> each time. I could have sampled three values at once for c(0.1,0.2,0.7)
> because it appears three times. So by doing some manipulation, I think
> I can have the idea,
> "sample(1:3, 3, prob=c(0.1,0.2,0.7), replace = TRUE)", implemented to
> improve speed a bit. But I wonder whether there is a better approach
> for speed?
> Thanks in advance.
> -Sean
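
The grouping idea described above (one sample() call per distinct probability row) could be sketched roughly as follows. This is only an illustration under assumptions: the names key and groups are mine, and building the key with paste() assumes that rows meant to be identical are numerically identical to about 15 significant digits. It only pays off when the number of distinct rows is much smaller than n.

```r
catDisMat <- rbind(c(0.1, 0.2, 0.7), c(0.2, 0.2, 0.6),
                   c(0.1, 0.2, 0.7), c(0.1, 0.2, 0.7))

# One string per row; identical probability rows get identical keys.
key <- apply(catDisMat, 1, paste, collapse = ",")

# Row indices grouped by distinct probability vector.
groups <- split(seq_len(nrow(catDisMat)), key)

outVec <- integer(nrow(catDisMat))
for (idx in groups) {
  # Draw all values for this group in a single sample() call.
  outVec[idx] <- sample(seq_len(ncol(catDisMat)), length(idx),
                        prob = catDisMat[idx[1], ], replace = TRUE)
}
outVec
```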


How about something like this:

outVec <- apply(catDisMat, 1, function(x) sample(1:3, 1, prob = x, replace = TRUE))

I created a catDisMat matrix with a million rows, and apply() crunched through it in approximately 8-9 seconds on my 2.67 GHz 64-bit Windows 7 box with 12 GB of RAM. Your for-loop code above was substantially slower.
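
If apply() is still too slow, one alternative that avoids calling sample() once per row is inverse-CDF sampling: draw a single uniform number per row and count how many of that row's cumulative probabilities it exceeds. The following is a minimal sketch, not something I have timed on the machine above; the function name rcatRows is hypothetical, and it assumes every row has non-negative entries with a positive sum (rows are rescaled by their own total, much as sample() normalizes prob).

```r
# Hypothetical helper: one categorical draw per row of probMat.
rcatRows <- function(probMat) {
  k <- ncol(probMat)
  # Row-wise cumulative sums, built column by column (k is usually small).
  cumMat <- probMat
  for (j in seq_len(k)[-1]) {
    cumMat[, j] <- cumMat[, j - 1] + probMat[, j]
  }
  # One uniform per row, scaled by the row total so u never exceeds it.
  u <- runif(nrow(probMat)) * cumMat[, k]
  # u is recycled down the columns, so u[i] is compared with cumMat[i, j].
  # The category is 1 + the number of cumulative values strictly below u.
  as.integer(rowSums(u > cumMat) + 1L)
}

catDisMat <- rbind(c(0.1, 0.2, 0.7), c(0.2, 0.2, 0.6),
                   c(0.1, 0.2, 0.7), c(0.1, 0.2, 0.7))
outVec <- rcatRows(catDisMat)
outVec
```

The only loop here runs over the k columns rather than the n rows, so the work per row is done by vectorized operations.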

Hope this is helpful,


Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
