Re: [R] sample from contingency table

Wed Sep 20 10:29:02 CEST 2000

Regin Reinert writes:

> I have had the same problem and I wrote this function
> 
> rmulti <- function(n, size, p)
> {
>   NrDim <- length(p)
>   if(NrDim<2) stop("The simulated variable has to be at least 2-dimensional")
>   res <- matrix(data=NA, nrow=n, ncol=NrDim)
>   p <- p/sum(p)
>   TempSize <- size
>   for(i in 1:NrDim)
>   {
>     TempP <- p[i]/sum(p[i:NrDim])
>     TempBin <- rbinom(n=n, size=TempSize, prob=TempP)
>     TempSize <- TempSize-TempBin
>     res[,i] <- TempBin
>   }
>   return(res)
> }
> 
> # Then you can draw 10 samples like this, with
> # each row representing a contingency table
> 
> x <- matrix(1:4, nrow=2, ncol=2)
> rmulti(10, sum(x), x)
> 
> 
> Regin
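Regin's loop is the conditional-binomial construction of a multinomial draw: cell i receives a Binomial(remaining size, p[i]/sum(p[i:k])) count, so each row of its result is one whole table of counts summing to size. As a minimal sketch, assuming a later version of R than the one current in this thread (rmultinom() in the stats package postdates it), the same kind of tables can be drawn directly:

```r
# 2 x 2 table of frequencies, used as unnormalised cell probabilities
x <- matrix(1:4, nrow = 2, ncol = 2)

set.seed(1)  # only to make the draw reproducible
# rmultinom(n, size, prob) returns a length(prob) x n matrix with one
# multinomial table per column; prob is rescaled to sum to 1 internally.
# t() puts one table per row, matching the layout of rmulti()
counts <- t(rmultinom(10, size = sum(x), prob = as.vector(x)))

dim(counts)       # 10 4
rowSums(counts)   # each table totals sum(x) = 10
```

Note that each row here is a table of cell counts, not a list of individual sampled cells.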

Hey, hang on...  If I have understood the original question properly,
what you need to do is sample from the cells of a contingency table,
with probabilities proportional to the cell frequencies, rather than
simulate whole tables of counts.  Here is the original question:

> -----Original Message-----
> From: Dirk F. Raetzel [mailto:raetzel at Mathematik.Uni-Marburg.de]
> Sent: 19 September 2000 18:48
> To: R-Help Mailing List
> Subject: [R] sample from contingency table
> 
> 
> Hello,
> 
> I have a multivariate (dim >= 3) discrete distribution
> given by a contingency table from which I want to draw independent
> random samples. The result should be a data.frame (or array) with each
> column representing a dimension.
> 
> Before starting to hack some search tree with appropriate
> transformations: is there any built-in function I
> have overlooked, or has anybody already programmed such a function?
> 
> Dirk

I can't see why you would need a "search tree" for this problem
either.  Here is (what I think is) a very simple solution:

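sampct <- function(n, Fr) {
# sample with replacement from a multivariate distribution
# defined by a contingency table
  # use the dimnames as labels only if every dimension has a full set
  if(!is.null(dfr <- dimnames(Fr)) &&
     prod(sapply(dfr, length)) == length(Fr))
    dfr <- expand.grid(dfr)
  else
    # otherwise label each cell by its integer indices
    dfr <- expand.grid(lapply(dim(Fr), seq))
  # one row per cell, in the same column-major order as Fr, so
  # prob = Fr weights each row by its cell frequency
  dfr[sample(1:nrow(dfr), n, prob = Fr, replace = TRUE), ]
}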

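Here n is the sample size and Fr the array of frequencies in the
table.  The result is an n x k data frame, where k is the number of
dimensions of Fr, holding the dimnames labels of the sampled cells when
Fr has a complete set of dimnames and their integer cell indices
otherwise.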
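As a minimal sketch of the labelled case, assuming the HairEyeColor table that ships with R's datasets package (an illustrative choice, not from the thread), the core of sampct() can be run inline. The draws come back as factors named by the dimnames, and cross-tabulating them gives one table of counts of total n, the kind of table rmulti() produces with size = n:

```r
# HairEyeColor (datasets package) is a 4 x 4 x 2 table with full dimnames
Fr <- HairEyeColor
n <- 100

# One row per cell, labelled by the dimnames (columns Hair, Eye, Sex);
# expand.grid() turns the labels into factors, keeping every level
grid <- expand.grid(dimnames(Fr))

set.seed(1)
draws <- grid[sample(nrow(grid), n, replace = TRUE, prob = Fr), ]
head(draws)

# Cross-tabulating the draws gives one table of counts with Fr's shape,
# i.e. a single multinomial draw of total size n
counts <- table(draws)
sum(counts)   # 100
```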

It uses the fact that sample() can sample with replacement and with
probabilities proportional to the entries in a (non-negative) vector.
As usual there are a few little obscurities there to make it
interesting...
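One plausible candidate for such an obscurity: prob = Fr only lines up with the rows of the grid because expand.grid() varies its first factor fastest, which is the same column-major order in which R stores an array. A small check of that alignment, using a made-up 2 x 3 x 2 table:

```r
# Made-up 2 x 3 x 2 frequency table with distinct entries
Fr <- array(1:12, dim = c(2, 3, 2))

# One row per cell: the index grid sampct() builds when Fr has no dimnames
grid <- expand.grid(lapply(dim(Fr), seq))

# Indexing an array by a k-column matrix picks one element per row;
# row i of the grid should address element i of Fr
aligned <- all(Fr[as.matrix(grid)] == as.vector(Fr))
aligned   # TRUE
```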

-- 
Bill Venables,      Statistician,     CMIS Environmetrics Project
CSIRO Marine Labs, PO Box 120, Cleveland, Qld,  AUSTRALIA.   4163
Tel: +61 7 3826 7251           Email: Bill.Venables at cmis.csiro.au    
Fax: +61 7 3826 7304      http://www.cmis.csiro.au/bill.venables/

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._