[R] boot() with glm/gnm on a contingency table

Milan Bouchet-Valat nalimilan at club.fr
Sun Sep 16 21:02:03 CEST 2012


On Wednesday 12 September 2012 at 07:08 -0700, Tim Hesterberg wrote:
> One approach is to bootstrap the vector 1:n, where n is the number
> of individuals, with a function that does:
> f <- function(vectorOfIndices, theTable) {
>   (1) create a new table with the same dimensions, but with the counts
>   in the table based on vectorOfIndices.
>   (2) Calculate the statistics of interest on the new table.
> }
> 
> When f is called with 1:n, the table it creates should be the same
> as the original table.  When called with a bootstrap sample of
> values from 1:n, it should create a table corresponding to the
> bootstrap sample.
In case anybody is interested, I finally took this route; the function
described above is implemented below. The idea is to assign an index to
each observation and to use the cumulative sum of the cell counts to
identify which cell an observation comes from. Instead of looping over
all indices and incrementing the corresponding cell count for each one,
I start from the original table, decrement the count for each index
missing from the sample, and increment it for each duplicate. There are
probably better implementations, but performance seems good enough.

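# tab is a table object; indices is a vector of observation indices
# taken from 1:sum(tab), e.g. a bootstrap sample drawn with replacement
f <- function(tab, indices) {
  # Cumulative counts of the original table: observation i belongs to
  # the first cell whose cumulative count is >= i
  cs <- cumsum(tab)

  # Remove missing observations
  for(i in setdiff(seq_len(sum(tab)), indices)) {
      index <- min(which(i <= cs))
      tab[index] <- tab[index] - 1
  }

  # Add duplicate observations
  for(i in indices[duplicated(indices)]) {
      index <- min(which(i <= cs))
      tab[index] <- tab[index] + 1
  }

  # Return the resampled table (a for loop on its own evaluates to NULL)
  tab
}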
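
For comparison, the same index-to-cell mapping can probably be done
without loops. The sketch below is not from the original post:
resample_table is a hypothetical name, the small 2x2 table is made-up
illustration data, and the left.open argument of findInterval() assumes
R >= 3.3.0. It rebuilds every cell count in one pass and checks the
property Tim described, namely that the indices 1:n give back the
original table.

```r
# Hypothetical vectorized variant (illustration only, not the author's code)
resample_table <- function(tab, indices) {
  cs <- cumsum(as.vector(tab))
  # Observation i falls in the first cell whose cumulative count is >= i;
  # with left.open = TRUE, findInterval() counts the cells with cs < i
  cell <- findInterval(indices, cs, left.open = TRUE) + 1L
  # Count how many sampled observations land in each cell, keeping the
  # dimensions, dimnames and class of the original table
  tab[] <- tabulate(cell, nbins = length(tab))
  tab
}

# Made-up 2x2 table with one empty cell
tab <- as.table(matrix(c(3L, 0L, 2L, 4L), nrow = 2,
                       dimnames = list(A = c("a1", "a2"),
                                       B = c("b1", "b2"))))
n <- sum(tab)

# Called with 1:n, the original table should come back unchanged
stopifnot(all(resample_table(tab, seq_len(n)) == tab))

# A bootstrap sample keeps the total count and never fills the empty cell
boot_tab <- resample_table(tab, sample(n, replace = TRUE))
stopifnot(sum(boot_tab) == n, boot_tab["a2", "b1"] == 0)
```

Either version could then be wrapped in a statistic function that
refits the model on the resampled table, with 1:n passed as the data
to bootstrap, as suggested in the quoted message.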


Thanks for the pointers!