[R] boot() with glm/gnm on a contingency table

Thu Sep 13 06:19:55 CEST 2012

>On Wednesday, 12 September 2012 at 07:08 -0700, Tim Hesterberg wrote:
>> One approach is to bootstrap the vector 1:n, where n is the number
>> of individuals, with a function that does:
>> f <- function(vectorOfIndices, theTable) {
>>   (1) create a new table with the same dimensions, but with the counts
>>   in the table based on vectorOfIndices.
>>   (2) Calculate the statistics of interest on the new table.
>> }
>>
>> When f is called with 1:n, the table it creates should be the same
>> as the original table.  When called with a bootstrap sample of
>> values from 1:n, it should create a table corresponding to the
>> bootstrap sample.
>Indeed, that's another solution I considered, but I wanted to be sure
>nothing more reasonable exists. You're right that it's more efficient
>than replicating the whole data set. But still, with a typical table of
>less than 100 cells and several thousands of observations, this means
>creating a potentially long vector, much larger than the original data;
>nothing really hard with common machines, to be sure.
>
>If no other way exists, I'll use this. Thanks.
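
For what it's worth, a minimal sketch of the index-based approach quoted
above might look like the following. The example table x, the function
name tableFromIndices and the use of tabulate() are illustrative
assumptions, not something taken from the thread, and the statistic
itself is left out:

```r
# Hypothetical 2 x 3 table of counts, for illustration only
x <- as.table(matrix(c(20, 15, 30, 25, 10, 40), nrow = 2))

# Rebuild a table from a vector of individual indices (values in 1:n)
tableFromIndices <- function(vectorOfIndices, theTable) {
  # Cell number of each individual: cell 1 repeated theTable[1] times, etc.
  cellOfIndividual <- rep(seq_along(theTable), times = as.vector(theTable))
  newTable <- theTable
  # Count how many sampled individuals fall in each cell
  newTable[] <- tabulate(cellOfIndividual[vectorOfIndices],
                         nbins = length(theTable))
  newTable
}

n <- sum(x)
# Called with 1:n, this should reproduce the original table
sameAsOriginal <- tableFromIndices(seq_len(n), x)

# Called with a bootstrap sample of 1:n, it gives a bootstrap table
set.seed(1)
bootTable <- tableFromIndices(sample.int(n, n, replace = TRUE), x)
```

The long vector is only as long as the number of individuals, and it
holds cell numbers rather than copies of the data.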

In your original posting you also suggested:
>>>The other way would be generate importance weights based on observed
>>>frequencies, and to multiply the original data by the weights at each
>>>iteration, but I'm not sure that's correct. Thoughts?

You could do:

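bootstrapTable <- x  # where x is the original table
# assumes myFunction returns a single number
replicates <- numeric(numberOfBootstrapSamples)
for (i in seq_len(numberOfBootstrapSamples)) {
  # draw sum(x) observations, with cell probabilities proportional to x
  bootstrapTable[] <- rmultinom(1, size = sum(x), prob = x)
  replicates[i] <- myFunction(bootstrapTable)
}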
# caveat - not tested

I can't tell from help(boot) whether you could do it correctly there.
boot has a 'weights' argument that you could use for the sampling
probabilities, but you also need a way to tell it to draw sum(x)
observations.  Or, you could also pass boot a "parametric" sampler.  
But be careful if you use boot in either of these ways; you not only
need to generate the bootstrap samples, you also need to make sure
that it does all other calculations correctly, including
calculating the statistic for the original data, calculating jackknife
statistics if they are used for confidence intervals, etc.
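
A rough sketch of the "parametric" route, relying on boot's documented
sim = "parametric", ran.gen and mle arguments. The example table x, the
statistic myFunction and the sampler multinomGen are hypothetical
stand-ins, and I have not checked the interval types that need
jackknife values:

```r
library(boot)  # recommended package, shipped with standard R installations

# Hypothetical 2 x 3 table of counts, for illustration only
x <- as.table(matrix(c(20, 15, 30, 25, 10, 40), nrow = 2))

# Hypothetical statistic: Pearson chi-squared statistic computed by hand
myFunction <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Sampler: boot calls ran.gen(data, mle); here mle holds the cell proportions
multinomGen <- function(tab, p) {
  tab[] <- rmultinom(1, size = sum(tab), prob = p)
  tab
}

set.seed(1)
b <- boot(x, statistic = myFunction, R = 200, sim = "parametric",
          ran.gen = multinomGen, mle = as.vector(x) / sum(x))
```

With sim = "parametric" the statistic receives only the data, so
b$t0 is myFunction applied to the original table and b$t holds the
200 replicates.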

Wistful sigh - this would be pretty easy to do with S+Resample.

Tim Hesterberg