[R] Clustered data with Design package--bootcov() vs. robcov()

Mon Apr 13 14:51:30 CEST 2009

jjh21 wrote:
> Hi,
> 
> I am trying to figure out exactly what the bootcov() function in the Design
> package is doing within the context of clustered data. From reading the
> documentation/source code it appears that using bootcov() with the cluster
> argument constructs standard errors by resampling whole clusters of
> observations with replacement rather than resampling individual
> observations. Is that right, and is there any more detailed documentation on
> the math behind this? Also, what is the difference between these two
> functions:

Correct.  Did you read the Feng et al. reference in bootcov's help file 
or check the book that is related to the package?
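
To make the idea of resampling whole clusters concrete, here is a minimal base R sketch of a cluster bootstrap for a linear model.  It is not the bootcov() source; the function name cluster_boot_cov, the simulated data, and B = 200 are all assumptions made for illustration.

```r
# Minimal cluster bootstrap for a linear model, base R only.
# cluster_boot_cov and the simulated data are hypothetical, not part of Design.
cluster_boot_cov <- function(formula, data, cluster, B = 200) {
  ids <- unique(cluster)
  # Row indices belonging to each cluster, computed once
  rows_by_id <- split(seq_len(nrow(data)), factor(cluster, levels = ids))
  fit0 <- lm(formula, data = data)
  coefs <- matrix(NA_real_, nrow = B, ncol = length(coef(fit0)),
                  dimnames = list(NULL, names(coef(fit0))))
  for (b in seq_len(B)) {
    # Draw whole clusters with replacement; a cluster drawn twice
    # contributes all of its rows twice
    drawn <- sample(length(ids), replace = TRUE)
    idx <- unlist(rows_by_id[drawn], use.names = FALSE)
    coefs[b, ] <- coef(lm(formula, data = data[idx, , drop = FALSE]))
  }
  # Covariance of the bootstrap coefficient replicates
  list(coefficients = coef(fit0), var = cov(coefs))
}

set.seed(1)
n_clusters <- 50
id <- rep(seq_len(n_clusters), each = 4)
u <- rnorm(n_clusters)[id]              # shared within-cluster effect
x <- rnorm(length(id))
dat <- data.frame(y = 1 + 2 * x + u + rnorm(length(id)), x = x)

res <- cluster_boot_cov(y ~ x, dat, cluster = id, B = 200)
sqrt(diag(res$var))                     # cluster bootstrap standard errors
```

Because the clusters, not the rows, are the sampling units, the within-cluster correlation is carried along intact in every replicate.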

> 
> bootcov(my.model, cluster.id)
> robcov(my.model, cluster.id)

robcov does not use bootstrapping.  It uses the cluster sandwich 
(Huber-White) variance-covariance estimator for which there are 
references in the help file (see especially Lin).
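
For comparison, a rough base R sketch of the cluster sandwich estimator for ordinary least squares follows.  It sums the per-observation scores within each cluster and places their cross-product between two copies of the inverse of X'X.  The function name cluster_sandwich_cov is hypothetical, no small-sample correction is applied, and the sketch assumes no rows were dropped for missing values; check the robcov help file for what that function actually does.

```r
# Cluster sandwich (Huber-White) covariance for an lm fit, base R only.
# cluster_sandwich_cov is a hypothetical name; no small-sample correction.
cluster_sandwich_cov <- function(fit, cluster) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  # Per-observation score contributions for least squares: x_i * e_i
  scores <- X * e
  # Sum the scores within each cluster (one row per cluster)
  U <- rowsum(scores, group = cluster)
  bread <- solve(crossprod(X))          # (X'X)^{-1}
  meat <- crossprod(U)                  # sum over clusters of u_g u_g'
  bread %*% meat %*% bread
}

set.seed(2)
n_clusters <- 50
id <- rep(seq_len(n_clusters), each = 4)
u <- rnorm(n_clusters)[id]              # shared within-cluster effect
x <- rnorm(length(id))
dat <- data.frame(y = 1 + 2 * x + u + rnorm(length(id)), x = x)
fit <- lm(y ~ x, data = dat)

V_cluster <- cluster_sandwich_cov(fit, id)
sqrt(diag(V_cluster))                   # cluster-robust standard errors
sqrt(diag(vcov(fit)))                   # model-based standard errors, for contrast

# With every observation as its own cluster, the same code reduces to the
# ordinary (non-clustered) Huber-White estimator
V_single <- cluster_sandwich_cov(fit, seq_len(nrow(dat)))
```

No random numbers enter cluster_sandwich_cov itself, so repeated calls on the same fit return identical matrices.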

Both robcov and bootcov work best when there is a large number of small 
clusters.  If the clusters are somewhat large and greatly vary in size, 
expect to be in trouble and consider a full modeling approach 
(generalized least squares, mixed models, etc.).
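
As a rough illustration of the full modeling route, the sketch below fits the same kind of clustered data with generalized least squares (compound symmetry within cluster) and with a random-intercept mixed model, both from the nlme package.  The simulated data, with a few large clusters and many small ones, and the choice of correlation structure are assumptions; real data may call for a different structure.

```r
library(nlme)  # recommended package shipped with standard R distributions

# Simulated data with unequal cluster sizes (hypothetical example)
set.seed(3)
sizes <- c(rep(2, 10), rep(40, 5))
id <- rep(seq_along(sizes), times = sizes)
u <- rnorm(length(sizes))[id]           # shared within-cluster effect
x <- rnorm(length(id))
dat <- data.frame(y = 1 + 2 * x + u + rnorm(length(id)),
                  x = x, id = factor(id))

# GLS with an exchangeable (compound symmetry) correlation within cluster
fit_gls <- gls(y ~ x, data = dat,
               correlation = corCompSymm(form = ~ 1 | id))

# Mixed model with a random intercept per cluster
fit_lme <- lme(y ~ x, random = ~ 1 | id, data = dat)

sqrt(diag(vcov(fit_gls)))               # standard errors from GLS
sqrt(diag(vcov(fit_lme)))               # fixed-effect standard errors from lme
```

Here the within-cluster dependence is part of the model itself rather than a correction applied to the variance afterwards.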

One advantage of robcov is that you get the same result every time, 
unlike bootstrapping.  But even in the case of cluster sizes of one, the 
sandwich estimator can be inefficient (see the Gould paper) or can 
result in the "right" estimates of the "wrong" quantity (see a paper by 
Freedman in The American Statistician).

Frank

> 
> Thank you.

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University