[R] Complex sample variances
RBaskin at ahrq.gov
Wed Apr 14 16:29:02 CEST 2004
< Construct a new weight within the stratum as the sample weight
multiplied by the frequency>
The correct formula for the new weights can be found in Chapter 6 of Shao
and Tu (1996) "The Jackknife and the Bootstrap", Springer
" Keith Rust & Jon Rao have an overview article in Statistical Methods in
Medical Research (1996 vol 5, pp 283-310) which reviews most of the
literature and methods to that point (also see Shao & Tu's book Chapter 6).
They also give the correct formula for the bootstrap weights. It is highly
recommended in Rust & Rao (referring to Rao & Wu) that for bootstrap you
select n(h)-1 out of n(h) PSUs in stratum h with replacement."
If you select n(h)-1 out of n(h) PSUs in stratum h, the new weight should be:
New-weight = Old-weight * (number of times the PSU is selected) * n(h) / (n(h) - 1)
So if you randomly select 1 out of 2 PSUs, you double the weight because of
the factor n(h) / (n(h) - 1).
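
A minimal base-R sketch of that weight formula, for one bootstrap replicate. The function name rescaled_boot_weights and its arguments are made up for illustration; it assumes one entry per sampled element and at least two PSUs in every stratum.

```r
# Hypothetical helper: one replicate of rescaled bootstrap weights.
# strata, psu and weight are equal-length vectors, one entry per element.
rescaled_boot_weights <- function(strata, psu, weight) {
  new_weight <- numeric(length(weight))
  for (h in unique(strata)) {
    in_h   <- strata == h
    psus_h <- unique(psu[in_h])
    n_h    <- length(psus_h)
    if (n_h < 2) stop("each stratum needs at least two PSUs")
    # Draw n(h) - 1 PSU positions with replacement. Sampling positions
    # (not labels) avoids sample() treating a single number as a count.
    drawn <- sample.int(n_h, n_h - 1, replace = TRUE)
    # Times each element's PSU was drawn (0 if it was never drawn).
    freq <- tabulate(drawn, nbins = n_h)[match(psu[in_h], psus_h)]
    # New-weight = Old-weight * frequency * n(h) / (n(h) - 1)
    new_weight[in_h] <- weight[in_h] * freq * n_h / (n_h - 1)
  }
  new_weight
}

# Toy 2-PSUs-per-stratum design: in each stratum one PSU ends up with
# doubled weights and the other with zero weights.
set.seed(1)
strata <- c(1, 1, 1, 1, 2, 2, 2, 2)
psu    <- c(1, 1, 2, 2, 3, 3, 4, 4)
w      <- rep(10, 8)
nw     <- rescaled_boot_weights(strata, psu, w)
```

In the toy design each stratum keeps its original weight total, which is the point of the n(h) / (n(h) - 1) factor.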
With two PSUs per stratum, this method amounts to building BRR replicates
at random, so it behaves like an inefficient BRR, and the number of bootstrap
replicates needed may depend on both the statistic being estimated and the
number of replicates in a fully balanced BRR set.
From: Fred Rohde [mailto:frohde_home at yahoo.com]
Sent: Wednesday, April 14, 2004 9:46 AM
To: Thomas Lumley
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Complex sample variances
I think I've figured out a way to do a bootstrap variance estimate of a
quantile. I need to work out the code, but this is the algorithm (for a
stratified cluster sample):
Make a list of the stratum values for the sample
For each stratum value,
    Make a list of the PSU values within that stratum
    Sample n-1 PSU values with replacement
    Get the frequency of PSU values selected
    Attach the frequency to the sample elements within the stratum by PSU
    Construct a new weight within the stratum as the sample weight
    multiplied by the frequency
Once the new weight is generated in all strata, get the quantile
estimate(s) from svyquantile using the new weight
Repeat another 99 times to build 100 bootstrap replicates
Get the standard deviation of the replicate estimates as the standard error
(its square is the variance estimate)
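
A rough base-R sketch of the steps above, with the n(h) / (n(h) - 1) rescaling from the reply at the top of this message included. The names weighted_quantile, boot_weights and boot_quantile_se, and the toy data, are made up for illustration. The weighted quantile is a simple stand-in and is not claimed to match what svyquantile computes.

```r
# Hypothetical helper: smallest x whose cumulative weight share reaches p.
weighted_quantile <- function(x, w, p) {
  keep <- w > 0
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  cw  <- cumsum(w[ord])
  x[ord][which(cw >= p * cw[length(cw)])[1]]
}

# One replicate: draw n(h) - 1 PSUs with replacement in each stratum and
# rescale the sample weights by frequency * n(h) / (n(h) - 1).
boot_weights <- function(strata, psu, weight) {
  new_weight <- numeric(length(weight))
  for (h in unique(strata)) {
    in_h   <- strata == h
    psus_h <- unique(psu[in_h])
    n_h    <- length(psus_h)
    drawn  <- sample.int(n_h, n_h - 1, replace = TRUE)
    freq   <- tabulate(drawn, nbins = n_h)[match(psu[in_h], psus_h)]
    new_weight[in_h] <- weight[in_h] * freq * n_h / (n_h - 1)
  }
  new_weight
}

# B replicate quantile estimates; their standard deviation is the
# bootstrap standard error of the quantile.
boot_quantile_se <- function(y, strata, psu, weight, p = 0.5, B = 100) {
  reps <- replicate(
    B, weighted_quantile(y, boot_weights(strata, psu, weight), p)
  )
  sd(reps)
}

# Made-up data: 3 strata, 3 PSUs per stratum, 4 elements per PSU.
set.seed(2004)
toy <- data.frame(
  stratum = rep(1:3, each = 12),
  psu     = rep(1:9, each = 4),
  weight  = rep(c(10, 20, 30), each = 12),
  y       = rnorm(36, mean = 50, sd = 10)
)
se_median <- with(toy, boot_quantile_se(y, stratum, psu, weight))
```

For stratified non-clustered samples, passing a unique id per element as psu would make the selections on elements rather than clusters.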
What do you think? It's kind of general. For stratified non-clustered
samples, the selections would be done on sample elements, not on PSUs, and
for non-stratified cluster designs, the PSU selections would be done
across the whole sample, not by stratum.
I'm not that up with bootstrapping however. I'm not sure how to set/save
the seed values so running the procedure again on the same dataset will
produce the same variance.
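
On the seed question, one common approach in R is to call set.seed() with a fixed integer before the resampling starts. A small sketch, where draw_psus and the seed value are purely illustrative:

```r
# Hypothetical helper: fix the seed, then draw 4 of 5 PSU positions
# with replacement. The same seed gives the same draw every time.
draw_psus <- function(seed) {
  set.seed(seed)
  sample.int(5, 4, replace = TRUE)
}

first  <- draw_psus(20040414)
second <- draw_psus(20040414)
same   <- identical(first, second)  # TRUE: the draws are reproducible
```

Recording the seed value alongside the results should be enough to rerun the procedure on the same dataset and get the same replicates, provided the R version and random number generator settings are unchanged.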
Thomas Lumley <tlumley at u.washington.edu> wrote:
On Mon, 12 Apr 2004, Fred Rohde wrote:
> Thanks. I'll update the survey package. Sudaan does the standard
> errors on quantiles using Taylor series. If I can hunt down the formula
> it uses, could you add that to svyquantile?
If I can bring myself to believe it. Computing standard errors for the
normal approximation to the median is not easy even in simple random
> Thomas Lumley wrote:
> On Mon, 12 Apr 2004, Fred Rohde wrote:
> > Hello,
> > Is there a way to get complex sample variances in the survey package on
> > summary statistics other than means? If not, can they be added to a
> > future version? It would be great to have them on totals, quantiles,
> > ratios, and tables (e.g. row percent, column percent, etc.).
> svytotal() and svyratio() will do this for totals and ratios if you have a
> new enough version. At the moment the easiest way to get row or column
> percentages is to think of them as ratios of means of binary
> variables and use svyratio().
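
To illustrate the idea of a row percentage as a ratio of means of binary variables, here is a small base-R sketch of the point estimate only. The data and variable names are made up, and the sketch does not compute the standard error that svyratio() would supply.

```r
# Made-up weighted sample.
toy <- data.frame(
  sex    = c("F", "F", "F", "M", "M", "M"),
  smoker = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  weight = c(10, 20, 10, 15, 5, 20)
)

# Binary indicators: being in the row, and being in the row and column cell.
is_f        <- as.numeric(toy$sex == "F")
is_f_smoker <- as.numeric(toy$sex == "F" & toy$smoker)

# Row percent of smokers among women = ratio of two weighted means.
row_pct <- 100 * weighted.mean(is_f_smoker, toy$weight) /
  weighted.mean(is_f, toy$weight)
```

Here the weighted share of smokers among women is 10 / 40, so row_pct is 25.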
> Quantiles are more difficult, since neither Taylor series nor jackknife
> approaches work.
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle