[R] Re: group definition for a bootstrap

Wed Jul 28 14:25:54 CEST 2004

Hi,
There dose not seem to be any other replies so far, may be because the question
is not quite clear. Do you want a separate estimation of the 'precision of the mean'
for each age by quarter subsample? This reply is based on that interpretation.
If not, you may need some sort of stratified bootstrap (see the article by Canty 
mentioned in the code below).
Also, since you want to use the bootstrap for the mean, are the (age by quarter)
subsample sizes small, or the subsamples heavily skewed?
In any case it is better to use the package boot by Canty and Ripley than programming 
from scratch. A good introduction to this package is written by Angelo J. Canty in R 
Newsletter 2/3, December 2002. 
The following code first generates an aretificial (large) data frame dd with the same structure
as your fram d. Then it performs bootstrap calculations in each age by quarter subsample.
The output includes for each subsample bootstrap based bias and standard error for the mean,
plots for checking discreteness problems of the bootstrap and normal and BCa confidence intervals.

To use this code for your sample it should suffice to set 

dd <- d

and start at the line
# prepares bootstrap procedure for the mean

in the code. The code has been checked for the large artificial sample generated.
It will probably break down if some subsamples are empty or nearly so.
And there may be discreteness problems if some subsamples are small, although the
bootstrap is fairly stable for means.

Best,
Tore Wentzel-Larsen

# 'Group definition for a bootstrap'

# generates artificial data frame dd to use (age and quarter as factors in case they are in your data):
nn <- floor(runif(1,1000,2000)) # random sample size
dd <- data.frame(factor(floor(runif(nn,0,6))), factor(floor(runif(nn,1,5))), rnorm(nn,15,3))
names(dd) <- c('age','quarter','x')

# prepares bootstrap procedure for the mean
#	(cf the article by Angelo J. Canty in R Newsletter 2/3, December 2002):

library(boot)
mean.w <- function(xx,ww) sum(xx*ww)

# means, with bootstrap estimates of bias and standard error,
#	for each age by quarter subsample:

nboot <- 1000 # number of bootstrap replications

par(ask = TRUE) # prompts before next plot
for (age in 0:5)
{
for (quarter in 1:4)
{
dd.sub <- dd[dd$age==age&dd$quarter==quarter,] # the subsample
boot.subsample <- boot(data=dd.sub$x, statistic=mean.w, R=nboot, stype='w')
print(boot.subsample)
print(c('age:', age, 'quarter:', quarter)) # for keeping track of the subsamples
plot(boot.subsample,) # checks for discreteness problems
# bootstrap confidence intervals (normal based and BCa):
ci.subsample <- boot.ci(boot.subsample, type=c('norm','bca'), conf=c(.95))
print(ci.subsample)
} 
}

par(ask = FALSE) # back to default setting for ask

> Hi,
> This is probably really simple, but I am clearly not R-minded, I have read
> the help files, and reread them, and I still can't work out what to do...
> I have a data frame (d) with 3 columns (age (0-5), quarter (1-4) and x).
> I want to estimate the precision of my mean x by age and quarter, so I want
> to carry out a bootstrap for each group.
> I am trying to do this within a loop, so I don't have to retype the whole
> thing out 20 times ...
> This is what I have done:
> 
> n = length (d$x)
> N = 1000
> stat = numeric(N)
> for (i in 1:N) {
> d$x2 = sample (d$x, n, replace=T)
> stat[i] = mean(d$x2)
> }
> 
> I believe I should define the age and quarter groups in the line straight
> after "for" - using the split function, or should I use some variant of
> [d$age == "1" & d$quarter =="1"] in the sample definition?
> Please don't think I am looking for an easy answer - I have been puzzling
> for this for over a week already :(
> (I am using R1.9 under MS W2000)
> Thank you in advance
> Louize