[R] When to use bootstrap confidence intervals?

Mark Seeto markseeto at gmail.com
Mon Aug 16 13:09:25 CEST 2010


Hello, I have a question regarding bootstrap confidence intervals.
Suppose we have a data set consisting of single measurements, and that
the measurements are independent but their distribution is unknown. If
we want a confidence interval for the population mean, under what
conditions should a bootstrap confidence interval be preferred over the
elementary t interval?
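
To be concrete about what I mean by the elementary t interval, here is
a minimal sketch. The data values are made up purely for illustration,
and the names x, t.int and t.int2 are just placeholders.

```r
# Made-up data, for illustration only
x <- c(4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.6, 5.0)
n <- length(x)

# Elementary 95% t interval: mean +/- t quantile * standard error
half.width <- qt(0.975, n - 1) * sd(x) / sqrt(n)
t.int <- c(mean(x) - half.width, mean(x) + half.width)

# The same interval as reported by t.test()
t.int2 <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(t.int)
print(t.int2)
```

The two results should agree, since t.test() uses the same formula for
a one-sample interval.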

I was hoping the answer would be "always", but some simple simulations
suggest that this is incorrect. I simulated 1000 samples of size 10 and
of size 20, and for each sample calculated a 95% elementary t interval
and a 95% bootstrap BCa interval (with the boot package). I then
calculated the proportion of confidence intervals lying entirely above
the true mean, the proportion lying entirely below the true mean, and
the proportion containing the true mean, in that order. I used a normal
distribution and a t distribution with 3 df.

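library(boot)

# Statistic for boot(): the mean of the resampled observations
samplemean <- function(x, ind) mean(x[ind])

# Simulate n.samples normal samples. Returns, for the t and BCa intervals,
# the proportions lying entirely above mu, entirely below mu, and containing mu.
ci.norm <- function(sample.size, n.samples, mu=0, sigma=1, boot.reps) {
   t.under <- 0; t.over <- 0
   bca.under <- 0; bca.over <- 0
   for (k in 1:n.samples) {
     x <- rnorm(sample.size, mu, sigma)
     b <- boot(x, samplemean, R = boot.reps)
     bci <- boot.ci(b, type="bca")
     # mu below the lower t limit: interval lies entirely above mu
     if (mu < mean(x) - qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
       t.under <- t.under + 1
     # mu above the upper t limit: interval lies entirely below mu
     if (mu > mean(x) + qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
       t.over <- t.over + 1
     # bci$bca[4] and bci$bca[5] are the lower and upper BCa limits
     if (mu < bci$bca[4]) bca.under <- bca.under + 1
     if (mu > bci$bca[5]) bca.over <- bca.over + 1
   }
   return(list(t = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
          bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples))
}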

# Same simulation for samples from a t distribution with df degrees of
# freedom, whose true mean is 0 (for df > 1).
ci.t <- function(sample.size, n.samples, df, boot.reps) {
   t.under <- 0; t.over <- 0
   bca.under <- 0; bca.over <- 0
   for (k in 1:n.samples) {
     x <- rt(sample.size, df)
     b <- boot(x, samplemean, R = boot.reps)
     bci <- boot.ci(b, type="bca")
     # True mean 0 below the lower t limit: interval lies entirely above it
     if (0 < mean(x) - qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
       t.under <- t.under + 1
     # True mean 0 above the upper t limit: interval lies entirely below it
     if (0 > mean(x) + qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
       t.over <- t.over + 1
     if (0 < bci$bca[4]) bca.under <- bca.under + 1
     if (0 > bci$bca[5]) bca.over <- bca.over + 1
   }
   return(list(t = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
          bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples))
}

set.seed(1)
ci.norm(sample.size = 10, n.samples = 1000, boot.reps = 1000)
$t
[1] 0.019 0.026 0.955

$bca
[1] 0.049 0.059 0.892

ci.norm(sample.size = 20, n.samples = 1000, boot.reps = 1000)
$t
[1] 0.030 0.024 0.946

$bca
[1] 0.035 0.037 0.928

ci.t(sample.size = 10, n.samples = 1000, df = 3, boot.reps = 1000)
$t
[1] 0.018 0.022 0.960

$bca
[1] 0.055 0.076 0.869

Warning message:
In norm.inter(t, adj.alpha) : extreme order statistics used as endpoints

ci.t(sample.size = 20, n.samples = 1000, df = 3, boot.reps = 1000)
$t
[1] 0.027 0.014 0.959

$bca
[1] 0.054 0.047 0.899

I don't understand the warning message, but in these examples the
coverage of the ordinary t interval (0.946 to 0.960) is much closer to
the nominal 95% than the coverage of the bootstrap BCa interval (0.869
to 0.928). I would really appreciate any recommendations anyone can
give on when bootstrap confidence intervals should be used.
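
As a rough check that these differences are not just simulation noise,
here is a sketch of the Monte Carlo standard error of an estimated
coverage proportion. It assumes the 1000 simulated samples are
independent and that the true coverage is the nominal 0.95; the names
nominal, mc.se and z are placeholders.

```r
# Monte Carlo standard error of an estimated coverage proportion,
# assuming independent simulated samples and true coverage 0.95
n.samples <- 1000
nominal <- 0.95
mc.se <- sqrt(nominal * (1 - nominal) / n.samples)

# Observed coverage from the runs above, in the same order:
# normal n=10, normal n=20, t(3) n=10, t(3) n=20
t.cover <- c(0.955, 0.946, 0.960, 0.959)
bca.cover <- c(0.892, 0.928, 0.869, 0.899)

# Distance from the nominal level in units of the Monte Carlo SE
z <- rbind(t = (t.cover - nominal) / mc.se,
           bca = (bca.cover - nominal) / mc.se)
print(mc.se)
print(round(z, 1))
```

Under these assumptions the standard error is about 0.007, so the t
coverages are within about 1.5 standard errors of 0.95, while the BCa
coverages fall short by more than 3.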

Thanks,
Mark
--
Mark Seeto
National Acoustic Laboratories, Australian Hearing


