[R] p-values from bootstrap - what am I not understanding?

Mon Apr 13 02:22:47 CEST 2009

There is really nothing wrong with this approach, which differs 
primarily from the permutation test in that sampling is with 
replacement instead of without replacement (multinomial vs. multiple 
hypergeometric).
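
For concreteness, here is a minimal sketch of the recentering approach described in the quoted message, written in Python with only the standard library. It assumes the statistic is the sample mean and counts both tails by absolute distance from the null value. The function name `bootstrap_p_value` and its parameters are illustrative only and do not come from any package.

```python
import random
import statistics

def bootstrap_p_value(data, null_value=0.0, n_boot=2000, seed=0):
    """Two-sided p-value for the mean from a recentered bootstrap distribution."""
    rng = random.Random(seed)
    n = len(data)
    theta_obs = statistics.fmean(data)
    # Resample WITH replacement and recompute the statistic each time.
    theta_star = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]
    # Shift the bootstrap distribution so that its mean sits on the null value.
    shift = null_value - statistics.fmean(theta_star)
    recentered = [t + shift for t in theta_star]
    # Count recentered replicates at least as far from the null as the observed value.
    dist_obs = abs(theta_obs - null_value)
    extreme = sum(abs(t - null_value) >= dist_obs for t in recentered)
    return extreme / n_boot
```

A sample whose mean is far from the null value should give a p-value near 0, and a sample centered on the null should give a large one. With a small dataset, expect the result to be rough, for the reasons given below.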

One issue that the bootstrap has, and permutation tests do not, is bias 
in the statistic.

In order for bootstrap p-values to be reasonably accurate, you need a 
reasonable dataset size, so that sampling with replacement doesn't 
have a big effect, and so that enough distinct patterns arise in 
resampling. It also helps if the data is continuous instead of 
categorical or binary.

The same issues affect permutation tests, although permutation tests 
are untroubled by bias.

The usual methods for p-values (e.g., see Fisher's test in Agresti's 
Categorical Data Analysis) work here. Typically there is some 
ambiguity about how to treat resampled values equal to the observed 
statistic. If you count them, the p-value is conservative for 
rejection. If you don't, it's liberal for rejection. If you give them 
1/2 weight, it averages correctly in the long run.
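
The three treatments of ties can be sketched as one small Python function. The name `tail_p_value` and the `tie_weight` parameter are made up for illustration. `null_stats` stands for whatever resampled (bootstrap or permutation) statistics you have under the null.

```python
def tail_p_value(null_stats, observed, tie_weight=1.0):
    """Upper-tail p-value with a chosen weight for replicates equal to the observed value.

    tie_weight=1.0 counts ties in full (conservative for rejection),
    tie_weight=0.0 drops them (liberal), and tie_weight=0.5 splits the difference.
    """
    greater = sum(s > observed for s in null_stats)
    equal = sum(s == observed for s in null_stats)
    return (greater + tie_weight * equal) / len(null_stats)

# Toy null distribution with two replicates tied with the observed value of 3.
null_stats = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
p_conservative = tail_p_value(null_stats, 3, tie_weight=1.0)  # ties counted
p_liberal = tail_p_value(null_stats, 3, tie_weight=0.0)       # ties dropped
p_mid = tail_p_value(null_stats, 3, tie_weight=0.5)           # ties half-weighted
```

Ties matter most for discrete statistics and small datasets. With continuous data, exact ties are rare and the three versions nearly coincide.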

Ditto for 2-tailed p-values vs. single tails. Several different 
methods (some of which you listed) are used.

As a general rule, if you want a p-value from your data, a 
permutation (i.e., without replacement) test is used, but for 
confidence intervals, bootstrapping (i.e., with replacement) is used.

For reasonably large datasets, both methods will agree closely. But 
permutation tests are typically used for smaller datasets. 
(Think binomial vs. hypergeometric distributions for p-values, and 
when they agree.)
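
To make the binomial vs. hypergeometric comparison concrete, here is a rough Python sketch that computes the same upper-tail probability both ways. The function names and the example numbers are my own choices for illustration.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for n draws WITH replacement, success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for n draws WITHOUT replacement from N items, K of them successes."""
    upper = min(n, K)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)

# Same success fraction (1/2) and same number of draws (10), different pool sizes.
p_binom = binom_upper_tail(8, 10, 0.5)
p_hyper_small = hypergeom_upper_tail(8, 10, 10, 20)      # pool of 20: draws deplete it
p_hyper_large = hypergeom_upper_tail(8, 10, 1000, 2000)  # pool of 2000: barely depleted
```

With a pool of 20, the two tail probabilities differ by roughly a factor of five. With a pool of 2000, they are nearly the same, which is the sense in which the two resampling schemes agree for larger datasets.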

At 05:47 PM 4/12/2009, Johan Jackson wrote:
>Dear stats experts:
>Me and my little brain must be missing something regarding bootstrapping. I
>understand how to get a 95%CI and how to hypothesis test using bootstrapping
>(e.g., reject or not the null). However, I'd also like to get a p-value from
>it, and to me this seems simple, but it seems no-one does what I would like
>to do to get a p-value, which suggests I'm not understanding something.
>Rather, it seems that when people want a p-value using resampling methods,
>they immediately jump to permutation testing (e.g., destroying dependencies
>so as to create a null distribution). SO - here's my thought on getting a
>p-value by bootstrapping. Could someone tell me what is wrong with my
>approach? Thanks:
>
>STEPS TO GETTING P-VALUES FROM BOOTSTRAPPING - PROBABLY WRONG:
>
>1) sample B times with replacement, figure out theta* (your statistic of
>interest). B is large (> 1000)
>
>2) get the distribution of theta*
>
>3) the mean of theta* is generally near your observed theta. In the same way
>that we use non-centrality parameters in other situations, move the
>distribution of theta* such that the distribution is centered around the
>value corresponding to your null hypothesis (e.g., make the distribution
>have a mean theta = 0)
>
>4) Two methods for finding 2-tailed p-values (assuming here that your
>observed theta is above the null value):
>Method 1: find the percent of recentered theta*'s that are above your
>observed theta. p-value = 2 * this percent
>Method 2: find the percent of recentered theta*'s that are above the
>absolute value of your observed value. This is your p-value.
>
>So this seems simple. But I can't find people discussing this. So I'm
>thinking I'm wrong. Could someone explain where I've gone wrong?
>
>
>J Jackson

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"