[R] bootstrap syntax + bootstrap for teaching

Tim Hesterberg timh at insightful.com
Wed Jan 31 20:00:29 CET 2007


Previous subject:
bootstrap bca confidence intervals for large number of statistics in one model; library("boot")

Jacob Wegelin asked for an easier way to do many bootstrap confidence
intervals for regression output.
The syntax would be easier with S+Resample, example below.
You create an ordinary lm object, then do e.g.
	boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata=NEWDATA)))
	limits.bca(boot)


---- RESAMPLING FOR TEACHING ----

I would encourage people to consider using S+Resample for teaching.
There is a free student version of S+, and S+Resample is easier for
students -- easier syntax (in general, not just this example), plus a
menu interface.  There are free chapters and packages you can use to
supplement a statistics course with resampling.  See:
http://www.insightful.com/Hesterberg/bootstrap/#Reference.introStat

I'll give a workshop on resampling for teaching at the USCOTS conference
(U.S. Conference On Teaching Statistics) on May 16.
http://www.causeweb.org/uscots
http://www.causeweb.org/workshop/hesterberg/


set.seed(0)
x <- runif(150)
y <- 2/3 + pi * x^2 + runif(length(x))/2
plot(x,y)
DAT <- data.frame(x,y)
NEWDATA <- data.frame(x=seq(min(x), max(x), length=50))
library(resample)
fit <- lm(y ~ x + x^2, data=DAT)
boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata = NEWDATA)))
CIs <- limits.bca(boot)
lines(NEWDATA$x, CIs[4:53,1], col="red")
lines(NEWDATA$x, CIs[4:53,4], col="red")
CIs[1:3,]

I used x + x^2 in the model rather than poly(x,2), because poly is
data dependent, so regression coefficients cannot be used for new
data, and confidence intervals for the coefficients are not meaningful.

Comment - the default is 1000 bootstrap samples; that isn't really
enough for BCa intervals, because the BCa calculations magnify the
Monte Carlo standard error by roughly a factor of two.

Jacob Wegelin wrote:
>Sometimes one might like to obtain pointwise bootstrap bias-corrected,
>accelerated (BCA) confidence intervals for a large number of statistics
>computed from a single dataset.  For instance, one might like to get
>(so as to plot graphically) bootstrap confidence bands for the fitted
>values in a regression model.
>
>(Example: Chiu S et al., Early Acceleration of Head Circumference in
>Children with Fragile X Syndrome and Autism. Journal of Developmental &
>Behavioral Pediatrics 2007;In press. In this paper we plot the
>estimated trajectories of head circumference for two different
>groups, computed by linear mixed-effects models, with confidence bands
>obtained by bootstrap.)
>
>Below is a toy example of how to do this using the "boot" library.
>We obtain BCA CIs for all three regression parameters and for the fitted
>values at 50 levels of the predictor.
>
>set.seed(1234567)
>x<-runif(150)
>y<-2/3 + pi * x^2 + runif(length(x))/2
>plot(x,y)
>DAT<-data.frame(x,y)
>NEWDATA<-data.frame(x=seq(min(x), max(x), length=50))
>library('boot')
>myfn<-function(data, whichrows) {
>	TheseData<-data[whichrows,]
>	thisLM<-lm( y~poly(x,2), data=TheseData)
>	thisFit<-predict(thisLM, newdata=NEWDATA)
>	c(
>		coef(summary(thisLM))[,"Estimate"]
>		, thisFit)
>}
>bootObj<-boot( data=DAT, statistic=myfn, R=1000 )
>names(bootObj)
>dim(bootObj$t)
>sofar<-t(sapply( 1:ncol(bootObj$t), function(thiscolumn) boot.ci(bootObj, type='bca', index=thiscolumn)$bca[4:5] ))
>colnames(sofar)<-c("lo", "hi")
>NEWDATA<-cbind(NEWDATA, sofar[4:nrow(sofar),])
>thecoefs<-sofar[1:3,]
>lines( NEWDATA$x, NEWDATA$lo, col='red')
>lines( NEWDATA$x, NEWDATA$hi, col='red')
>
>Note: To get boot.ci to run with type='bca' it seems necessary to have a
>large value of R.  Otherwise boot.ci returns an error, in my (limited)
>experience.
>
>Thanks in advance for any critiques.  (For instance, is there an easier or more elegant way?)

Caveat - I helped create S+Resample.

========================================================
| Tim Hesterberg       Senior Research Scientist       |
| timh at insightful.com  Insightful Corp.                |
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|                      www.insightful.com/Hesterberg   |
========================================================
Download S+Resample from www.insightful.com/downloads/libraries

Bootstrap short courses:  May 21-22 Philadelphia, Oct 10-11 San Francisco.
                          2-3 May UK, 3-4 Oct UK.
		http://www.insightful.com/services/training.asp

Workshop on resampling for teaching:  May 16 Ohio State 
		http://www.causeweb.org/workshop/hesterberg/



More information about the R-help mailing list