[R] memory and bootstrapping

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu May 5 10:01:46 CEST 2011


The only reason the boot package will take more memory for 2000 
replications than 10 is that it needs to store the results.  That is 
not to say that on a 32-bit OS the fragmentation will not get worse, 
but that is unlikely to be a significant factor.
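
A rough back-of-the-envelope check of that storage cost, as a sketch only. It assumes the statistic function returns a single number per replicate (n_stats is a made-up name for illustration) and that results are held as 8-byte doubles in a matrix with one row per replicate, which is how boot() returns its `t` component.

```r
# Rough estimate of the extra storage for the matrix of replicates.
# Assumption: one statistic per replicate, stored as an 8-byte double.
R_small <- 10
R_large <- 2000
n_stats <- 1   # hypothetical: the statistic function returns one number

bytes_small <- R_small * n_stats * 8
bytes_large <- R_large * n_stats * 8
extra_kb <- (bytes_large - bytes_small) / 1024

# Compare with an actual matrix of the same shape as boot()'s `t` component
t_large <- matrix(0, nrow = R_large, ncol = n_stats)
print(extra_kb)
print(object.size(t_large), units = "Kb")
```

Under these assumptions the difference is a matter of kilobytes, small next to a 130,000-row data frame and the model fits built from it.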

As for the methodology: 'boot' is support software for a book, so 
please consult it (and not secondary sources).  From your brief 
description it looks to me as if you should be using studentized CIs.
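
A minimal sketch of a studentized interval with boot.ci(), under stated assumptions. The data are simulated, the variable names (g, x, y) and stat_fun are invented for illustration, and the statistic is a single logistic-regression coefficient rather than the poster's mean difference in predicted probabilities, because the coefficient comes with a ready-made variance estimate from vcov(). For type = "stud" the statistic has to return both the estimate and an estimate of its variance.

```r
library(boot)

set.seed(1)
n <- 500
# Simulated stand-in data: a binary group indicator and one covariate
d <- data.frame(g = rbinom(n, 1, 0.5), x = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * d$g + 0.5 * d$x))

# Returns the estimate first and its estimated variance second
stat_fun <- function(data, idx) {
  fit <- glm(y ~ g + x, family = binomial, data = data[idx, ])
  c(coef(fit)["g"], vcov(fit)["g", "g"])
}

b <- boot(d, stat_fun, R = 199)

# With a two-element statistic, boot.ci() by default treats the first
# element as the estimate and the second as its variance
ci <- boot.ci(b, type = "stud")
print(ci)
```

For a statistic without a closed-form variance, the variance element would have to come from something else, such as a delta-method approximation or a nested bootstrap, which multiplies the computing time.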

130,000 cases is a lot, and running the experiment on a 1% sample 
may well show that asymptotic CIs are good enough.
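
One way such an experiment could look, as a sketch on simulated data. The data frame `full` is a made-up stand-in for the real 130,000-row data, the model is deliberately simpler than the one described, and the statistic is again a single coefficient rather than the difference in predicted probabilities.

```r
library(boot)

set.seed(2)
n_full <- 130000   # number of cases mentioned in the thread
# Simulated stand-in for the real data (assumption, not the poster's data)
full <- data.frame(g = rbinom(n_full, 1, 0.5), x = rnorm(n_full))
full$y <- rbinom(n_full, 1, plogis(-0.5 + 0.8 * full$g + 0.5 * full$x))

# Draw a 1% subsample without replacement
sub <- full[sample(n_full, n_full %/% 100), ]

# Asymptotic (Wald) interval from the fitted model
fit <- glm(y ~ g + x, family = binomial, data = sub)
wald_ci <- confint.default(fit)["g", ]

# Studentized bootstrap interval on the same subsample
stat_fun <- function(data, idx) {
  f <- glm(y ~ g + x, family = binomial, data = data[idx, ])
  c(coef(f)["g"], vcov(f)["g", "g"])
}
b <- boot(sub, stat_fun, R = 199)
stud_ci <- boot.ci(b, type = "stud")$student[4:5]

print(rbind(wald = wald_ci, studentized = stud_ci))
```

If the two intervals nearly coincide on 1,300 cases, the asymptotic interval is likely to be at least as adequate on the full data, where the normal approximation should only improve.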

On Thu, 5 May 2011, E Hofstadler wrote:

> hello,
>
> the following questions will without doubt reveal some fundamental
> ignorance, but hopefully you can still help me out.
>
> I'd like to bootstrap a coefficient gained on the basis of the
> coefficients in a logistic regression model (the mean differences in
> the predicted probabilities between two groups, where each predict()
> operation uses as the newdata-argument a dataframe of equal size as
> the original dataframe). I've got 130,000 rows and 7 columns in my
> dataframe. The glm-model uses all variables (as well as two 2-way
> interactions).
>
> System:
> - R-version: 2.12.2
> - OS: Windows XP Pro, 32-bit
> - 3.16 GHz Intel dual-core processor, 2.9 GB RAM
>
> I'm using the boot package to arrive at the standard errors for this
> difference, but even with only 10 replications, this takes quite a
> long time: 216 seconds (perhaps this is partly also due to my
> inefficiently programmed function underlying the boot-call, I'm also
> looking into that).
>
> I wanted to try out calculating a bca-bootstrapped confidence
> interval, which as I understand requires a lot more replications than
> normal-theory intervals. Drawing on John Fox's appendix to his "An R
> Companion to Applied Regression", I was thinking of trying out 2000
> replications -- but this will take several hours to compute on my
> system (which isn't in itself a major issue though).
>
> My Questions:
> - let's say I try bootstrapping with 2000 replications. Can I be
> certain that the memory available to R will be sufficient for this
> operation?
> - (this relates to statistics more generally): is it a good idea in
> your opinion to try bca-bootstrapping, or can it be assumed that a
> normal theory confidence interval will be a sufficiently good
> approximation (letting me get away with, say, 500 replications)?
>
>
> Best,
> Esther

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


